Homework: from genes to proteins (hw5)

This time we are going to put together things we already know to do something a bit more complicated. We are going to write a program that translates a DNA sequence to RNA and then to the protein it encodes. This is a real example of what you can do. Not bad, huh?

This is also the hardest homework of this gap. To be honest, after this one you will have just one final “review” homework.

The code for this homework is hw5.

In the beginning there was a sequence…

This is the code you will start from:

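# The eight symbols we draw from: four lower case and four upper case bases.
@bases = ('a', 'c', 'g', 't', 'A', 'C', 'G', 'T');
for ($i = 0; $i < 99; $i++) {
  # rand(@bases) picks among all eight elements, so both cases can appear.
  push @seq, $bases[int(rand(@bases))];
}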

It produces an array @seq containing ninety-nine bases as one-character strings. Note that some bases are upper case and others are lower case.

Exercise 1: from DNA to RNA

You have to add some code to the starting program.
Write the code to create a new array @rna_seq, starting from the @seq array but with all bases in upper case and thymine replaced by uracil (it is called @rna_seq for a reason).
Print both sequences on two lines so you can easily check that the “translation” works correctly.
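
One possible way to approach the conversion described above is sketched here. It is not the only solution. The short fixed array stands in for the random @seq only so that the result is easy to check by eye, and the variable names $base and $rna_base are illustrative choices.

```perl
use strict;
use warnings;

# Assumption: a short, fixed stand-in for the 99-base @seq.
my @seq = ('a', 'T', 'g', 'c', 't', 'A');
my @rna_seq;

foreach my $base (@seq) {
    my $rna_base = uc($base);      # 'a' becomes 'A', 't' becomes 'T', ...
    if ($rna_base eq 'T') {        # strings are compared with eq, not ==
        $rna_base = 'U';           # thymine becomes uracil
    }
    push @rna_seq, $rna_base;
}

print join('', @seq), "\n";        # prints aTgctA
print join('', @rna_seq), "\n";    # prints AUGCUA
```

Converting to upper case first means a single comparison against 'T' covers both 't' and 'T'.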

Exercise 2: from RNA to codons

Again, add some code to the previous program to create yet another array, called @codons. It should contain thirty-three three-character strings: the codons in @rna_seq. Basically, you have to join three consecutive one-character strings from @rna_seq, and do so for each group of three bases.

You may need (although it is not strictly necessary) the concatenation operator "." to create the strings. It works like this:

 $stringa = "I like..." . " homeworks!\n" . "Especially during..." . " the holidays!";

Print the codons, one per line, and check.
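
As a hedged illustration of the grouping step, the sketch below walks through an array three positions at a time and concatenates each triplet with the "." operator. The nine-base array is a made-up example (the homework array has 99 bases), and the loop condition is just one reasonable way to stop at the last complete triplet.

```perl
use strict;
use warnings;

# Assumption: a hypothetical nine-base example instead of the full @rna_seq.
my @rna_seq = ('A', 'U', 'G', 'C', 'C', 'A', 'U', 'A', 'A');
my @codons;

# Advance three positions per iteration; stop when no full triplet is left.
for (my $i = 0; $i + 2 < @rna_seq; $i += 3) {
    my $codon = $rna_seq[$i] . $rna_seq[$i + 1] . $rna_seq[$i + 2];
    push @codons, $codon;
}

# One codon per line: AUG, then CCA, then UAA.
print "$_\n" foreach @codons;
```

An alternative that avoids the concatenation operator is join('', @rna_seq[$i .. $i + 2]), which joins an array slice into a single string.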

Exercise 3: from codons to proteins

Add the code to create the last array, called @aa (guess why?). It should contain thirty-three one-character strings: the amino acids corresponding to the codons in @codons.

How can you do it? You could write a long chain of ifs, but it would be messy and there is a better way: a hash. Here is a hash with the genetic code:

my %genetic_code = (
'UCA' => 'S', # Serine
'UCC' => 'S', # Serine
'UCG' => 'S', # Serine
'UCU' => 'S', # Serine
'UUC' => 'F', # Phenylalanine
'UUU' => 'F', # Phenylalanine
'UUA' => 'L', # Leucine
'UUG' => 'L', # Leucine
'UAC' => 'Y', # Tyrosine
'UAU' => 'Y', # Tyrosine
'UAA' => '_', # Stop
'UAG' => '_', # Stop
'UGC' => 'C', # Cysteine
'UGU' => 'C', # Cysteine
'UGA' => '_', # Stop
'UGG' => 'W', # Tryptophan
'CUA' => 'L', # Leucine
'CUC' => 'L', # Leucine
'CUG' => 'L', # Leucine
'CUU' => 'L', # Leucine
'CCA' => 'P', # Proline
'CAU' => 'H', # Histidine
'CAA' => 'Q', # Glutamine
'CAG' => 'Q', # Glutamine
'CGA' => 'R', # Arginine
'CGC' => 'R', # Arginine
'CGG' => 'R', # Arginine
'CGU' => 'R', # Arginine
'AUA' => 'I', # Isoleucine
'AUC' => 'I', # Isoleucine
'AUU' => 'I', # Isoleucine
'AUG' => 'M', # Methionine
'ACA' => 'T', # Threonine
'ACC' => 'T', # Threonine
'ACG' => 'T', # Threonine
'ACU' => 'T', # Threonine
'AAC' => 'N', # Asparagine
'AAU' => 'N', # Asparagine
'AAA' => 'K', # Lysine
'AAG' => 'K', # Lysine
'AGC' => 'S', # Serine
'AGU' => 'S', # Serine
'AGA' => 'R', # Arginine
'AGG' => 'R', # Arginine
'CCC' => 'P', # Proline
'CCG' => 'P', # Proline
'CCU' => 'P', # Proline
'CAC' => 'H', # Histidine
'GUA' => 'V', # Valine
'GUC' => 'V', # Valine
'GUG' => 'V', # Valine
'GUU' => 'V', # Valine
'GCA' => 'A', # Alanine
'GCC' => 'A', # Alanine
'GCG' => 'A', # Alanine
'GCU' => 'A', # Alanine
'GAC' => 'D', # Aspartic Acid
'GAU' => 'D', # Aspartic Acid
'GAA' => 'E', # Glutamic Acid
'GAG' => 'E', # Glutamic Acid
'GGA' => 'G', # Glycine
'GGC' => 'G', # Glycine
'GGG' => 'G', # Glycine
'GGU' => 'G'  # Glycine
);

As you can see, the keys are all 64 possible codons and each value is the corresponding amino acid (as a one-character string).

Now for each codon you can just use the hash table and get the amino acid. So do it!
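
For readers unsure how a hash lookup works, here is a minimal sketch. It uses only three entries copied from the table above to stay short, and the @codons array is a hypothetical example. The key point is that $genetic_code{$codon} returns the value stored under that key directly, so there is no need to compare the codon against every key yourself.

```perl
use strict;
use warnings;

# Assumption: only three entries from the full table, to keep the sketch short.
my %genetic_code = (
    'AUG' => 'M',   # Methionine
    'CCA' => 'P',   # Proline
    'UAA' => '_',   # Stop
);

my @codons = ('AUG', 'CCA', 'UAA');   # hypothetical example codons
my @aa;

foreach my $codon (@codons) {
    # $hash{key} looks up the value associated with the key.
    push @aa, $genetic_code{$codon};
}

print join('', @aa), "\n";   # prints MP_
```

With the full 64-entry table every codon made of A, C, G and U has an entry, so the lookup always finds a value.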

Exercise 4: fix exercise 1

Usually there are many different ways of doing the same thing in a program, and this is especially true in Perl.
In exercise 1 you were tasked with translating the DNA sequence to RNA.

A good way to do this is with a hash: the code becomes shorter and clearer. The idea is the same as in exercise 3. Try to rewrite the code from exercise 1 using a hash to do the translation from lower to upper case and from thymine to uracil.
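
The hash-based translation of bases described above could look roughly like the following sketch. The hash name %dna_to_rna and the short input array are assumptions made for illustration. Each of the eight possible input characters is a key, and its value is the RNA base it should become.

```perl
use strict;
use warnings;

# Assumption: %dna_to_rna is a hypothetical name for the lookup table.
# Keys are bases as they may appear in @seq; values are the RNA bases.
my %dna_to_rna = (
    'a' => 'A', 'c' => 'C', 'g' => 'G', 't' => 'U',
    'A' => 'A', 'C' => 'C', 'G' => 'G', 'T' => 'U',
);

my @seq = ('a', 'T', 'g', 'c', 't', 'A');   # short stand-in for @seq
my @rna_seq;

foreach my $base (@seq) {
    # One lookup handles both the case change and the T-to-U replacement.
    push @rna_seq, $dna_to_rna{$base};
}

print join('', @rna_seq), "\n";   # prints AUGCUA
```

Compared with an if-based version, the loop body shrinks to a single lookup, and all the rules live in one table.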

How to submit your work

Once you have checked that your script works, submit it by filling in this form. Remember to include your login ID and the homework ID: hw5.

6 thoughts on “Homework: from genes to proteins (hw5)”

  1. Anna says:

    Hi! I am trying to do the first exercise. I managed to convert every letter to upper case, but I cannot get every T replaced with a U in @rna_seq. I tried using if (and also foreach) but no luck... am I completely off track?

    • Andrea Telatin says:

      Hi, if you do not have a specific question, it is best to post your program as well, so we can better understand the snag.
      There is always more than one way, but “if” certainly works too, so there may be a small problem somewhere!

      • Anna says:

        I later managed to solve the problem, but I second Gloria's request regarding the use of hashes!

  2. Gloria says:

    Hi... sorry, I have a problem (which, from what I understand, many of us share): hashes. I also searched the internet but could not find a solution. My logic for this exercise is: I compare each element of the @codons array with the keys of the %genetic_code hash, and if they match I add the corresponding value to a new array @aa. But I cannot manage to write it.
    Could you explain next time how these hashes work? I am really wasting my life on this and I am still at square one...
