Second computer lab

After the brief introduction in class you are now ready to try by yourself a couple of command line programs: BLAST and Primer3. After having worked with these tools, you’ll have some more time to play with Perl. Today we will

Practice again with the shell
Perform a BLAST via command line
Perl scripts
Primer3 test

Now our servers are set up and we can monitor your activity… so remember: read carefully, and type filenames exactly as we suggest.

1) Let’s start from the shell

Open a terminal and check that you are in your home direrctory. You can “cd ~” to go quickly to the home, and then use the “pwd” command to see the absolute path of your home. Please, use the fu**ing TAB when possible, and review how to browse the filesystem.

Note that I added a data directory in your home with some files. List them and have a look!

Now create a lab2 directory.

Create a file called “mysequences.fa” with your Text Editor. Now paste into it at least 5 sequences in FASTA format. Choose five genes from S. cereevisiae. You can use the NCBI databank, or a specific portal for yeast if you prefer so. Save it.

Return to the terminal. Enter in the lab2 direectory and list the files. You should see your mysequences.fa file. Print its content, do you remember the command?
Now print only the sequence headers. And finally count them. Isn’ this handy?

Now, the second line of your file should be a nucleotidic sequence. We can try to check this combining two commands:

head -n 2 mysequences.txt | tail -n 1

If you see only DNA then print the output of previuous command redireecting it to oligo.txt (since it has no header, we prefer not to use the .fa extension that is for FASTA files).

Now return to your home (cd ~). Type the following command:

echo YourName YourSurname > 2.lab

This command just prints your name and surname into a filename called “2.lab” and will be used by our system.

2) let’ try using BLAST from the shell

As mentioned in class the command line BLAST package requires you to prepare the database before performing searches. If you need them, you can download BLAST programs from here. Consider that it works under Windows too.

In our lab BLAST is installed in /home/geno/tools/blast/bin, thus to format a database:

/home/geno/tools/blast/bin/formatdb -p F -i filename

of course we have to be in the directory containing “filename” to have this program working, right? So execute this command replacing “filename” with the mysequences.fa file. List the content of the directory: you should see that BLAST created new files.

Now we can do our very first blastn:

/home/geno/tools/blast/bin/blastall -p blastn -i queryfile -d database

some extra options could specify a minimum e-value (-e 0.035) or to switch to a tabular output (-m 8). This is interesting for Perl parsing.

Now BLAST your oligo.txt against your database and save the output in the blastoutput.txt file in the same directory.

Loading ...

3) Some more Perl

3.0 – avg.pl

Create a script that receives a list of numbers as input and prints the average. Remember that Perl store all the input parameters into the @ARGV array.

3.1 – minmax.pl

Create a script that receives a list of numbers as input and print the highest and the lowest. You’ll store these numbers in $max and $min and at the end:

print "max: $max\nmin: $min\n";

Take your time to make this script. You can think about it at home…

3.2 – array functions

To add a new element into an array the command is push:

push(@array, $element);

if you want of course you can specify the element: push(@array, ‘add-this-too!’);

3.3 – revcompl.pl

This script should receive a sequence from the user and should print the reverse complementary. Store the input in a variable called $sequence. To reverse it use the reverse() function. You can accomplish both task in a single step:

$sequence = reverse($ARGV[0]);

Now we have to translate each A with a T etc… There is a function to do this:

$sequence =~tr/ACGT/TGCA/;

A the end print $sequence followed by a “new line” character. Now consider that a sequence is often lowercase or mixed case. We can overcome this problem in several ways, the simplest is to use the uc() function that returns a string converted to upper case (while lc($string) to lowercase).
You can combine more commands and type: $sequence = reverse(uc($ARGV[0]));

3.4 – multirevcompl.pl

This script should receive a list of sequences from the user and should print the reverse complementary of each. We would like, this time, to have the output in multiFasta format (call each sequence “sequence1”, “sequence2″…) etc.

3.5 – readblast.pl

Create this script that takes as input the filename of a BLAST output (created with the -m 8 parameter). Open the file and read it line by line, remembering that the line is temporarily stored in the $_ variable.

You’ll use the split variable to convert the line into an array that we can call @blastfields:

@blastfields = split( /\t/, $_);

the split function takes as first argument a pattern to break the line, and as second argument the string to be breaked.

Print only the subject, the percentage identity and alignment length. How?

3.5b – readblast2.pl

Alter the behaviour of the previous script: now we want that the script takes two parameters: the filename and a minimum length of the aligment.

In the while loop print subject, percentage identity, and alignment length only if the latter value is higher than the minimum length parameter.

3.6 – filerevcompl.pl

You want to read a sequence from a file instead of receiving it from command line. The input file can have more than one line. You will find a “seq1022.txt” file in the data directory in your home.

Loading ...

4) Let’s launch Primer3 via command line

Take the gap example posted yesterday. Download the sequence in your lab2 directory. You can use the “wget” command moving to the lab2 directory and typing:

wget http://perl.4ngs.com/wp-content/uploads/2contigs.fna

Now decide if you have to reverse and complement one of the two sequences or both… and save the new file as 2contigs-flipped.fna.

In your data directory located in the home directory you’ll find a pick-primer.p3 file that is a sample script for Primer3. First read its content with the cat command.

To use Primer3 you should go in the data directory and type the following command:

../tools/primer3/src/primer3_core < pick-primer.p3

to save the output:

../tools/primer3/src/primer3_core < pick-primer.p3

Create a copy of the file pick-primer.p3 and call it primer-gap1.p3 using this command:

cp pick-primer.p3 primer-gap1.p3

Edit the file with a text editor and change the input parameters. Try running Primer3 on this new instruction set and save the output as gap1-primers.p3out.

Perl & Genomics

Bioinformatics for Genomics Course – University of Padua