Fourth lab: shell commands via Perl

Today we:

  • have a small lesson on shell commands with Perl
  • optimize last time’s script so that Primer3 execution will be faster in most cases
  • see some more example of regular expressions

Before starting… did you notice a link to a whole Perl book in the side bar?

Shell commands with Perl

We mentioned the fact that Perl can – an most times does – act as a glue language, meaning that we can put together different software making a new tool. What we are doing is exactly this: a pipeline that harmonizes input and output so that Primer3 and BLAST works to tell us some good primer pairs.

A pivotal feature of Perl to do this is the possibility to execute shell commands and to get their output into a variable. Let’s consider the simple “ls -l” command. It returns a list of lines right? To execute shell commands in Perl you just need to use the backticks or the qx() operator. As we want to capture output, we’ll assign to an array the operation. Examine the script that follows:

@shelloutput = `ls -l`;
foreach $line (@shelloutput) {
	print "$line";
print STDERR "Got $counter lines from the 'ls -l' command\n";

There are really many advantages of using Perl to execute commands. A major one is that we can use variables inside commands. Consider this example:

foreach $newdir (@ARGV) {
	qx(mkdir $newdir);

This script takes a list of names as input and then creates new directories with that names.

Preparing Primer3 input

Some of you last time launched Primer3 using their own input parameters. In your home you have a Primer3 input file called ~/data/pick-primer.p3. Have a look at it using the following command:

cat ~/data/pick-primer.p3

If we redirect previous command’s output to primer3, it will parse it and execute it:

cat ~/data/pick-primer.p3 | /home/geno/tools/primer3/src/primer3_core

Now we know how to pass instructions to Primer3. So, if we open “pick-primer.p3” in our text editor, we can copy and paste it into our program to create a custom file. Having a look to the structure of the file we see that we pass the DNA sequence in the “SEQUENCE_TEMPLATE” field, and after we specify the target region with the “SEQUENCE_TARGET” field, whose syntax is starting-nucleotide,target-region-length. Thus we must specify a proper target region.

Red lines show a useless primer pair. To design a proper one we should set the SEQUENCE_TARGET parameter with care.
But let’s face another problem now: if we merge two large contigs into a single $template string, we are forcing Primer3 to test all possible primers on a long template. This is very time consuming. Thus we should trim both contigs to a suitable length:

This is your major task: create a $template usign at most $maxcontiglen bases of each contig. This means that we hard-code a $maxcontiglen variable into our script (e.g. set it to 800) and that if a contig is longer, we’ll trim it accordingly with the substr() function.
With care: we need last n-bases of the first contig, and first n-bases of the second, right?

At the end we should also calculate a starting position considering that we hard-code $targetlength = 40.

$customprimer3 = "SEQUENCE_ID=$mytitle

Testing Primer3 custom input

Once that your $customprimer3 variable is properly filled you can test it. For example if your program prints it with

print "$customprimer3";

and you redirect the output to a file:

perl contig00001 U contig00002 C > mytest.p3

you can then test with

cat mytest.p3 | /home/geno/tools/primer3/src/primer3_core

Making it singing

If our output works we can make Primer3 running directly from Perl, grabbing the output with something like:

@primer3output = qx(echo "$customprimer3" | /home/geno/tools/primer3/src/primer3_core);

Of course you should PRINT the @primer3output to see if it worked.

Leave a Reply

Your email address will not be published. Required fields are marked *