We performed a BLAST aligment using the -m 8 parameters, that produces a tabular format.
Let’s review how to parse it with Perl.
This is the list of fields of each line:
0: query name 1: subject name 2: percent identities 3: aligned length 4: number of mismatched positions 5: number of gap positions 6: query sequence start 7: query sequence end 8: subject sequence start 9: subject sequence end 10: e-value 11: bit score
We want to write a script that prints each line (=aligment) only if the alignment length is greater than a user supplied cutoff.
The script thus has to take two parameters: the filename of BLAST output and the minimum alignment length.
Then opens the output and line by line splits the fields into and array (using the \t tab character as field separator). If the forth field (thus number 3 in an array) is greater than $cutoff we will print the line ($_).
#!/usr/bin/perl # This scripts takes two parameters ($blastfile, $cutoff) = @ARGV; if ($#ARGV<2) { die "Please, type the two parameters: BLAST-output and Minimum-Align-Length\n"; } open(BLAST, "$blastfile") || die " Unable to load $blastfile.\n"; while (<BLAST>) { $totallines++; # using the split function we populate the @fields array @fields = split(/\t/, $_); # Compare alignment length with user supplied cutoff if ($fields[3] > $cutoff) { $printedlines++; # we print the whole line ($_) in case print $_; } } $percentage = int($printedlines*100/$totallines)); print STDERR "Printed $printedlines/$totallines ($percentage%).\n"; |