Parse BLAST output (tabular format)

We performed a BLAST aligment using the -m 8 parameters, that produces a tabular format.
Let’s review how to parse it with Perl.


This is the list of fields of each line:

0:  query name
1:  subject name
2:  percent identities
3:  aligned length
4:  number of mismatched positions
5:  number of gap positions
6:  query sequence start
7:  query sequence end
8:  subject sequence start
9:  subject sequence end
10: e-value
11: bit score

We want to write a script that prints each line (=aligment) only if the alignment length is greater than a user supplied cutoff.
The script thus has to take two parameters: the filename of BLAST output and the minimum alignment length.
Then opens the output and line by line splits the fields into and array (using the \t tab character as field separator). If the forth field (thus number 3 in an array) is greater than $cutoff we will print the line ($_).

#!/usr/bin/perl
 
# This scripts takes two parameters
($blastfile, $cutoff) = @ARGV;
 
if ($#ARGV<2) {
	die "Please, type the two parameters: BLAST-output and Minimum-Align-Length\n";
}
 
open(BLAST, "$blastfile") || die " Unable to load $blastfile.\n";
 
while (<BLAST>) {
	$totallines++;
	# using the split function we populate the @fields array   
	@fields = split(/\t/, $_);
 
	# Compare alignment length with user supplied cutoff
	if ($fields[3] > $cutoff)  {
		$printedlines++;
		# we print the whole line ($_) in case
		print $_;
	}
}
$percentage = int($printedlines*100/$totallines));
print STDERR "Printed $printedlines/$totallines ($percentage%).\n";
Tagged

Leave a Reply

Your email address will not be published. Required fields are marked *