This post explains how to read a file from Perl. This is an optional part of laboratory04, so don’t try this unless you really feel confident about all the rest.
Example: we want to write a program that counts the lines present in a text file, and prints the longest one.
Some theory
For Perl a text file is similar to an array of lines. When we want to read a text file, we will read it line by line. This is the main thing to consider.
Our program is a script (we will call it longestline.pl, in the lab04 directory) that needs a single argument: a file name.
Then the program has to open the file passed by the user, that is, connect to the real file. Then we can read it.
It’s a two-step strategy: first open the file (that is, connect to it), then do the actual reading.
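As a minimal sketch of the "single argument" part, the script can look at the special array @ARGV, where Perl stores whatever was typed after the script name. The helper name file_name_from is made up for this illustration, and `use strict` (which requires declaring variables with `my`) is an addition that the rest of this post does not use.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file name if exactly one
# argument was given, otherwise return undef.
sub file_name_from {
    my @arguments = @_;
    return undef if @arguments != 1;
    return $arguments[0];
}

# @ARGV holds the arguments typed after the script name.
my $file_name = file_name_from(@ARGV);

if (defined $file_name) {
    print "I will read the file: $file_name\n";
} else {
    print "Usage: perl longestline.pl FILENAME\n";
}
```

Run as `perl longestline.pl lista.txt`, this would report the file name; run without arguments, it prints the usage line instead.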
The open command syntax
Perl has an “open” command that creates a connection between a file and our program (script). The open command does not read the file, but it makes it possible.
The syntax is:
open FILEHANDLE, MODE, FILENAME
Where the “file handle” is just a label we want to use for the file. By tradition file handles have UPPERCASE names, like FASTA or SAM, or most of the time simply I. No quotes around it. If we want to read more than one file, it is essential to assign a unique label to each one; that’s why we need this “file handle”.
The “mode” is a symbol telling Perl if we want to read, write (create) or append to (add lines at the bottom) a file. The operands are:
mode | operand | create | truncate |
---|---|---|---|
read | < | | |
write | > | ✓ | ✓ |
append | >> | ✓ | |
So, to read a file we need the “<” symbol. Does it remind you of anything?
Finally we have to tell Perl which file to open, that is… its file name!
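To see the three modes side by side, here is a small sketch that writes, appends to and then reads the same file. The file name modes_demo.txt is a made-up scratch file created in the current directory (and deleted at the end); `use strict`, `my` and the `or die` checks are additions for safety, not something this post has introduced.

```perl
use strict;
use warnings;

my $file_name = 'modes_demo.txt';   # hypothetical scratch file

# ">" creates the file, or empties it if it already exists.
open(OUT, '>', $file_name) or die "Cannot write $file_name: $!\n";
print OUT "first line\n";
close(OUT);

# ">>" keeps what is already there and adds lines at the bottom.
open(OUT, '>>', $file_name) or die "Cannot append to $file_name: $!\n";
print OUT "second line\n";
close(OUT);

# "<" only reads: the file must already exist.
open(IN, '<', $file_name) or die "Cannot read $file_name: $!\n";
my @lines;
while (my $line = <IN>) {
    push @lines, $line;             # collect each line as it is read
}
close(IN);

print @lines;
unlink $file_name;                  # remove the scratch file
```

If the second open used ">" instead of ">>", the file would be emptied first and only "second line" would survive.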
An example
Suppose you have a file called ‘lista.txt’ in your home directory. The full path to it is something like /home/geno/geno-XX/lista.txt. To read it:
open LIST, "<", '/home/geno/geno-XX/lista.txt';
The command returns a value that can be true (OK, I found the file and can read it) or false (we have a problem).
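Since open returns true or false, the script can test that value before trying to read. The sketch below wraps the test in a helper called report_open, a name invented for this illustration; the special variable $! holds the reason the system gives when open fails. The path is the one from the example above and will only exist on your own account.

```perl
use strict;
use warnings;

# Hypothetical helper: try to open a file for reading
# and describe what open returned.
sub report_open {
    my ($file_name) = @_;
    if (open(LIST, '<', $file_name)) {
        close(LIST);
        return "OK, I can read $file_name";
    }
    # open returned false: $! explains why
    return "Problem with $file_name: $!";
}

print report_open('/home/geno/geno-XX/lista.txt'), "\n";
```

In short scripts the same check is often written on a single line, as `open(LIST, '<', $file_name) or die "Cannot open $file_name: $!\n";`, which stops the program when the file cannot be opened.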
Reading the file
Once we have opened the file, we can read it, line by line. We need a loop, but… we don’t know how long the loop will be, because we usually don’t know in advance how long the file is. Yes! We need a while loop.
The file handle is a magic thing that, when “called”, returns the next line of the file. But when you reach the end of the file, it returns a false value (End Of File). To call a file handle we use the < > signs, like <FILEHANDLE>.
So continuing the above example:
while ($riga = <LIST>) {
    $conta_righe++;
    print "$conta_righe: $riga\n";
}
This is how to read a file line by line, storing the current line in a variable (in the example it’s called $riga). The condition tested by the while loop is whether a value could be assigned to $riga from the file handle.
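To make the "false at the end of the file" behaviour visible, this sketch calls the file handle three times on a file that has only two lines. The file two_lines_demo.txt is a made-up scratch file that the sketch creates and deletes itself; `use strict` and `my` are additions.

```perl
use strict;
use warnings;

my $file_name = 'two_lines_demo.txt';   # hypothetical scratch file

# Prepare a file with exactly two lines.
open(OUT, '>', $file_name) or die "Cannot write $file_name: $!\n";
print OUT "first line\n";
print OUT "second line\n";
close(OUT);

open(LIST, '<', $file_name) or die "Cannot read $file_name: $!\n";
my $first  = <LIST>;    # first call: first line
my $second = <LIST>;    # second call: second line
my $third  = <LIST>;    # nothing left: undef, which counts as false
close(LIST);
unlink $file_name;      # remove the scratch file

print "first call:  ", (defined $first  ? "got a line" : "end of file"), "\n";
print "second call: ", (defined $second ? "got a line" : "end of file"), "\n";
print "third call:  ", (defined $third  ? "got a line" : "end of file"), "\n";
```

The while loop does the same thing automatically: it keeps calling the handle and stops at the first call that returns nothing.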
An exercise
Now create a text file in your home directory with at least three lines. Then create a program called readfile.pl in the lab04 directory. Put the above lines in the script (both the open command and the while loop). Edit the open command so that it points to the correct file, please.
Now you should run it from bash.
Do you notice something unexpected?
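Once you have tried the exercise, the pieces can be put together into the program described at the start of the post: count the lines of a file and print the longest one. This is only one possible sketch of longestline.pl, not an official solution; the helper name longest_line is invented here, and `use strict`, `my` and `chomp` are additions that the post has not covered.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines of a file and find the longest one.
# Returns the number of lines and the longest line (without its newline).
sub longest_line {
    my ($file_name) = @_;
    open(IN, '<', $file_name) or die "Cannot open $file_name: $!\n";

    my $count   = 0;
    my $longest = '';
    while (my $line = <IN>) {
        chomp $line;                  # remove the newline at the end of the line
        $count++;
        if (length($line) > length($longest)) {
            $longest = $line;         # on a tie the earlier line is kept
        }
    }
    close(IN);
    return ($count, $longest);
}

# The script expects exactly one argument: the file name.
if (@ARGV == 1) {
    my ($count, $longest) = longest_line($ARGV[0]);
    print "Lines: $count\n";
    print "Longest line: $longest\n";
} else {
    print "Usage: perl longestline.pl FILENAME\n";
}
```

It would be run from bash as `perl longestline.pl lista.txt`, using whatever text file you created for the exercise.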