Lab02/1: A small recap and more on pipes and stuff

This post is part of the second laboratory session!

We are going to start with a refresher on Bash commands, and possibly learn some new ones.

Start by creating a directory “lab02” in your home, like you did in the previous lesson. If you forgot the command, think that you have to “make” a “directory”.

Enter it. Again, if you forgot the command, think that you have to “change” the current “directory”.

01 Standard output

Many of you had problems with this in the first session, so I feel like a more thorough explanation is in order. All commands and programs have three standard streams, that is, channels they can read from or write to. One of them is the standard output: normally everything a program writes into it gets printed on the terminal. It is what you see when you run the program. When you use “>” you redirect the standard output to a file of your choosing.

In the directory “lab02” try the following:

ls /

The command lists the files present in the root (/) of the disk. Now, to get all that stuff into a file, type:

ls / > dirs

Check the content of “dirs”: it should contain the output of “ls /”. Use “head” or “less” to view the file you created.

Use “wc -l” to count how many lines there are. Now try the following:

ls / >> dirs

Do you notice any difference? If you do not, check again, because there should be many more lines now. This is because “>>” redirects the output like “>”, but while “>” overwrites existing files, “>>” appends the output of the command after the end.

Now “dirs” should contain the output of “ls /” twice! Check that the line count doubled.
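If you want to check the difference between “>” and “>>” in one go, here is a minimal sketch. It works in a temporary directory (created with “mktemp -d”, an assumption of this example, so that your real “dirs” is left alone) and stores the line counts in variables whose names are made up for the illustration.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

ls / > dirs              # ">" creates the file (or overwrites it)
first=$(wc -l < dirs)    # line count after one listing

ls / >> dirs             # ">>" appends after the end of the file
second=$(wc -l < dirs)   # should be twice the first count

ls / > dirs              # ">" again: the old content is replaced
third=$(wc -l < dirs)    # back to the first count

echo "after >  : $first lines"
echo "after >> : $second lines"
echo "after >  : $third lines"
```

The second number should be double the first, and the third should be equal to the first again.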

There is another stream commands can write to, the standard error. It is mostly used to print messages that are not part of the normal output, for instance errors, so it is redirected less often; note that “>” leaves it alone.
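To see that the standard error really is a separate stream, you can try the sketch below. It uses “2>”, which redirects the standard error the same way “>” redirects the standard output. The path “/no/such/directory” is invented for the example and is assumed not to exist on your machine.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# /no/such/directory is a made-up path, assumed not to exist:
# ls has nothing to list, so it only writes an error message
ls /no/such/directory > out.txt 2> err.txt || echo "ls reported an error"

echo "lines captured from the standard output: $(wc -l < out.txt)"
echo "lines captured from the standard error:  $(wc -l < err.txt)"
```

The file “out.txt” should be empty, while the error message ends up in “err.txt”.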

02 Standard input

Another important stream is the standard input. Many Unix commands (for instance head, tail, less, grep, wc) will work by reading a file that you give them as an argument, like this:

wc dirs

These are the numbers of lines, words and characters in “dirs”, respectively. But what happens if you do not give wc an argument? Try it now:

wc

Note how the command prints nothing and the prompt does not appear again. This means that the command is still running. It is actually reading the standard input, since you did not supply any file for wc to read from. Try to write something, like “hello wc, what the devil are you doing?”. After that, press Enter to end the line and then press Ctrl+D (hold down the Control key and, while keeping it down, press the “d” key). This special combination marks the end of the input stream: now wc should exit and print how many lines, words and characters you typed. So, unless you redirect it, the standard input is what you type on the terminal.

As we saw last time “<” redirects the input, so that:

wc dirs

and

wc < dirs

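give the same counts. The only difference is that the first also prints the file name, while the second cannot, because wc only sees a stream and never learns where it came from.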
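You can compare the two forms side by side with the small sketch below. The three-line file is just a stand-in for your real “dirs”, and the variable names are made up for the example.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small stand-in for the real "dirs" file: three lines
printf 'bin\nhome\ntmp\n' > dirs

with_name=$(wc -l dirs)        # wc opens the file itself, so it knows the name
without_name=$(wc -l < dirs)   # the shell opens the file; wc only sees a stream

echo "wc -l dirs   prints: $with_name"
echo "wc -l < dirs prints: $without_name"
```

Both lines report the same count, but only the first one mentions “dirs”.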

03 Pipes

So now you can understand a bit better what “|” (pipe) does:

command1 | command2

takes the standard output of command1 and feeds it to command2 through its standard input. For instance at the end of the last session you wrote something like:

head -n 20 dirs | tail

in order to get lines 11-20 of a file: head keeps the first 20 lines, and tail keeps the last 10 of those. Note that head reads its input from “dirs”, so it ignores its standard input. tail, instead, has no file argument, so it reads its standard input, where it finds the output of head coming through the pipe. The command

head -n 20 dirs | tail dirs

does not do what we want, because tail will read “dirs” and not its standard input, and thus ignore the output of head. So we just get the last 10 lines of “dirs”.
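The difference is easier to see on a file where every line is its own line number. The sketch below builds such a file with “seq” (the file names “numbers”, “piped” and “ignored” are made up for the example) and runs both versions of the command.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Test file: one number per line, from 1 to 30
seq 1 30 > numbers

# tail has no file argument: it reads the pipe, i.e. the first 20 lines
head -n 20 numbers | tail > piped

# tail has a file argument: it reads "numbers" and ignores the pipe
head -n 20 numbers | tail numbers > ignored

echo "tail reading the pipe: $(tr '\n' ' ' < piped)"
echo "tail reading the file: $(tr '\n' ' ' < ignored)"
```

The first line should show the numbers from 11 to 20, the second the numbers from 21 to 30.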

04 Less

Remember the less command? less lets you browse the content of a text file.
We’d like to introduce a concept: when the lines are longer than the screen, “less” wraps them onto the next row by default.

Sometimes we want to avoid this and see each line on a single row (we can then scroll horizontally with the arrow keys):

less -S filename

05 “Escaping” special characters [optional]

Some characters have a special meaning in the shell. For instance, we saw how “<”, “>” and “|” work. If you write them normally the shell will think that you want to use their special meaning. For instance the command:

grep > seq.fa

will not print the lines of the file “seq.fa” that contain “>”. Instead the shell will empty the file “seq.fa”, execute grep with no arguments and redirect its output there. Since grep with no arguments only prints a usage message on the standard error, “seq.fa” will be left with nothing in it, effectively trashing its contents.

What you probably wanted is this:

grep ">" seq.fa

You need either to quote special characters or to put a backslash (“\”) before them. Note that “ ” (space) is a special character too. In general it is better to avoid special characters in file names, since they are a bit awkward to use, but sometimes you will find them, so you need to know how to handle them.
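Here is a small sketch showing that quoting and backslash-escaping give the same result. The two-record file is an invented stand-in for “seq.fa”, created in a temporary directory so that nothing of yours gets overwritten.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up FASTA file with two records, standing in for "seq.fa"
printf '>seq1\nACGT\n>seq2\nTTGA\n' > seq.fa

quoted=$(grep ">" seq.fa)    # ">" protected by double quotes
single=$(grep '>' seq.fa)    # single quotes work as well
escaped=$(grep \> seq.fa)    # ">" protected by a backslash

echo "$quoted"
```

All three variables hold the same two header lines, and “seq.fa” still has its four lines.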

Copy the file “dirs” to a new file called “dirs with too many spaces” (you could also rename it: renaming in the shell is the same as “moving”, ring any bells?). Check the result with ls. Note that it is hard to tell whether there are many files or just one with spaces in its name.

Now try to see the content with less. You have to either quote the whole name with double quotes (") or escape each space in it. If you type the first few characters and then press Tab, the shell completes the name for you, escaping included, which is much easier. Try it.

less "dirs with too many spaces"

less dirs\ with\ too\ many\ spaces

If you forget the quotes or the backslashes, less will think that each word is a different file.
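The sketch below shows the same thing without the interactive viewer: it uses “cat” in place of “less”, only because “cat” prints the file and exits. The two-line file is a stand-in for your copy of “dirs”.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the file created in the exercise
printf 'bin\nhome\n' > "dirs with too many spaces"

cat "dirs with too many spaces"      # one argument, thanks to the quotes
cat dirs\ with\ too\ many\ spaces    # one argument, thanks to the backslashes

# No quotes, no backslashes: cat receives five separate names
# (dirs, with, too, many, spaces) and finds none of them
cat dirs with too many spaces || echo "cat could not find those five files"
```

The first two commands print the content of the file; the third one only produces error messages.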

Now a little exercise: count how many lines in “dirs with too many spaces” contain “a” and put the output in “words with <a>”.

Perl & Genomics

Bioinformatics for Genomics Course – University of Padua