File Operations and Filtering

Most command-line work is done on files. In this section we discuss how to view and filter file content, how to extract the information you need from files with a single command, and how to sort a file's content.

cat, tail, head, tee: File Printing Commands

These commands have almost the same syntax: command_name [option(s)] [file(s)], and may be used in a pipe. All of them are used to print part of a file according to certain criteria.

The cat utility concatenates files and prints the result to standard output. This is one of the most widely used commands. You can use:

# cat /var/log/mail/info

to print, for example, the content of a mailer daemon log file to standard output[14]. The cat command has a very useful option (-n) which allows you to print the line numbers.
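As a minimal sketch of both behaviours, the following creates two throwaway files (the names a.txt and b.txt and the temporary directory are purely illustrative), concatenates them, and then numbers the resulting lines:

```shell
# Work in a throwaway directory; the file names are illustrative only.
dir=$(mktemp -d)
printf 'first\nsecond\n' > "$dir/a.txt"
printf 'third\n' > "$dir/b.txt"

# Concatenate both files, in the order given, to standard output.
joined=$(cat "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$joined"

# The same, with every output line numbered.
cat -n "$dir/a.txt" "$dir/b.txt"

rm -rf "$dir"
```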

Some files, such as the log files of running daemons, are usually huge[15], and printing them completely on the screen is not very useful. Often you need to see only some lines of the file. You can use the tail command to do so. The following command will print, by default, the last 10 lines of the file /var/log/mail/info:

# tail /var/log/mail/info

You can use the -n option to display the last N lines of a file. For example, to display the last 2 lines, you would issue:

# tail -n2 /var/log/mail/info

The head command is similar to tail, but it prints the first lines of a file. The following command will print, by default, the first 10 lines of the /var/log/mail/info file:

# head /var/log/mail/info

As with tail you can use the -n option to specify the number of lines to be printed. For example, to print the first 2, you issue:

# head -n2 /var/log/mail/info

You can also use these commands together. For example, if you wish to display only lines 9 and 10, you can use a command where the head command will select the first 10 lines from a file and pass them through a pipe to the tail command.

# head /var/log/mail/info | tail -n2

The last part will then select the last 2 lines and will print them to the screen. In the same way you can select line number 20, counted from the end of a file:

# tail -n20 /var/log/mail/info | head -n1

In this example we tell tail to select the file's last 20 lines and pass them through a pipe to head. Then the head command prints to the screen the first line from the obtained data.
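Both combinations can be tried safely on a generated file. The sketch below assumes a 30-line sample produced with seq (the name sample.txt is hypothetical), so that each line's content is its own line number:

```shell
# Build a 30-line sample file; the name sample.txt is illustrative only.
dir=$(mktemp -d)
seq 1 30 > "$dir/sample.txt"

# Lines 9 and 10: keep the first 10 lines, then the last 2 of those.
range=$(head -n 10 "$dir/sample.txt" | tail -n 2)
printf '%s\n' "$range"

# The 20th line from the end: keep the last 20 lines, then the first of those.
from_end=$(tail -n 20 "$dir/sample.txt" | head -n 1)
printf '%s\n' "$from_end"

rm -rf "$dir"
```

With 30 lines in the file, the 20th line from the end is line 11.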

Let's suppose we want to print the result of the last example to the screen and save it to the file results.txt at the same time. The tee utility can help us. Its syntax is:

tee [option(s)] [file]

Now we can change the previous command this way:

# tail -n20 /var/log/mail/info | head -n1 | tee results.txt

Let's take yet another example. We want to select the last 20 lines, save them to results.txt, but print on screen only the first of the 20 selected lines. Then we should type:

# tail -n20 /var/log/mail/info | tee results.txt | head -n1

The tee command has a useful option (-a) which allows you to append data to an existing file.
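The effect of tee's position in the pipe, and of the -a option, can be checked on a small generated file. This is only a sketch: sample.txt and results.txt live in a temporary directory and stand in for the log file used above.

```shell
# Illustrative 30-line sample standing in for a log file.
dir=$(mktemp -d)
seq 1 30 > "$dir/sample.txt"

# tee copies everything it receives to results.txt and to standard output,
# so head only trims what finally reaches the screen.
shown=$(tail -n 20 "$dir/sample.txt" | tee "$dir/results.txt" | head -n 1)
saved=$(wc -l < "$dir/results.txt")

# With -a, tee appends to results.txt instead of overwriting it.
head -n 1 "$dir/sample.txt" | tee -a "$dir/results.txt" > /dev/null
appended=$(wc -l < "$dir/results.txt")

printf 'shown=%s saved=%s appended=%s\n' "$shown" "$saved" "$appended"
rm -rf "$dir"
```

Only one line is shown, yet all 20 selected lines are saved; the second tee call then adds one more line to the same file.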

Let's go back to the tail command. Files such as logs usually change constantly because the daemon associated with a log keeps adding actions and events to it. So, if you want to interactively watch the changes to the log file you can take advantage of the -f option:

# tail -f /var/log/mail/info

In this case all lines added to the /var/log/mail/info file will be printed on screen immediately. Using the tail command with option -f is very helpful when you want to know how your system works. For example, by following the /var/log/messages log file, you can keep up with messages from the system and from various daemons.

In the next section we will see how we can use grep as a filter to separate Postfix messages from messages coming from other services.

grep: Locate Strings in Files

Neither the name nor its origin (the ed editor command g/re/p, “globally search for a regular expression and print”) is very intuitive, but what it does and its use are simple: grep looks for a pattern given as an argument in one or more files. Its syntax is:

grep [options] <pattern> [one or more file(s)]

If several files are mentioned, their names will precede each matching line displayed in the result. Use the -h option to prevent the display of these names; use the -l option to get nothing but the matching filenames. The pattern is a regular expression, even though most of the time it consists of a simple word. The most frequently used options are the following:

  • -i: make a case-insensitive search (i.e. ignore differences between lower and uppercase);

  • -v: invert the search, i.e. display lines which do not match the pattern;

  • -n: display the line number for each line found;

  • -w: tells grep that the pattern should match a whole word.
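The four options listed above can be compared on a small invented log. The file name sample.log and its content are made up for this sketch; they are not real Postfix output.

```shell
# Invented four-line log; name and content are illustrative only.
dir=$(mktemp -d)
cat > "$dir/sample.log" <<'EOF'
Postfix started
postfix/smtp: status=sent
postfixer is not a whole-word match
dovecot: login
EOF

# -i: "Postfix", "postfix" and "postfixer" all match.
insensitive=$(grep -i postfix "$dir/sample.log" | wc -l)

# -v: lines that do NOT contain lowercase "postfix".
inverted=$(grep -v postfix "$dir/sample.log")

# -n: prefix each matching line with its line number.
numbered=$(grep -n dovecot "$dir/sample.log")

# -w: "postfix" must be a whole word, so "postfixer" is rejected.
whole=$(grep -w postfix "$dir/sample.log")

printf '%s\n' "$insensitive" "$inverted" "$numbered" "$whole"
rm -rf "$dir"
```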

So let's go back to analyze the mailer daemon's log file. We want to find all lines in the file /var/log/mail/info which contain the “postfix” pattern. Then we type this command:

# grep postfix /var/log/mail/info

The grep command can be used in a pipe. Thus we can get the same result as in the previous example by doing this:

# cat /var/log/mail/info | grep postfix 

If we want to find all lines not containing the “postfix” pattern, we would use the -v option:

# grep -v postfix /var/log/mail/info

Let's suppose we want to find all messages about successfully sent mails. In this case we have to keep only the lines which were added to the log file by the mailer daemon (they contain the “postfix” pattern) and which also contain a message about successful sending (“status=sent”):

# grep postfix /var/log/mail/info | grep status=sent

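In this case grep is used twice, once per pattern, so only lines containing both patterns are printed. A related tool is the fgrep utility (equivalent to grep -F), which reads a list of fixed strings from a file. Note that it prints the lines containing any of the listed strings, not all of them, so its output is generally a superset of what the double grep produces. First, we need to create a file containing the patterns, one per line. Such a file can be created this way (we use patterns.txt as the file name):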

# echo -e 'status=sent\npostfix' >./patterns.txt

Then we run the fgrep utility with the patterns.txt file as its list of patterns; it prints every line of the log that contains at least one of them:

# fgrep -f ./patterns.txt /var/log/mail/info

The file ./patterns.txt may contain as many patterns as you wish, each on a line of its own. Keep in mind that every added pattern widens the selection, because a line is printed as soon as it contains any one of them. For example, to also select lines mentioning peter@mandrakesoft.com, append this address to our ./patterns.txt file by running this command:

# echo 'peter@mandrakesoft.com' >>./patterns.txt

It is clear that you can combine grep with tail and head. If we want to see only the last but one of the lines selected by our pattern list, we type:

# fgrep -f ./patterns.txt /var/log/mail/info | tail -n2 | head -n1

Here we apply the filter described above and place the result in a pipe for the tail and head commands. They select the last but one line from the data.
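Note that a chain of grep commands and a pattern file do not combine patterns in the same way: the chain keeps lines matching every pattern, while the pattern file keeps lines matching at least one. The sketch below shows the difference on an invented three-line log (file names, addresses and log lines are made up). It uses grep -F, the equivalent of fgrep, because recent GNU grep releases print a warning when fgrep is called.

```shell
# Invented log and pattern file; all names and content are illustrative.
dir=$(mktemp -d)
cat > "$dir/sample.log" <<'EOF'
postfix/smtp: to=<peter@example.com> status=sent
postfix/smtp: to=<ann@example.com> status=deferred
dovecot: pop3-login
EOF
printf 'status=sent\npostfix\n' > "$dir/patterns.txt"

# Chained grep: a line must contain "postfix" AND "status=sent".
both=$(grep postfix "$dir/sample.log" | grep status=sent | wc -l)

# Pattern file: a line must contain "postfix" OR "status=sent".
either=$(grep -F -f "$dir/patterns.txt" "$dir/sample.log" | wc -l)

printf 'both=%s either=%s\n' "$both" "$either"
rm -rf "$dir"
```

The deferred message is excluded by the chained form but selected by the pattern file, so when every pattern must match, chaining grep is the form to use.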

wc: Count Elements in Files

The wc command (Word Count) is used to count the number of lines and words in files. It can also count bytes and characters, and report the length of the longest line. Its syntax is:

wc [option(s)] [file(s)]

The following options are useful:

  • -l: print the number of newlines;

  • -w: print the number of words;

  • -m: print the total number of characters;

  • -c: print the number of bytes;

  • -L: print the length of the longest line in the obtained text.

The wc command prints the number of newlines, words and bytes by default. Here are some usage examples:

If we want to find the number of users in our system, we can type:

$ wc -l /etc/passwd

If we want to know the number of CPUs in our system, we write:

$ grep "model name" /proc/cpuinfo | wc -l

In the previous section we used the ./patterns.txt file to select the log lines containing any of the listed patterns. If we want to know the number of such lines, we can send our filter's results through a pipe to the wc command:

# fgrep -f ./patterns.txt /var/log/mail/info | wc -l
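To see what each counter reports, here is a small sketch on a two-line file created for the occasion (words.txt is a hypothetical name). The file is fed to wc on standard input so that only the numbers are printed, without the file name.

```shell
# Two lines, three words, fourteen bytes; the file name is illustrative.
dir=$(mktemp -d)
printf 'one two\nthree\n' > "$dir/words.txt"

lines=$(wc -l < "$dir/words.txt")   # newline count
words=$(wc -w < "$dir/words.txt")   # word count
bytes=$(wc -c < "$dir/words.txt")   # byte count, newlines included

printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"

# Length of the longest line, not counting its newline.
wc -L < "$dir/words.txt"

rm -rf "$dir"
```

The sample contains only ASCII text, so -m would report the same figure as -c here; the two differ as soon as the file contains multibyte characters.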

sort: Sorting File Content

Here is the syntax of this powerful sorting utility[16]:

sort [option(s)] [file(s)]

Let's consider sorting the /etc/passwd file on one of its fields. As you can see, this file is not sorted:

$ cat /etc/passwd

If we want to sort it by login field, we type:

$ sort /etc/passwd

By default the sort command compares whole lines and sorts them in ascending order; since the login is the first field of each line, the result is effectively sorted by login. If we want to sort data in descending order, we use the option -r:

$ sort -r /etc/passwd

Every user has his own UID written in the /etc/passwd file. Let's sort a file in ascending order using the UID field:

$ sort /etc/passwd -t":" -k3 -n

Here we use the following sort options:

  • -t":": tells sort that the field separator is the ":" symbol;

  • -k3: means that the sort key starts at the third field (write -k3,3 to restrict the key to that field alone);

  • -n: says that the sort is to occur on numerical data, not alphabetical.

The same can be done in reverse:

$ sort /etc/passwd -t":" -k3 -n -r

Note that sort has two other important options:

  • -u: perform a strict ordering: of several lines with equal sort keys, only one is kept;

  • -f: ignore case (treat lowercase characters the same way as uppercase ones).
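The effect of these two options is easiest to see on a short list containing duplicates and mixed case. The sketch below uses an invented file (fruits.txt) and forces the C locale so that the ordering does not depend on local language settings, which is an assumption of this example rather than a requirement of sort.

```shell
# Invented word list with a duplicate and a case variant.
dir=$(mktemp -d)
printf 'banana\nApple\napple\nbanana\ncherry\n' > "$dir/fruits.txt"

# -u: the second "banana" is dropped; "Apple" and "apple" still differ.
unique=$(LC_ALL=C sort -u "$dir/fruits.txt")
printf '%s\n' "$unique"

# -f with -u: case is ignored, so "Apple" and "apple" count as duplicates.
folded=$(LC_ALL=C sort -f -u "$dir/fruits.txt" | wc -l)
printf 'folded=%s\n' "$folded"

rm -rf "$dir"
```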

Finally, if we want to find the user with the highest UID we can use this command:

$ sort /etc/passwd -t":" -k3 -n | tail -n1

where we sort the /etc/passwd file in ascending order according to the UID column, and send the result through a pipe to the tail command, which prints the last line of the sorted list, that is, the entry with the highest UID.
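The same pipeline can be tried without touching the real /etc/passwd. The sketch below works on an invented four-entry file in passwd format (all accounts and UIDs are made up) and also shows why the -n option matters when the key is a number.

```shell
# Invented passwd-style file; accounts and UIDs are illustrative only.
dir=$(mktemp -d)
cat > "$dir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
peter:x:1001:1001::/home/peter:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
ann:x:500:500::/home/ann:/bin/bash
EOF

# Numeric sort on the third field only; the last line has the highest UID.
last=$(sort -t: -k3,3 -n "$dir/passwd.sample" | tail -n 1)
highest=${last%%:*}          # keep the text before the first ":"

# Without -n the key is compared as text, so "500" sorts after "1001".
last_text=$(LC_ALL=C sort -t: -k3,3 "$dir/passwd.sample" | tail -n 1)
textual=${last_text%%:*}

printf 'numeric: %s, textual: %s\n' "$highest" "$textual"
rm -rf "$dir"
```

The numeric sort correctly reports peter (UID 1001), whereas the textual comparison would wrongly pick ann (UID 500).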



[14] Some examples in this section are based on real work and server log files (services, daemons). Make sure that syslogd (which handles the daemons' logging) and the corresponding daemon (in our case Postfix) are running, and that you work as root. Of course, you can always apply our examples to other files.

[15] For example, the /var/log/mail/info file contains information about all sent mails, messages about fetching mail by users with the POP protocol, etc.

[16] We only discuss sort briefly here because whole books can be written about its features.