Most command-line work is done on files. In this section we will show you how to view and filter file content, how to extract specific information from files with a single command, and how to easily sort a file's content.
These commands share almost the same syntax: command_name [option(s)] [file(s)], and may be used in a pipe. All of them are used to print part of a file according to certain criteria.
The cat utility concatenates files and prints the result to the standard output, which is usually your screen. It is one of the most widely used commands. For example, you can use:
# cat /var/log/mail/info
to print the content of a mailer daemon log file to the standard output[14]. The cat command has a very useful option (-n) which allows you to print line numbers.
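As a quick sketch on a throwaway file (sample.log below is our own example, not a system log), -n numbers every line of the output:

```shell
# Create a small sample file (a hypothetical stand-in for a log file)
printf 'first\nsecond\nthird\n' > sample.log

# -n prefixes each printed line with its line number (1, 2, 3, ...)
cat -n sample.log
```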
Some files, such as the log files of running daemons, are usually huge[15], and printing them completely on the screen is not very useful. Generally speaking you only need to see a few lines of the file. You can use the tail command to do so. The following command will print, by default, the last 10 lines of the /var/log/mail/info file:
# tail /var/log/mail/info
Files such as logs usually change constantly, because the daemon associated with the log keeps adding actions and events to the log file. To interactively watch these changes you can take advantage of the -f option:
# tail -f /var/log/mail/info
In this case all changes in the /var/log/mail/info file will be printed on screen immediately. Using the tail command with the -f option is very helpful when you want to know how your system works. For example, by watching the /var/log/messages log file, you can keep up with system messages and the various daemons.
If you use tail with more than one file, it will print the name of each file on a line by itself before printing its lines. It also works with the -f option and is a valuable aid in seeing how different parts of the system interact.
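A minimal sketch with two throwaway files (first.log and second.log are our own examples, standing in for real log files):

```shell
printf 'a1\na2\n' > first.log
printf 'b1\nb2\n' > second.log

# With several files, tail prints a "==> filename <==" header
# before the lines taken from each file
tail first.log second.log
# ==> first.log <==
# a1
# a2
#
# ==> second.log <==
# b1
# b2
```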
You can use the -n option to display the last n lines of a file. For example, to display the last 2 lines, you would issue:
# tail -n2 /var/log/mail/info
Just as with other commands, you can use several options at the same time. For example, using both -n2 and -f together, you start with the last two lines of the file and keep on seeing new lines as they are written to the log file.
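A sketch of the combined form, again on a throwaway file. Since tail -f runs until interrupted, we wrap it in timeout(1) (a GNU coreutils helper, our addition) purely so the example terminates on its own:

```shell
printf 'one\ntwo\nthree\n' > sample.log

# Start from the last two lines, then keep following the file;
# timeout stops the otherwise endless tail after 2 seconds.
timeout 2 tail -n2 -f sample.log
# two
# three
```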
The head command is similar to tail, but it prints the first lines of a file. The following command will print, by default, the first 10 lines of the /var/log/mail/info file:
# head /var/log/mail/info
As with tail, you can use the -n option to specify the number of lines to be printed. For example, to print the first two, issue:
# head -n2 /var/log/mail/info
You can also use these commands together. For example, if you wish to display only lines 9 and 10 of a file, you can have head select the first 10 lines and pass them through a pipe to tail:
# head /var/log/mail/info | tail -n2
The tail part will then select the last 2 of those lines and print them to the screen. In the same way you can select the 20th line counting from the end of the file:
# tail -n20 /var/log/mail/info |head -n1
In this example we tell tail to select the file's last 20 lines and pass them through a pipe to head. The head command then prints to the screen the first line of the data it receives.
Let's suppose we want to print the result of the last example to the screen and save it to the results.txt file. The tee utility can help us. Its syntax is:
tee [option(s)] [file]
Now we can change the previous command this way:
# tail -n20 /var/log/mail/info |head -n1|tee results.txt
Let's take yet another example. We want to select the last 20 lines, save them to results.txt, but print on screen only the first of the 20 selected lines. Then we should type:
# tail -n20 /var/log/mail/info |tee results.txt |head -n1
The tee command has a useful option (-a) which enables you to append data to an existing file.
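A short sketch of the difference: without -a, tee truncates the file on each run; with -a it appends to it.

```shell
# First write: tee truncates results.txt before writing
printf 'first run\n' | tee results.txt > /dev/null

# Second write: -a appends instead of overwriting
printf 'second run\n' | tee -a results.txt > /dev/null

cat results.txt
# first run
# second run
```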
In the next section we will see how we can use the grep command as a filter to separate Postfix messages from other messages coming from different services.
Neither the name nor its origin (grep comes from the ed command g/re/p: globally search for a regular expression and print the matching lines) is very intuitive, but what it does and its use are simple: grep looks for a pattern, given as an argument, in one or more files. Its syntax is
grep [options] <pattern> [one or more file(s)]
If several files are given, their names will precede each matching line displayed in the result. You can use the -h option to prevent these names from being displayed, or the -l option to print nothing but the names of the files containing matches. The pattern is a regular expression, even though most of the time it consists of a simple word. The most frequently used options are the following:
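A small sketch of the -h and -l options on two throwaway files (a.log and b.log are our own examples):

```shell
printf 'postfix: message sent\nsomething else\n' > a.log
printf 'nothing of interest\n' > b.log

# With several files, each matching line is prefixed with its file name
grep postfix a.log b.log
# a.log:postfix: message sent

# -h suppresses the file name prefix
grep -h postfix a.log b.log
# postfix: message sent

# -l prints only the names of the files that contain a match
grep -l postfix a.log b.log
# a.log
```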
So let's go back to analyzing the mailer daemon's log file. We want to find all lines in the /var/log/mail/info file which contain the postfix pattern. We type this command:
# grep postfix /var/log/mail/info
The grep command can be used in a pipe. Thus we can get the same result as in the previous example by doing this:
# cat /var/log/mail/info | grep postfix
But please note that using cat here is unnecessary. On the other hand, combining grep with tail is a very useful way to find information on a running system.
If we want to find all lines which do NOT contain the postfix pattern, we would use the -v option:
# grep -v postfix /var/log/mail/info
Let's suppose we want to find all messages about successfully sent mails. In this case we have to filter all lines which were added to the log file by the mailer daemon (they contain the postfix pattern) and which also report a successful delivery (status=sent)[16]:
# grep postfix /var/log/mail/info |grep status=sent
In this case grep is used twice. That is allowed, but not very elegant. The same result can be achieved with the fgrep utility; fgrep is simply a shorter way of calling grep -F. First we need to create a file containing the patterns, written one per line. Such a file can be created this way (we use patterns.txt as the file name):
# echo -e 'status=sent\npostfix' >./patterns.txt
Check the result with the cat command. \n is a special sequence which means “new line”.
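For the check, printf(1) is a portable alternative to echo -e for producing the same two-line file:

```shell
# printf interprets \n everywhere, unlike echo, which needs -e in bash
printf 'status=sent\npostfix\n' > ./patterns.txt

cat ./patterns.txt
# status=sent
# postfix
```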
Then we issue the next command, using the patterns.txt file and the fgrep utility instead of the “double call” of grep:
# fgrep -f ./patterns.txt /var/log/mail/info
The ./patterns.txt file may contain as many patterns as you wish. For example, to select messages about successfully sent mails to peter@mandrakesoft.com, it would be enough to add this e-mail address to our ./patterns.txt file by running this command:
# echo 'peter@mandrakesoft.com' >>./patterns.txt
It is clear that you can combine grep with tail and head. If we want to find the message about the second-to-last e-mail sent to peter@mandrakesoft.com, we would type:
# fgrep -f ./patterns.txt /var/log/mail/info | tail -n2 | head -n1
Here we apply the filter described above and pipe the result to the tail and head commands, which select the second-to-last entry from the data.
With grep we are stuck with patterns and fixed data. How would we find all e-mails sent to each and every employee of “ABC Company”? Listing all their e-mails would not be an easy task since we might end up missing someone or having to dig in the log file by hand.
As with fgrep, there is a shortcut for the command grep -E: egrep. It takes extended regular expressions instead of plain patterns, providing us with a more powerful interface to “grep” text.
Besides what we mentioned in Section 3, “Shell Globbing Patterns” while talking about globbing patterns, here are some additional regular expressions:
[:alnum:], [:alpha:] and [:digit:] can be used instead of defining the classes of characters yourself, and represent, respectively, all letters plus all digits, all letters (uppercase and lowercase), and all digits. They have an additional bonus: they include internationalized characters and respect the localization of the system.
[:print:] represents all characters which can be printed on screen.
[:lower:] and [:upper:] represent all lowercase and all uppercase letters, respectively.
There are more classes available and you can see all of them in egrep(1). The above are the most commonly used ones.
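A brief sketch of these classes on a throwaway file (classes.txt is our own example); note that a class is used inside a bracket expression, hence the double brackets:

```shell
printf 'hello\nWorld\n42\n' > classes.txt

# Lines containing at least one uppercase letter
egrep '[[:upper:]]' classes.txt
# World

# Lines consisting only of digits
egrep '^[[:digit:]]+$' classes.txt
# 42
```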
A regular expression may be followed by one of several repetition operators:
If you put a regular expression inside parentheses you can recover it later. Let's say that you specified the [[:alpha:]]+ expression. It might represent a word. If you want to detect words which occur twice, you can put the expression inside parentheses and reuse it as \1 if it is the first group. You can have up to 9 of these “memories”.
$ echo -e "abc def\nabc abc def\nabc1 abc1\nabcdef\nabcdabcd\nabcdef abcef" > testfile
$ egrep "([[:alpha:]]+) \1" testfile
abc abc def
$
The only line returned is the one containing two identical groups of letters separated by a space. No other line matched the regular expression.
You can also use the | character, which matches either the expression to its left or the one to its right; it is an operator which joins those expressions. Using the same testfile created above, you can try looking for lines which contain double words made only of letters, or double words containing letters and digits:
$ egrep "([[:alpha:]]+) \1|([[:alpha:][:digit:]]+) \2" testfile
abc abc def
abc1 abc1
$
Note that for the second parenthesized group we had to use \2; otherwise it would not match what we wanted. A more efficient expression in this particular case would be:
$ egrep "([[:alnum:]]+) \1" testfile
abc abc def
abc1 abc1
$
Finally, to match certain characters literally you have to “escape” them by preceding them with a backslash. Those characters are: ?, +, {, |, (, and ). To match them you have to write: \?, \+, \{, \|, \(, and \).
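A small sketch of the difference escaping makes (esc.txt is our own example file):

```shell
printf 'a+b\naab\n' > esc.txt

# Unescaped, + is the "one or more" repetition operator:
# a+b means "one or more a, then b"
egrep 'a+b' esc.txt
# aab

# Escaped, \+ matches a literal plus sign
egrep 'a\+b' esc.txt
# a+b
```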
This simple trick might help to prevent you from typing repeated words in “your your” text.
Regular expressions in all tools should follow these, or very similar, rules. Taking some time to understand these rules will help a lot with other tools such as sed, which allows you to manipulate text, changing it using regular expressions as rules, among other things.
The wc command (Word Count) is used to count the number of lines, words and characters in files. It can also count bytes and report the length of the longest line. Its syntax is:
wc [option(s)] [file(s)]
The following options are useful:
The wc command prints the number of lines, words and characters by default. Here are some usage examples:
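For instance, on a two-line throwaway file (words.txt is our own example), the default output is lines, words and bytes, followed by the file name:

```shell
printf 'one two\nthree\n' > words.txt

# Default output: lines, words, bytes, file name
wc words.txt
# 2  3 14 words.txt   (exact column spacing may vary)
```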
If we want to find the number of users in our system, we can type:
$ wc -l /etc/passwd
If we want to know the number of CPUs in our system, we write:
$ grep "model name" /proc/cpuinfo |wc -l
In the previous section we obtained a list of messages about successfully sent mails to e-mail addresses listed in our ./patterns.txt file. If we want to know how many messages it contains, we can pipe our filter's results to the wc command:
# fgrep -f ./patterns.txt /var/log/mail/info | wc -l
Here is the syntax of this powerful sorting utility[17]:
sort [option(s)] [file(s)]
Let's consider sorting part of the /etc/passwd file. As you can see, this file is not sorted:
$ cat /etc/passwd
If we want to sort it by the login field, we type:
$ sort /etc/passwd
By default, the sort command sorts data in ascending order, starting with the first field (in our case, the login field). If we want to sort data in descending order, we use the -r option:
$ sort -r /etc/passwd
Every user has his or her own UID written in the /etc/passwd file. The following command sorts the file in ascending order on the UID field:
$ sort /etc/passwd -t":" -k3 -n
Here we use the following sort options:
The same can be done in reverse:
$ sort /etc/passwd -t":" -k3 -n -r
Note that sort has two other important options:
Finally, if we want to find the user with the highest UID, we can use this command:
$ sort /etc/passwd -t":" -k3 -n |tail -n1
where we sort the /etc/passwd file in ascending order according to the UID column, and pipe the result to the tail command. The latter prints out the last value of the sorted list, which is the user with the highest UID.
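The same numeric sort can be sketched on a self-contained passwd-style sample (the entries below are made up for illustration, not real accounts):

```shell
# Two hypothetical passwd-style lines: login:passwd:UID:GID:comment:home:shell
printf 'alice:x:1002:1002::/home/alice:/bin/bash\nbob:x:501:501::/home/bob:/bin/bash\n' > sample_passwd

# -t":" splits fields on ":", -k3 sorts on the third field (the UID),
# -n compares it numerically; tail -n1 keeps the highest UID
sort -t":" -k3 -n sample_passwd | tail -n1
# alice:x:1002:1002::/home/alice:/bin/bash
```

Note that without -n the comparison would be alphabetical, and "1002" would sort before "501".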
[14] Some examples in this section are based on real work with server log files (services, daemons, etc.). Make sure that syslogd (which handles the logging of daemon activity) and the corresponding daemon (in our case Postfix) are running, and that you work as root. Of course, you can always apply our examples to other files.
[15] For example, the /var/log/mail/info file contains information about all sent mails, messages about mail fetched by users through the POP protocol, etc.
[16] Although it is possible to filter just by the status=sent pattern, please bear with us: we want to demonstrate a new command with this example.
[17] We will only discuss sort briefly here. Whole books can be written about its features.