4. Archiving and Data Compression

4.1. tar: Tape ARchiver

Like find, tar is a long-standing UNIX® utility, and its syntax is somewhat unusual:

tar [options] [files...]

Here are some of its options. Each of them has an equivalent long option, which we do not list here; refer to tar(1) for those.

Note

The leading dash (-) is optional for the first group of short options, as in tar cjf. Short options that appear later on the command line, for example after a long option, still need it.

  • c: used in order to create new archives.

  • x: used to extract files from an existing archive.

  • t: lists files from an existing archive.

  • v: increases verbosity. Lists the files which are added to an archive or extracted from an archive. When combined with the t option (see above), it outputs a long listing of files instead of a short one.

  • f <file_name>: creates an archive named file_name, extracts from archive file_name or lists files from archive file_name. If this option is omitted, tar falls back to a built-in default, traditionally /dev/rmt0, the special file associated with a tape drive. If file_name is - (a dash), tar reads the archive from standard input or writes it to standard output, depending on whether you extract from an archive or create one.

  • z: tells tar that the archive to create should be compressed with gzip, or that the archive to extract from is compressed with gzip.

  • j: same as z, but the program used to compress is bzip2.

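  • p: when extracting files from an archive, restore the permissions exactly as recorded in the archive instead of filtering them through your umask. When you extract as root, this is already the default and ownership is restored as well. Very useful for file system dumps.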

  • r: appends the list of files given on the command line to an existing archive. Note that the archive to which you want to append files must not be compressed!

  • A: appends the archives given on the command line to the one named with the f option. As with r, the archives must not be compressed for this to work.

tar has many other options; refer to tar(1) for the complete list. See, for example, the d option, which compares an archive with the files on disk.
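
To see a few of these options working together, here is a minimal sketch that runs in a scratch directory. It assumes GNU tar (other implementations may lack r or d), and the file names demo.tar, a.txt and b.txt are made up for the illustration.

```shell
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir"
mkdir images
echo "first" > images/a.txt
echo "second" > images/b.txt

# c + f: create an uncompressed archive holding only a.txt
tar cf demo.tar images/a.txt

# r + f: append b.txt (possible only because demo.tar is not compressed)
tar rf demo.tar images/b.txt

# t + f: list the members, one per line
listing=$(tar tf demo.tar)
echo "$listing"

# d + f: compare the archive with the files on disk;
# GNU tar exits with a non-zero status when something differs
if tar df demo.tar; then
    diff_before=same
else
    diff_before=changed
fi

# Change a file on disk and compare again
echo "modified" >> images/a.txt
if tar df demo.tar >/dev/null 2>&1; then
    diff_after=same
else
    diff_after=changed
fi
echo "before: $diff_before, after: $diff_after"
```

Without the redirection, the second comparison would also print which attributes of images/a.txt differ.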

Let's proceed with an example. Say you want to create an archive of all images in the /shared/images directory, compressed with bzip2, named images.tar.bz2 and located in your home directory. You would then type:

 #
 # Note: you must be in the directory from which
 #   you want to archive files!
 #
$ cd /shared
$ tar cjf ~/images.tar.bz2 images/

As you can see, we used three options here: c tells tar to create an archive, j to compress it with bzip2, and f ~/images.tar.bz2 to write the archive to the file images.tar.bz2 in our home directory. We may now want to check that the archive is valid. We can do this by listing its files:
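
As an aside, GNU tar and BSD tar also accept a -C option that changes directory before reading the file names, which can replace the cd step. The sketch below is only an illustration under assumptions: it builds a small stand-in for /shared in a scratch directory, and it uses z (gzip) rather than j purely because gzip is more likely to be installed; j behaves the same way.

```shell
# Stand-in for /shared/images, created in a throwaway directory
workdir=$(mktemp -d)
mkdir -p "$workdir/shared/images"
echo "pixels" > "$workdir/shared/images/photo.txt"

# -C DIR: change to DIR before picking up the file names that follow,
# so the archive stores "images/..." and not the full path
tar czf "$workdir/images.tar.gz" -C "$workdir/shared" images/

# List the stored member names to confirm they are relative
members=$(tar tzf "$workdir/images.tar.gz")
echo "$members"
```

Storing relative names is what lets you extract the archive anywhere later on.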

 #
 # Get back to our home directory
 #
$ cd
$ tar tjvf images.tar.bz2

Here we told tar to list (t) the files from the images.tar.bz2 archive (f images.tar.bz2), told it that this archive is compressed with bzip2 (j), and asked for a long listing (v). Now, say you have erased the images directory. Fortunately your archive is intact, and you now want to extract it back to its original place, in /shared. But as you don't want to break your find command for new images, you need to preserve the file attributes:

 #
 # cd to the directory where you want to extract
 #
$ cd /shared
$ tar jxpf ~/images.tar.bz2

And here you are!
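
The effect of p is easiest to see with a restrictive umask. The following is a sketch under assumptions: GNU tar, a scratch directory, and made-up names (photo.txt, with_p, without_p). If you run it as root, both extractions show the same mode, because tar preserves permissions by default for the superuser.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir images
echo "pixels" > images/photo.txt
chmod 666 images/photo.txt        # world-writable on purpose
tar cf images.tar images

mkdir without_p with_p
umask 077                         # restrictive umask for the extraction

# Without p, an ordinary user gets the archived mode filtered by the umask
tar xf images.tar -C without_p
# With p, the archived mode is restored as recorded
tar xpf images.tar -C with_p

# Keep only the ten permission characters of the long listing
mode_without_p=$(ls -l without_p/images/photo.txt | cut -c1-10)
mode_with_p=$(ls -l with_p/images/photo.txt | cut -c1-10)
echo "without p: $mode_without_p"
echo "with p:    $mode_with_p"
```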

Now, let's say you want to extract the images/cars directory from the archive, and nothing else. Then you can type this:

$ tar jxf ~/images.tar.bz2 images/cars
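
You can try this partial extraction safely in a scratch directory. The sketch below assumes GNU tar, uses made-up file names, and compresses with gzip (z) only because gzip is more likely to be installed; the j form works the same way.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p images/cars images/boats
echo "car" > images/cars/car1.txt
echo "boat" > images/boats/boat1.txt

tar czf images.tar.gz images
rm -r images                      # simulate losing the directory

# Name the member exactly as "tar t" lists it (relative, no leading ./)
tar xzf images.tar.gz images/cars

# Only the cars subtree should be back
extracted=$(find images -type f | sort)
echo "$extracted"
```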

If you try to back up special files, tar will take them as what they are, special files, and will not dump their contents. So yes, you can safely put /dev/mem in an archive. It also deals correctly with links, so do not worry about this either. For symbolic links, see also the h option in tar(1), which archives the files the links point to instead of the links themselves.
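
The difference the h option makes for symbolic links shows up in a long listing. This is a small sketch assuming GNU tar; the names data, target.txt and link.txt are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo "real content" > data/target.txt
ln -s target.txt data/link.txt

# Default: the symbolic link itself is stored
tar cf as_link.tar data
# h: the link is followed and stored as a regular file
tar chf as_copy.tar data

# In a long listing, the first character of each entry is its type:
# "l" for a symbolic link, "-" for a regular file
link_entry=$(tar tvf as_link.tar | grep 'link.txt')
copy_entry=$(tar tvf as_copy.tar | grep 'link.txt')
echo "$link_entry"
echo "$copy_entry"
```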

4.2. bzip2 and gzip: Data Compression Programs

We have already met these utilities when dealing with tar. Unlike WinZip® on Windows®, which does both jobs, GNU/Linux uses separate utilities for archiving and compressing: tar for archiving, and the two programs which we will now introduce, bzip2 and gzip, for compressing data. Other tools such as zip, arj or rar also exist for GNU/Linux, but they are rarely used.

bzip2 was originally written as a replacement for gzip. Its compression ratios are generally better, but it requires more RAM and more time while working. gzip is still widely used, notably for compatibility with older systems.

Both commands have a similar syntax:

gzip [options] [file(s)]

If no file name is given, both gzip and bzip2 will wait for data from the standard input and send the result to the standard output. Therefore, you can use both programs in pipes. Both programs also have a set of common options:

  • -1, ..., -9: set the compression level. The higher the number, the better the compression, but better also means slower.

  • -d: decompress file(s). This is equivalent to using gunzip or bunzip2.

  • -c: dump the result of compression/decompression of files given as parameters to the standard output.

Warning

By default both gzip and bzip2 erase the file(s) that they have compressed (or uncompressed) if you don't use the -c option. You can avoid this in bzip2 by using the -k option. Recent versions of gzip (1.6 and later) accept -k as well; with older versions, use -c and redirect the output.
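
The behaviour described above can be checked in a scratch directory. This sketch uses only gzip and the -c and -d options, which every version supports; the file names notes.txt and keep.txt are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'hello\n' > notes.txt
cp notes.txt keep.txt

# Default behaviour: notes.txt is replaced by notes.txt.gz
gzip notes.txt

# -c writes to standard output, so keep.txt stays in place
gzip -c keep.txt > keep.txt.gz

ls

# -dc decompresses to standard output without touching the .gz file
restored=$(gzip -dc notes.txt.gz)

# With no file name, gzip works as a filter in a pipe
roundtrip=$(printf 'hello\n' | gzip -9 | gzip -d)
echo "$restored / $roundtrip"
```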

Now some examples. Let's say you want to compress all files ending with .txt in the current directory using bzip2 with maximum compression. You would type:

$ bzip2 -9 *.txt

Now you want to share your image archive with someone who does not have bzip2, only gzip. You don't need to decompress the archive to disk and re-compress it: you can decompress to the standard output, pipe the result into gzip, which compresses from its standard input, and redirect the output to the new archive. Like this:

$ bzip2 -dc images.tar.bz2 | gzip -9 >images.tar.gz
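
If you want to convince yourself that nothing is lost in the conversion, you can compare the two archives once decompressed. The sketch below is one possible check, not the only one: it builds a tiny stand-in archive in a scratch directory, and it skips the check when bzip2 is not installed.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir images
echo "pixels" > images/photo.txt

if command -v bzip2 >/dev/null 2>&1; then
    tar cjf images.tar.bz2 images

    # The conversion itself: decompress, pipe, recompress
    bzip2 -dc images.tar.bz2 | gzip -9 > images.tar.gz

    # Decompress both and compare the underlying tar data byte by byte
    bzip2 -dc images.tar.bz2 > from_bz2.tar
    gzip -dc images.tar.gz > from_gz.tar
    if cmp -s from_bz2.tar from_gz.tar; then
        result=identical
    else
        result=different
    fi
else
    result=skipped                # bzip2 is not available on this system
fi
echo "$result"
```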

You could have typed bzcat instead of bzip2 -dc. There is an equivalent for gzip, but its name is zcat, not gzcat. You also have bzless for bzip2 files and zless for gzip files if you want to view compressed files directly instead of having to decompress them first. As an exercise, try to find the command you would have to type in order to view compressed files without decompressing them, and without using bzless or zless.