find: Find Files According to Certain Criteria

find is a long-standing UNIX utility. Its role is to recursively scan one or more directories and find the files in them that match a certain set of criteria. Although it is very useful, its syntax is truly obscure, and using it requires a little work. The general syntax is:

find [options] [directories] [criterion] [action]

If you do not specify any directory, find searches the current directory. If you do not specify a criterion, it is equivalent to “true”, so all files are found. The options, criteria and actions are so numerous that we will only mention a few of each here. Let's start with options:
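As a hedged illustration of two GNU find options, -maxdepth and -mindepth, together with the defaults just described, the sketch below builds a throwaway tree with mktemp and compares a few runs. It assumes GNU find (which accepts a command line with no directory at all); the file and directory names are made up for the example.

```shell
#!/bin/sh
# Sketch assuming GNU find. LC_ALL=C makes "sort" order paths bytewise.
export LC_ALL=C

# Hypothetical throwaway tree: ./f1, ./a/f2, ./a/b/f3
top=$(mktemp -d)
mkdir -p "$top/a/b"
touch "$top/f1" "$top/a/f2" "$top/a/b/f3"
cd "$top" || exit 1

# No directory and no criterion: search "." and match everything.
all=$(find | sort)

# -maxdepth 1: do not descend below the first level.
shallow=$(find . -maxdepth 1 | sort)

# -mindepth 2: skip the starting point and its direct children.
deep=$(find . -mindepth 2 | sort)

printf '%s\n' "$shallow"

# Clean up the sandbox.
cd / && rm -rf "$top"
```

Here $shallow holds only ".", "./a" and "./f1", while $deep starts at the files and directories two levels down.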

A criterion consists of one or more atomic tests. Some useful tests are:
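The following sketch shows three common tests, -type, -name and -iname, on a hypothetical throwaway tree. It assumes GNU find; -name matches a shell-style pattern against the file's base name, and -iname does the same while ignoring case.

```shell
#!/bin/sh
# Sketch assuming GNU find; bytewise sort order for reproducible output.
export LC_ALL=C

# Hypothetical tree: two HTML files (one in upper case) and a text file.
top=$(mktemp -d)
mkdir "$top/docs"
touch "$top/index.html" "$top/OLD.HTML" "$top/docs/notes.txt"

# -type d: directories only (-type f would select regular files).
dirs=$(find "$top" -type d | sort)

# -name: case-sensitive pattern on the base name (quoted so the shell
# does not expand it first).
html=$(find "$top" -name '*.html' | sort)

# -iname: the same pattern, ignoring case.
ihtml=$(find "$top" -iname '*.html' | sort)

printf '%s\n' "$ihtml"
rm -rf "$top"
```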

There are many other tests; refer to find(1) for more details. To combine tests, you can use one of the following operators:
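For instance, here is a minimal sketch, assuming GNU find and made-up file names, of the -o (or) and -not (negation, also spelled !) operators; when two tests are written side by side, -a (and) is implied.

```shell
#!/bin/sh
# Sketch assuming GNU find; bytewise sort order for reproducible output.
export LC_ALL=C

top=$(mktemp -d)
touch "$top/a.htm" "$top/b.html" "$top/c.txt"
cd "$top" || exit 1

# -o: either pattern may match. The escaped parentheses group the two
# tests, and -a is implied between -type f and the group.
either=$(find . -type f \( -name '*.htm' -o -name '*.html' \) | sort)

# -not: keep only the regular files that do NOT match the pattern.
others=$(find . -type f -not -name '*.htm*' | sort)

printf '%s\n' "$either"
cd / && rm -rf "$top"
```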

Finally, you can specify an action for each file found. The most frequently used are:
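As a sketch, assuming GNU find (the -exec ... {} + form is also part of POSIX), the block below contrasts the default -print action with -exec. With \; the command runs once per file; with + find packs as many file names as it can into each invocation.

```shell
#!/bin/sh
# Sketch assuming GNU find; bytewise sort order for reproducible output.
export LC_ALL=C

top=$(mktemp -d)
touch "$top/one" "$top/two"
cd "$top" || exit 1

# -print (the default action) writes each matching path on its own line.
printed=$(find . -type f -print | sort)

# -exec ... \; runs the command once per file; {} becomes the path.
per_file=$(find . -type f -exec echo run {} \; | sort)

# -exec ... {} + runs "echo run ./one ./two" once, so one output line.
batched=$(find . -type f -exec echo run {} + | wc -l | tr -d ' ')

printf '%s\n' "$per_file"
cd / && rm -rf "$top"
```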

The best way to pull all of these options and parameters together is with a few examples. Let's say you want to find all directories under /usr/share. You would type:

find /usr/share -type d

Suppose you run an HTTP server whose HTML files are all in /var/www/html, which is also your current directory. You want to find all files whose contents have not been modified for a month. Because your pages come from several writers, some files have the .html extension and others the .htm extension. You want to link these files into the /var/www/obsolete directory. You would type[17]:

find \( -name "*.htm" -o -name "*.html" \) -a -mtime +30 \
-exec ln {} /var/www/obsolete \;

This is a fairly complex example, and it requires a little explanation. The criterion is:

\( -name "*.htm" -o -name "*.html" \) -a -mtime +30

which does what we want: it finds all files whose names end in either .htm or .html (\( -name "*.htm" -o -name "*.html" \)) and (-a) whose contents were last modified more than 30 days ago, which is roughly a month (-mtime +30). Note that -mtime tests when the file's contents were last modified, whereas -ctime would test when its status (permissions, owner, link count and so on) last changed; and +30 means “more than 30 days”, while -30 would mean “less than 30 days”. Note also the parentheses: they are necessary here, because -a has a higher precedence than -o. Without them, find would have matched all files ending in .htm, plus all files ending in .html that have not been modified for a month, which is not what we want. Also note that the parentheses are escaped from the shell: if we had written ( .. ) instead of \( .. \), the shell would have tried to interpret them itself (parentheses are its sub-shell syntax) and rejected the command line with a syntax error. Another solution would have been to put the parentheses between double or single quotes, but a backslash is preferable here since we only have to protect one character.
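To see why the parentheses matter, the sketch below runs the criterion with and without them on a hypothetical sandbox. It assumes GNU find and GNU touch (for the -d '40 days ago' date syntax used to backdate the "old" files).

```shell
#!/bin/sh
# Sketch assuming GNU find and GNU touch; bytewise sort order.
export LC_ALL=C

top=$(mktemp -d)
cd "$top" || exit 1
touch old.htm old.html new.htm new.html
# Backdate the contents of the "old" files by 40 days.
touch -d '40 days ago' old.htm old.html

# Grouped: (htm OR html) AND older than 30 days.
with_parens=$(find . \( -name '*.htm' -o -name '*.html' \) -a -mtime +30 | sort)

# Ungrouped: -a binds tighter, so this is htm OR (html AND older than
# 30 days), and new.htm slips through.
without=$(find . -name '*.htm' -o -name '*.html' -a -mtime +30 | sort)

printf '%s\n' "$without"
cd / && rm -rf "$top"
```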

And finally, there is the command to be executed for each file:

-exec ln {} /var/www/obsolete \;

find replaces {} with the name of each file found, and the ; marks the end of the command. Here too, you have to escape the ; from the shell, because otherwise the shell interprets it as a command separator. If you happen to forget, find will complain that -exec is missing an argument.
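You can try the whole command without touching a real web server. In this sketch the directories html and obsolete under a mktemp directory are hypothetical stand-ins for /var/www/html and /var/www/obsolete; since both live on the same file system, the hard links succeed. It assumes GNU find, touch and stat, and uses -mtime +30 to select files whose contents were last modified more than 30 days ago.

```shell
#!/bin/sh
# Sketch assuming GNU find, touch and stat; bytewise sort order.
export LC_ALL=C

# Hypothetical stand-ins for /var/www/html and /var/www/obsolete,
# both on the same (temporary) file system.
top=$(mktemp -d)
mkdir "$top/html" "$top/obsolete"
cd "$top/html" || exit 1
touch stale.htm stale.html fresh.html
touch -d '40 days ago' stale.htm stale.html

# No directory given, so GNU find searches the current directory.
find \( -name "*.htm" -o -name "*.html" \) -a -mtime +30 \
     -exec ln {} "$top/obsolete" \;

linked=$(ls "$top/obsolete" | sort)
# A hard link shares the inode, so each original now has 2 links.
nlinks=$(stat -c %h stale.htm)

printf '%s\n' "$linked"
cd / && rm -rf "$top"
```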

A last example: you have a huge directory (/shared/images) containing all kinds of images. Regularly, you use the touch command to update the times of a file named stamp in this directory, so that you have a time reference. You want to find all JPEG images in it which are newer than the stamp file, but because you got the images from various sources, these files have extensions jpg, jpeg, JPG or JPEG. You also want to avoid searching in the old directory. You want this file list to be mailed to you, and your user name is peter:

find /shared/images -cnewer     \
     /shared/images/stamp       \
     -a -iregex ".*\.jpe?g"     \
     -a -not -regex ".*/old/.*" \
       | mail -s "New images" peter
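Note that -not -regex only filters the results: find still walks through the old directory. If you want find to skip it entirely, -prune does that. The following sketch, assuming GNU find and a hypothetical sandbox in place of /shared/images, prints the list instead of mailing it. Because touch cannot backdate a file's status-change time (which -cnewer compares against the stamp's modification time), the sketch orders the files in time with sleep 1 calls.

```shell
#!/bin/sh
# Sketch assuming GNU find; bytewise sort order for reproducible output.
export LC_ALL=C

# Hypothetical stand-in for /shared/images.
top=$(mktemp -d)
mkdir -p "$top/images/old" "$top/images/2024"
cd "$top/images" || exit 1

touch before.jpg     # changed before the stamp
sleep 1
touch stamp          # the time reference
sleep 1
touch after.JPG 2024/trip.jpeg old/archived.jpg notes.txt

# Left of -o: any directory named "old" is pruned (not entered) and,
# having no -print, not listed. Right of -o: JPEGs changed after stamp.
new_images=$(find . -type d -name old -prune -o \
    -cnewer stamp -iregex '.*\.jpe?g' -print | sort)

printf '%s\n' "$new_images"
cd / && rm -rf "$top"
```

The explicit -print matters: without it, find would apply its default -print to the whole expression and list the pruned old directory as well.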

Of course, this command is not very useful if you have to type it each time, and you would like it to be executed regularly. A simple way to have it run periodically is to schedule it with cron (see crontab(1) and crontab(5)).
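As a sketch, assuming the command above has been saved in an executable script at the hypothetical path /home/peter/bin/new-images.sh, the following crontab entry (installed with crontab -e) would run it every night at 02:30:

```
# minute hour day-of-month month day-of-week command
30       2    *            *     *           /home/peter/bin/new-images.sh
```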

[17] Note that this example requires that /var/www/html and /var/www/obsolete be on the same file system, because ln creates hard links, which cannot span file systems.