find is a long-standing UNIX utility. Its role is to recursively scan one or more directories and find files which match a certain set of criteria in those directories. Even though it is very useful, the syntax is truly obscure, and using it requires a little work. The general syntax is:
find [options] [directories] [criterion] [action]
If you do not specify any directory, find will search the current directory. If you do not specify criteria, this is equivalent to “true”, thus all files will be found. The options, criteria and actions are so numerous that we will only mention a few of each here. Let's start with options:
-xdev: do not descend into directories located on other file systems
-mindepth <n>: descend at least <n> levels below the specified directory before searching for files
-maxdepth <n>: search for files which are located at most n levels below the specified directory
-follow: follow symbolic links if they link to directories. By default, find does not follow them
-daystart: when using tests related to time (see below), take the beginning of current day as a timestamp instead of the default (24 hours before current time).
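As a small illustration of the depth options, the following sketch builds a throwaway directory tree with mktemp (the directory and file names are invented for the example) and lists it with -maxdepth and -mindepth. Note that both options are GNU extensions and may be missing from other versions of find.

```shell
# Build a scratch tree: top.txt, a/mid.txt and a/b/deep.txt (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/top.txt" "$tmp/a/mid.txt" "$tmp/a/b/deep.txt"
cd "$tmp" || exit 1

# Regular files located at most one level below the starting directory
shallow=$(find . -maxdepth 1 -type f | LC_ALL=C sort)
echo "$shallow"

# Regular files located at least two levels below the starting directory
deeper=$(find . -mindepth 2 -type f | LC_ALL=C sort)
echo "$deeper"
```

The starting directory itself counts as level 0, which is why top.txt (level 1) shows up in the first listing only.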
A criterion can be one or more of several atomic tests. Some useful tests are:
-type <type>: search for a given type of file. <type> can be one of: f (regular file), d (directory), l (symbolic link), s (socket), b (block device file), c (character device file) or p (named pipe).
-name <pattern>: find files whose names match the given <pattern>. With this option, <pattern> is treated as a shell globbing pattern (see the section called “Shell Globbing Patterns”);
-atime <n>, -amin <n>: find files that were last accessed <n> days ago (-atime) or <n> minutes ago (-amin). You can also specify +<n> or -<n>, in which case find matches files last accessed more than <n> or less than <n> days/minutes ago, respectively;
-anewer <file>: find files which have been accessed more recently than file <file>
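-ctime <n>, -cmin <n>, -cnewer <file>: same as for -atime, -amin and -anewer, but applies to the last time that the status of the file (owner, permissions, link count and so on) was changed; modifying the contents updates this time as well. To test the last time the contents were modified, use -mtime <n>, -mmin <n> and -newer <file> instead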
-regex <pattern>: same as -name, but pattern is treated as a regular expression
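The tests above can be tried safely in a scratch directory. The sketch below is only an illustration: the file names are made up, and touch -t is used to pretend that one file was last accessed on 1 January 2000. It assumes GNU find, since -regex is not part of every find.

```shell
# Scratch tree with hypothetical names
tmp=$(mktemp -d)
mkdir "$tmp/docs"
touch "$tmp/docs/notes.txt" "$tmp/docs/todo.txt" "$tmp/image.png"
# Backdate the access (and modification) time of todo.txt to 1 Jan 2000
touch -t 200001010000 "$tmp/docs/todo.txt"
cd "$tmp" || exit 1

# -type d: directories only (the starting directory is included)
dirs=$(find . -type d | LC_ALL=C sort)

# -name: shell pattern, matched against the file name only
by_name=$(find . -name "*.txt" | LC_ALL=C sort)

# -regex: regular expression, matched against the whole path
by_regex=$(find . -regex ".*/docs/.*" | LC_ALL=C sort)

# -atime +30: last accessed more than 30 days ago
stale=$(find . -type f -atime +30)

printf '%s\n' "$dirs" "$by_name" "$by_regex" "$stale"
```

Quoting the pattern matters: without the quotes, the shell would expand *.txt itself before find ever sees it.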
There are many other tests, refer to find(1) for more details. To combine tests, you can use one of:
<c1> -a <c2>: true if both <c1> and <c2> are true; -a is implicit, therefore you can type <c1> <c2> <c3> ... if you want all tests <c1>, <c2>, ... to match
<c1> -o <c2>: true if either <c1> or <c2> is true, or both. Note that -o has a lower precedence than -a, therefore if you want to match files which match criterion <c1> or <c2> and also match criterion <c3>, you will have to use parentheses and write ( <c1> -o <c2> ) -a <c3>. You must escape (deactivate) the parentheses, as otherwise they will be interpreted by the shell!
-not <c1>: inverts test <c1>, therefore -not <c1> is true if <c1> is false.
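To see the effect of precedence, here is a minimal sketch run in a scratch directory. All names are hypothetical; two of the entries are directories whose names happen to end in .log and .tmp.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch app.log build.tmp notes.txt
mkdir archive.log cache.tmp

# Without parentheses, -a binds more tightly than -o, so this reads as
#   -name "*.log"  -o  ( -name "*.tmp" -a -type f )
# and the directory archive.log is matched as well.
loose=$(find . -name "*.log" -o -name "*.tmp" -type f | LC_ALL=C sort)

# With escaped parentheses, -type f applies to both name tests.
grouped=$(find . \( -name "*.log" -o -name "*.tmp" \) -type f | LC_ALL=C sort)

# -not inverts a single test: regular files whose name does not end in .log
negated=$(find . -type f -not -name "*.log" | LC_ALL=C sort)

printf '%s\n' "loose:" "$loose" "grouped:" "$grouped" "negated:" "$negated"
```

Only the grouped form restricts both patterns to regular files, which is usually what is intended.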
Finally, you can specify an action for each file found. The most frequently used are:
-print: just prints the name of each file on standard output. This is the default action.
-ls: prints on the standard output the equivalent of ls -ilds for each file found.
-exec <command>: execute command <command> on each file found. The command line <command> must end with a ;, which you must escape so that the shell does not interpret it; the file position is marked with {}. See the usage examples.
-ok <command>: same as -exec but ask confirmation for each command.
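As a quick sketch of -exec and -ls, the commands below copy a few files inside a scratch directory. The directory names src and backup and the file names are assumptions made for this illustration only.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir src backup
printf 'one\n' > src/a.conf
printf 'two\n' > src/b.conf
touch src/readme.txt

# Run cp once per match; {} is replaced by the path of the file found,
# and the escaped ; marks the end of the command for find.
find src -name "*.conf" -exec cp {} backup \;

copied=$(LC_ALL=C ls backup)
echo "$copied"

# -ls prints one long-format line per file found
find src -type f -ls
```

Replacing -exec with -ok in the same command would make find ask for confirmation before each copy.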
The best way to consolidate all of the options and parameters is with some examples. Let's say you want to find all directories in the /usr/share directory. You would type:
find /usr/share -type d
Suppose you have an HTTP server, all your HTML files are in /var/www/html, which is also your current directory. You want to find all files whose contents have not been modified for a month. Because you have pages from several writers, some files have the html extension and some have the htm extension. You want to link these files in directory /var/www/obsolete. You would type[17]:
find \( -name "*.htm" -o -name "*.html" \) -a -mtime +30 \
  -exec ln {} /var/www/obsolete \;
This is a fairly complex example, and requires a little explanation. The criterion is this:
\( -name "*.htm" -o -name "*.html" \) -a -mtime +30
which does what we want: it finds all files whose names end either in .htm or .html (\( -name "*.htm" -o -name "*.html" \)), and (-a) whose contents have not been modified in the last 30 days, which is roughly a month (-mtime +30: -mtime tests the time the contents were last modified, and +30 means more than 30 days ago). Note the parentheses: they are necessary here, because -a has a higher precedence than -o. If there weren't any, all files ending with .htm would have been found, plus all files ending with .html which haven't been modified for a month, which is not what we want. Also note that the parentheses are escaped from the shell: if we had put ( .. ) instead of \( .. \), the shell would have interpreted them and tried to execute -name "*.htm" -o -name "*.html" in a sub-shell... Another solution would have been to put the parentheses between double quotes or single quotes, but a backslash is preferable here as we only have to isolate one character.
And finally, there is the command to be executed for each file:
-exec ln {} /var/www/obsolete \;
Here too, you have to escape the ; from the shell, because otherwise the shell interprets it as a command separator. If you happen to forget, find will complain that -exec is missing an argument.
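If you would rather try this kind of command before running it on a real web tree, it can be rehearsed in a scratch directory. The sketch below is an assumption-laden stand-in, not the real setup: the directories come from mktemp, the file names are invented, and it tests -mtime +30 (contents last modified more than 30 days ago) because a modification time can be backdated with touch -t.

```shell
# Stand-ins for /var/www/html and /var/www/obsolete (hypothetical layout)
site=$(mktemp -d)
mkdir "$site/html" "$site/obsolete"
cd "$site/html" || exit 1
touch index.html about.htm legacy.html legacy.htm logo.png
# Pretend these three files were last modified on 1 Jan 2000
touch -t 200001010000 legacy.html legacy.htm logo.png

# Hard-link every old .htm/.html file into the obsolete directory
find . \( -name "*.htm" -o -name "*.html" \) -a -mtime +30 \
  -exec ln {} "$site/obsolete" \;

linked=$(LC_ALL=C ls "$site/obsolete")
echo "$linked"
```

The old logo.png is left alone because it matches neither name pattern, and the recent pages are skipped by the time test.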
A last example: you have a huge directory (/shared/images) containing all kinds of images. Regularly, you use the touch command to update the times of a file named stamp in this directory, so that you have a time reference. You want to find all JPEG images in it which are newer than the stamp file, but because you got the images from various sources, these files have extensions jpg, jpeg, JPG or JPEG. You also want to avoid searching in the old directory. You want this file list to be mailed to you, and your user name is peter:
find /shared/images -cnewer \
     /shared/images/stamp \
     -a -iregex ".*\.jpe?g" \
     -a -not -regex ".*/old/.*" \
       | mail peter -s "New images"
Of course, this command is not very useful if you have to type it each time, and you would like it to be executed regularly. A simple way to have the command run periodically is:
[17] Note that this example requires that /var/www/html and /var/www/obsolete be on the same file system, because ln creates hard links, which cannot cross file systems!