Creating a Subversion repository is an incredibly simple task. The svnadmin utility, provided with Subversion, has a subcommand for doing just that. To create a new repository, just run:
$ svnadmin create /path/to/repos
This creates a new repository in the directory /path/to/repos. This new repository begins life at revision 0, which is defined to consist of nothing but the top-level root (/) filesystem directory. Initially, revision 0 also has a single revision property, svn:date, set to the time at which the repository was created.
Do not create your repository on a network share—it cannot exist on a remote filesystem such as NFS, AFS, or Windows SMB. Berkeley DB requires that the underlying filesystem implement strict POSIX locking semantics, and more importantly, the ability to map files directly into process memory. Almost no network filesystems provide these features. If you attempt to use Berkeley DB on a network share, the results are unpredictable—you may see mysterious errors right away, or it may be months before you discover that your repository database is subtly corrupted.
If you need multiple computers to access the repository, you should set up a server process (such as Apache or svnserve), store the repository on a local filesystem which the server can access, and make the repository available over a network. Chapter 6, Server Configuration covers this process in detail.
You may have noticed that the path argument to svnadmin was just a regular filesystem path and not a URL like the svn client program uses when referring to repositories. Both svnadmin and svnlook are considered server-side utilities: they are used on the machine where the repository resides to examine or modify aspects of the repository, and are in fact unable to perform tasks across a network. A common mistake made by Subversion newcomers is trying to pass URLs (even “local” file: ones) to these two programs.
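If you drive svnadmin from a script, a small guard can catch the URL mistake before the command runs. The following is a minimal sketch, not part of Subversion: the helper names looks_like_url and svnadmin_create_command are hypothetical, and the URL test is a simple heuristic (a scheme followed by "://").

```python
import re

# Heuristic: a URL starts with a scheme such as "http://", "svn://" or "file://".
_URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def looks_like_url(repos_arg):
    """Return True if the argument looks like a URL rather than a local path."""
    return bool(_URL_SCHEME.match(repos_arg))


def svnadmin_create_command(repos_path):
    """Build the argument list for `svnadmin create`, refusing URL arguments."""
    if looks_like_url(repos_path):
        raise ValueError(
            "svnadmin needs a local filesystem path, not a URL: %r" % repos_path
        )
    return ["svnadmin", "create", repos_path]
```

The returned list can be handed to subprocess.run on the machine that hosts the repository.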
So, after you've run the svnadmin create command, you have a shiny new Subversion repository in its own directory. Let's take a peek at what is actually created inside that subdirectory.
$ ls repos
conf/  dav/  db/  format  hooks/  locks/  README.txt
With the exception of the README.txt and format files, the repository directory is a collection of subdirectories. As in other areas of the Subversion design, modularity is given high regard, and hierarchical organization is preferred to cluttered chaos. Here is a brief description of all of the items you see in your new repository directory:
conf
A directory containing repository configuration files.
dav
A directory provided to Apache and mod_dav_svn for their private housekeeping data.
db
The main Berkeley DB environment, full of DB tables that comprise the data store for Subversion's filesystem (where all of your versioned data resides).
format
A file whose contents are a single integer value that dictates the version number of the repository layout.
hooks
A directory full of hook script templates (and hook scripts themselves, once you've installed some).
locks
A directory for Subversion's repository locking data, used for tracking accessors to the repository.
README.txt
A file which merely informs its readers that they are looking at a Subversion repository.
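The layout described above can be checked from a script without modifying anything. This is a read-only sketch that assumes the repository has exactly the entries shown in the listing; the function names are hypothetical and are not part of any Subversion API.

```python
from pathlib import Path

# Entries taken from the directory listing of a freshly created repository.
EXPECTED_DIRS = ("conf", "dav", "db", "hooks", "locks")
EXPECTED_FILES = ("format", "README.txt")


def missing_repository_entries(repos):
    """Return the names of expected entries that are absent from `repos`."""
    root = Path(repos)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


def read_layout_version(repos):
    """Return the single integer stored in the repository's `format` file."""
    return int((Path(repos) / "format").read_text().strip())
```

An empty list from missing_repository_entries suggests the directory has the expected shape; it does not prove that the Berkeley DB environment inside db is healthy.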
In general, you shouldn't tamper with your repository “by hand”. The svnadmin tool should be sufficient for any changes necessary to your repository, or you can look to third-party tools (such as Berkeley DB's tool suite) for tweaking relevant subsections of the repository. Some exceptions exist, though, and we'll cover those here.
A hook is a program triggered by some repository event, such as the creation of a new revision or the modification of an unversioned property. Each hook is handed enough information to tell what that event is, what target(s) it's operating on, and the username of the person who triggered the event. Depending on the hook's output or return status, the hook program may continue the action, stop it, or suspend it in some way.
The hooks subdirectory is, by default, filled with templates for various repository hooks.
$ ls repos/hooks/
post-commit.tmpl          pre-revprop-change.tmpl
post-revprop-change.tmpl  start-commit.tmpl
pre-commit.tmpl
There is one template for each hook that the Subversion repository implements, and by examining the contents of those template scripts, you can see what triggers each such script to run and what data is passed to that script. Also present in many of these templates are examples of how one might use that script, in conjunction with other Subversion-supplied programs, to perform common useful tasks. To actually install a working hook, you need only place some executable program or script into the repos/hooks directory that can be executed under the name of the hook (like start-commit or post-commit).
On Unix platforms, this means supplying a script or program (which could be a shell script, a Python program, a compiled C binary, or any number of other things) named exactly like the name of the hook. Of course, the template files are present for more than just informational purposes: the easiest way to install a hook on Unix platforms is to simply copy the appropriate template file to a new file that lacks the .tmpl extension, customize the hook's contents, and ensure that the script is executable. Windows, however, uses file extensions to determine whether or not a program is executable, so you would need to supply a program whose basename is the name of the hook, and whose extension is one of the special extensions recognized by Windows for executable programs, such as .exe or .com for programs, and .bat for batch files.
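The Unix installation steps (copy the template, drop the .tmpl extension, make the result executable) can be sketched as follows. The function name install_hook_from_template is hypothetical, the sketch is Unix-only, and it refuses to overwrite a hook that is already installed. You still need to customize the copied file afterwards.

```python
import os
import shutil
import stat


def install_hook_from_template(repos, hook_name):
    """Copy hooks/<hook_name>.tmpl to hooks/<hook_name> and make it executable."""
    hooks_dir = os.path.join(repos, "hooks")
    template = os.path.join(hooks_dir, hook_name + ".tmpl")
    target = os.path.join(hooks_dir, hook_name)
    if os.path.exists(target):
        raise FileExistsError("hook already installed: " + target)
    shutil.copyfile(template, target)
    # Add execute permission for user, group and others, keeping the other bits.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

For example, install_hook_from_template("/path/to/repos", "pre-commit") would create /path/to/repos/hooks/pre-commit from the shipped template.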
Currently there are five hooks implemented by the Subversion repository:
start-commit
This is run before the commit transaction is even created. It is typically used to decide if the user has commit privileges at all. The repository passes two arguments to this program: the path to the repository, and the username that is attempting the commit. If the program returns a non-zero exit value, the commit is stopped before the transaction is even created. If the hook program writes data to stderr, it will be marshalled back to the client.
pre-commit
This is run when the transaction is complete, but before it is committed. Typically, this hook is used to protect against commits that are disallowed due to content or location (for example, your site might require that all commits to a certain branch include a ticket number from the bug tracker, or that the incoming log message is non-empty). The repository passes two arguments to this program: the path to the repository, and the name of the transaction being committed. If the program returns a non-zero exit value, the commit is aborted and the transaction is removed. If the hook program writes data to stderr, it will be marshalled back to the client.
The Subversion distribution includes some access
control scripts (located in the
tools/hook-scripts
directory of the
Subversion source tree) that can be called from
pre-commit to implement fine-grained
write-access control. Another option is to use the
mod_authz_svn Apache httpd module,
which provides both read and write access control on
individual directories (see the section called “Per-Directory Access Control”). In a future version
of Subversion, we plan to implement access control lists
(ACLs) directly in the filesystem.
post-commit
This is run after the transaction is committed, and a new revision is created. Most people use this hook to send out descriptive emails about the commit or to make a backup of the repository. The repository passes two arguments to this program: the path to the repository, and the new revision number that was created. The exit code of the program is ignored.
The Subversion distribution includes a
commit-email.pl script (located in
the tools/hook-scripts/
directory
of the Subversion source tree) that can be used to send
email with (and/or append to a log file) a description
of a given commit. This mail contains a list of the
paths that were changed, the log message attached to the
commit, the author and date of the commit, as well as a
GNU diff-style display of the changes made to the
various versioned files as part of the commit.
Another useful tool provided by Subversion is the
hot-backup.py script (located in the
tools/backup/
directory of the
Subversion source tree). This script performs hot
backups of your Subversion repository (a feature
supported by the Berkeley DB database back-end), and can
be used to make a per-commit snapshot of your repository
for archival or emergency recovery purposes.
pre-revprop-change
Because Subversion's revision properties are not versioned, making modifications to such a property (for example, the svn:log commit message property) will overwrite the previous value of that property forever. Since data can be potentially lost here, Subversion supplies this hook (and its counterpart, post-revprop-change) so that repository administrators can keep records of changes to these items using some external means if they so desire. As a precaution against losing unversioned property data, Subversion clients will not be allowed to remotely modify revision properties at all unless this hook is implemented for your repository.
This hook runs just before such a modification is made to the repository. The repository passes four arguments to this hook: the path to the repository, the revision on which the to-be-modified property exists, the authenticated username of the person making the change, and the name of the property itself.
post-revprop-change
As mentioned earlier, this hook is the counterpart
of the pre-revprop-change
hook. In
fact, for the sake of paranoia this script will not run
unless the pre-revprop-change
hook
exists. When both of these hooks are present, the
post-revprop-change
hook runs just
after a revision property has been changed, and is
typically used to send an email containing the new value
of the changed property. The repository passes four
arguments to this hook: the path to the repository, the
revision on which the property exists, the authenticated
username of the person making the change, and the name of
the property itself.
The Subversion distribution includes a
propchange-email.pl script (located
in the tools/hook-scripts/
directory of the Subversion source tree) that can be
used to send email with (and/or append to a log file)
the details of a revision property change. This mail
contains the revision and name of the changed property,
the user who made the change, and the new property
value.
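As an illustration of the pre-commit hook described above, here is a minimal sketch that rejects commits with an empty log message. It assumes that svnlook is on the PATH of the user running the hook and reads the pending log message with svnlook log -t. The function names are hypothetical, and this is not one of the scripts shipped with Subversion.

```python
import subprocess
import sys


def log_message_ok(message):
    """Return True if the log message contains something besides whitespace."""
    return bool(message.strip())


def fetch_log_message(repos, txn):
    """Read the log message of an uncommitted transaction via svnlook."""
    result = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main(argv):
    """Entry point: argv is [program, REPOS, TXN], as passed by the repository."""
    if len(argv) != 3:
        sys.stderr.write("usage: pre-commit REPOS TXN\n")
        return 2
    repos, txn = argv[1], argv[2]
    if not log_message_ok(fetch_log_message(repos, txn)):
        # Text written to stderr is sent back to the client.
        sys.stderr.write("Commit rejected: the log message is empty.\n")
        return 1
    return 0


# When installed as repos/hooks/pre-commit, finish the file with:
#     sys.exit(main(sys.argv))
```

A non-zero return value aborts the commit and removes the transaction; returning 0 lets the commit proceed.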
Subversion will attempt to execute hooks as the same user who owns the process which is accessing the Subversion repository. In most cases, the repository is being accessed via the Apache HTTP server and mod_dav_svn, so this user is the same user that Apache runs as. The hooks themselves will need to be configured with OS-level permissions that allow that user to execute them. Also, this means that any files or programs (including the Subversion repository itself) accessed directly or indirectly by the hook will be accessed as the same user. In other words, be alert to potential permission-related problems that could prevent the hook from performing the tasks you've written it to perform.
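One way to catch permission problems early is to run a quick check as the same user the server process runs as. The sketch below is a hypothetical helper that reports installed hooks the current user cannot execute. It only looks at the hook files themselves, not at the files or programs a hook touches while it runs.

```python
import os

# The five hook names implemented by the repository, as listed above.
HOOK_NAMES = (
    "start-commit",
    "pre-commit",
    "post-commit",
    "pre-revprop-change",
    "post-revprop-change",
)


def hook_permission_problems(repos, hook_names=HOOK_NAMES):
    """Return a message for each installed hook the current user cannot execute."""
    problems = []
    for name in hook_names:
        path = os.path.join(repos, "hooks", name)
        if not os.path.isfile(path):
            continue  # Hook not installed, so there is nothing to check.
        if not os.access(path, os.X_OK):
            problems.append(path + ": not executable by the current user")
    return problems
```

The result is only meaningful when the check runs under the server's account, since os.access answers for the user running the script.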
A Berkeley DB environment is an encapsulation of one or more databases, log files, region files, and configuration files. The Berkeley DB environment has its own set of default configuration values for things like the number of locks allowed to be taken out at any given time or the maximum size of the journaling log files. Subversion's filesystem code additionally chooses default values for some of the Berkeley DB configuration options. However, sometimes your particular repository, with its unique collection of data and access patterns, might require a different set of configuration option values.
The folks at Sleepycat (the producers of Berkeley DB)
understand that different databases have different
requirements, and so they have provided a mechanism for
overriding at runtime many of the configuration values for the
Berkeley DB environment. Berkeley checks for the presence of
a file named DB_CONFIG
in each
environment directory, and parses the options found in that
file for use with that particular Berkeley environment.
The Berkeley configuration file for your repository is located in the db environment directory, at repos/db/DB_CONFIG. Subversion itself creates this file when it creates the rest of the repository. The file initially contains some default options, as well as pointers to the Berkeley DB online documentation so you can read about what those options do. Of course, you are free to add any of the supported Berkeley DB options to your DB_CONFIG file. Just be aware that while Subversion never attempts to read or interpret the contents of the file, and makes no use of the option settings in it, you'll want to avoid any configuration changes that may cause Berkeley DB to behave in a fashion that is unexpected by the rest of the Subversion code. Also, changes made to DB_CONFIG won't take effect until you recover the database environment (using svnadmin recover).
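Before editing DB_CONFIG, it can help to list which options are currently set. The following read-only sketch assumes the usual DB_CONFIG layout: one option per line, the option name followed by whitespace-separated values, with lines starting with # treated as comments. The function name parse_db_config is hypothetical.

```python
def parse_db_config(text):
    """Return a list of (option, [values]) pairs from DB_CONFIG-style text."""
    settings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and comments.
        name, *values = line.split()
        # Keep a list rather than a dict, since an option may appear more than once.
        settings.append((name, values))
    return settings
```

You would call it on the contents of repos/db/DB_CONFIG, for example parse_db_config(open("/path/to/repos/db/DB_CONFIG").read()). The sketch only reads the file; any change you make by hand still requires svnadmin recover to take effect.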