Developing applications against the Subversion library APIs
is fairly straightforward. All of the public header files live
in the subversion/include directory of the source tree. These
headers are copied into the include locations on your system when
you build and install Subversion itself from source. Together,
they represent the entirety of the functions
and types meant to be accessible by users of the Subversion
libraries.
The first thing you might notice is that Subversion's
datatypes and functions are namespace protected. Every public
Subversion symbol name begins with svn_, followed by a short
code for the library in which the symbol is defined (such as
wc, client, fs, etc.), followed by a single underscore (_) and
then the rest of the symbol name. Semi-public functions (used
among source files of a given library but not by code outside
that library, and found inside the library directories
themselves) differ from this naming scheme in that instead of a
single underscore after the library code, they use a double
underscore (__). Functions that are private to a given source
file have no special prefixing, and are declared static. Of
course, a compiler isn't
interested in these naming conventions, but they help to clarify
the scope of a given function or datatype.
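The naming scheme above is regular enough to check mechanically. The following is a minimal sketch in Python, not part of Subversion itself: the function name classify_symbol and the sample names svn_fs__example_helper and crawl_helper are hypothetical, chosen only for illustration.

```python
import re

# svn_ + library code + "_" (public) or "__" (semi-public) + rest of the name.
# The library code is matched lazily so that "__" is recognized as a separator.
_SYMBOL = re.compile(r"^svn_(?P<lib>[a-z]+?)(?P<sep>__|_)(?P<rest>\w+)$")


def classify_symbol(name):
    """Return 'public', 'semi-public', or 'other' for a C symbol name."""
    match = _SYMBOL.match(name)
    if match is None:
        # No svn_ prefix: for example, a static function private to one file.
        return "other"
    return "semi-public" if match.group("sep") == "__" else "public"


for symbol in ("svn_client_checkout",      # public symbol named in this chapter
               "svn_fs__example_helper",   # hypothetical semi-public name
               "crawl_helper"):            # hypothetical file-private name
    print(symbol, "->", classify_symbol(symbol))
```

A check like this only inspects names; as the text says, the compiler does not enforce the convention.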
Along with Subversion's own datatypes, you will see many
references to datatypes that begin with apr_. These are symbols
from the Apache Portable Runtime (APR) library. APR is Apache's
portability
library, originally carved out of its server code as an
attempt to separate the OS-specific bits from the
OS-independent portions of the code. The result was a library
that provides a generic API for performing operations that
differ mildly—or wildly—from OS to OS. While the
Apache HTTP Server was obviously the first user of the APR
library, the Subversion developers immediately recognized the
value of using APR as well. This means that there are
practically no OS-specific code portions in Subversion itself.
Also, it means that the Subversion client compiles and runs
anywhere that the server does. Currently this list includes
all flavors of Unix, Win32, BeOS, OS/2, and Mac OS X.
In addition to providing consistent implementations of
system calls that differ across operating systems,[35] APR gives
Subversion immediate access to many custom datatypes, such as
dynamic arrays and hash tables. Subversion uses these types
extensively throughout the codebase. But perhaps the most
pervasive APR datatype, found in nearly every Subversion API
prototype, is apr_pool_t, the APR memory pool. Subversion uses
pools internally for all its memory allocation needs (unless an
external library requires a different memory management scheme
for data passed through its API),[36] and while a person coding
against the Subversion APIs is not required to do the same, they
are required to provide pools to the API functions that need
them. This means that users of the Subversion API must also link
against APR, must call apr_initialize() to initialize the APR
subsystem, and then must acquire a pool for use with Subversion
API calls. See the section called “Programming with Memory Pools”
for more information.
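For readers new to pools, the idea can be pictured with a toy model. The sketch below is plain Python and does not use APR or the Subversion bindings; the class ToyPool and its methods are hypothetical and only imitate the general behavior of a pool: allocations are tied to a pool, and destroying a pool releases everything in it, including its subpools.

```python
class ToyPool:
    """Toy stand-in for a memory pool (an illustration, not APR's API)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.allocations = []
        self.destroyed = False
        if parent is not None:
            parent.children.append(self)

    def alloc(self, value):
        """Tie VALUE's lifetime to this pool and return it."""
        if self.destroyed:
            raise RuntimeError("pool already destroyed")
        self.allocations.append(value)
        return value

    def destroy(self):
        """Release this pool's allocations and, first, those of its subpools."""
        for child in list(self.children):
            child.destroy()
        self.allocations.clear()
        self.destroyed = True
        if self.parent is not None and self in self.parent.children:
            self.parent.children.remove(self)


top = ToyPool()               # acquired once, near program start
sub = ToyPool(parent=top)     # short-lived pool for one operation
sub.alloc("a path buffer")
top.destroy()                 # also destroys sub and everything in it
print(top.destroyed, sub.destroyed, sub.allocations)
```

The point of the model is that the caller frees a whole pool at once rather than tracking individual allocations.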
With remote version control operation as the whole point
of Subversion's existence, it makes sense that some attention
has been paid to internationalization (i18n) support. After
all, while “remote” might mean “across the
office”, it could just as well mean “across the
globe.” To facilitate this, all of Subversion's public
interfaces that accept path arguments expect those paths to be
canonicalized, and encoded in UTF-8. This means, for example,
that any new client binary that drives the libsvn_client
interface needs to first convert paths from the
locale-specific encoding to UTF-8 before passing those paths
to the Subversion libraries, and then re-convert any resultant
output paths from Subversion back into the locale's encoding
before using those paths for non-Subversion purposes.
Fortunately, Subversion provides a suite of functions (see
subversion/include/svn_utf.h) that can be used by any program
to do these conversions.
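The round trip described above can be sketched in a few lines. This example uses Python's built-in codecs rather than the functions in svn_utf.h, and it assumes, purely for illustration, that the locale's encoding is Latin-1; the helper names to_utf8 and from_utf8 are hypothetical.

```python
def to_utf8(raw_path, locale_encoding):
    """Re-encode a path from the locale's encoding (bytes) to UTF-8 bytes."""
    return raw_path.decode(locale_encoding).encode("utf-8")


def from_utf8(utf8_path, locale_encoding):
    """Re-encode a UTF-8 path (bytes) back into the locale's encoding."""
    return utf8_path.decode("utf-8").encode(locale_encoding)


# Assumption: the user's locale encoding is Latin-1.
native = "r\u00e9sum\u00e9.txt".encode("latin-1")

as_utf8 = to_utf8(native, "latin-1")        # what the libraries expect
round_trip = from_utf8(as_utf8, "latin-1")  # for non-Subversion use afterwards

print(len(native), len(as_utf8), round_trip == native)
```

The byte lengths differ because each accented character takes one byte in Latin-1 and two in UTF-8, which is why passing locale-encoded bytes straight through would not be safe.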
Also, Subversion APIs require all URL parameters to be
properly URI-encoded. So, instead of passing
file:///home/username/My File.txt as the URL of a file named
My File.txt, you need to pass
file:///home/username/My%20File.txt. Again, Subversion supplies
helper functions that your application can use,
svn_path_uri_encode and svn_path_uri_decode, for URI encoding
and decoding, respectively.
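To see the encoding applied to the URL above, here is a small sketch using Python 3's standard urllib.parse module. It is an illustration of URI encoding in general, not a binding to svn_path_uri_encode, and the wrapper name uri_encode is hypothetical; Subversion's own helpers may differ in which characters they leave unescaped.

```python
from urllib.parse import quote, unquote


def uri_encode(url):
    """Percent-encode URL, keeping ':' and '/' so the scheme and path survive."""
    return quote(url, safe=":/")


raw = "file:///home/username/My File.txt"
encoded = uri_encode(raw)
decoded = unquote(encoded)

print(encoded)
print(decoded == raw)
```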
If you are interested in using the Subversion libraries in
conjunction with something other than a C program—say a
Python script or Java application—Subversion has some
initial support for this via the Simplified Wrapper and
Interface Generator (SWIG). The SWIG bindings for Subversion
are located in subversion/bindings/swig
and are slowly maturing into a usable state. These bindings
allow you to call Subversion API functions indirectly, using
wrappers that translate the datatypes native to your
scripting language into the datatypes needed by Subversion's
C libraries.
There is an obvious benefit to accessing the Subversion APIs via a language binding—simplicity. Generally speaking, languages such as Python and Perl are much more flexible and easy to use than C or C++. The sort of high-level datatypes and context-driven type checking provided by these languages are often better at handling information that comes from users. As you know, humans are proficient at botching up input to a program, and scripting languages tend to handle that misinformation more gracefully. Of course, often that flexibility comes at the cost of performance. That is why using a tightly-optimized, C-based interface and library suite, combined with a powerful, flexible binding language, is so appealing.
Let's look at an example that uses Subversion's Python SWIG bindings. Our example will do the same thing as our last example. Note the difference in size and complexity of the function this time!
Example 8.2. Using the Repository Layer with Python
from svn import fs
import os.path

def crawl_filesystem_dir(root, directory, pool):
    """Recursively crawl DIRECTORY under ROOT in the filesystem, and return
    a list of all the paths at or below DIRECTORY.  Use POOL for all
    allocations."""

    # Get the directory entries for DIRECTORY.
    entries = fs.dir_entries(root, directory, pool)

    # Initialize our returned list with the directory path itself.
    paths = [directory]

    # Loop over the entries.
    names = entries.keys()
    for name in names:
        # Calculate the entry's full path.
        full_path = os.path.join(directory, name)

        # If the entry is a directory, recurse.  The recursion will return
        # a list with the entry and all its children, which we will add to
        # our running list of paths.
        if fs.is_dir(root, full_path, pool):
            subpaths = crawl_filesystem_dir(root, full_path, pool)
            paths.extend(subpaths)
        else:
            # It is a file, so add the entry's full path to the paths list.
            paths.append(full_path)

    return paths
An implementation in C of the previous example would stretch on quite a bit longer. The same routine in C would need to pay close attention to memory usage, and need to use custom datatypes for representing the hash of entries and the list of paths. Python has hashes (called “dictionaries”) and lists as built-in datatypes, and provides a wonderful selection of methods for operating on those types. And since Python uses reference counting and garbage collection, users of the language don't have to bother themselves with allocating and freeing memory.
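If you want to experiment with the crawl logic without a repository or the bindings at hand, the recursion can be exercised against an in-memory stand-in. In the sketch below, the FakeFS class is hypothetical: it mimics only the two calls used in the example (dir_entries and is_dir) on top of a nested dictionary, where a dictionary is a directory and None marks a file. The entries are sorted here so that the output order is predictable.

```python
import posixpath


class FakeFS:
    """Hypothetical stand-in for the two svn.fs calls the crawler needs."""

    def dir_entries(self, root, directory, pool=None):
        return self._lookup(root, directory)

    def is_dir(self, root, path, pool=None):
        return isinstance(self._lookup(root, path), dict)

    @staticmethod
    def _lookup(root, path):
        # Walk the nested dictionary one path component at a time.
        node = root
        for part in path.strip("/").split("/"):
            if part:
                node = node[part]
        return node


def crawl_filesystem_dir(fs, root, directory, pool=None):
    """Return a list of all the paths at or below DIRECTORY under ROOT."""
    paths = [directory]
    for name in sorted(fs.dir_entries(root, directory, pool)):
        full_path = posixpath.join(directory, name)
        if fs.is_dir(root, full_path, pool):
            paths.extend(crawl_filesystem_dir(fs, root, full_path, pool))
        else:
            paths.append(full_path)
    return paths


tree = {"trunk": {"README": None, "src": {"main.c": None}}, "tags": {}}
result = crawl_filesystem_dir(FakeFS(), tree, "/")
for path in result:
    print(path)
```

The pool argument is accepted but unused, which mirrors the point made above: in Python, the list and dictionary bookkeeping needs no explicit memory management.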
In the previous section of this chapter, we mentioned the
libsvn_client
interface, and how it
exists for the sole purpose of simplifying the process of
writing a Subversion client. The following is a brief example
of how that library can be accessed via the SWIG bindings. In
just a few lines of Python, you can check out a fully
functional Subversion working copy!
Example 8.3. A Simple Script to Check Out a Working Copy.
#!/usr/bin/env python
import sys
from svn import util, _util, _client

def usage():
    print("Usage: " + sys.argv[0] + " URL PATH\n")
    sys.exit(0)

def run(url, path):
    # Initialize APR and get a POOL.
    _util.apr_initialize()
    pool = util.svn_pool_create(None)

    # Check out the HEAD of URL into PATH (silently).
    _client.svn_client_checkout(None, None, url, path, -1, 1, None, pool)

    # Clean up our POOL, and shut down APR.
    util.svn_pool_destroy(pool)
    _util.apr_terminate()

if __name__ == '__main__':
    if len(sys.argv) != 3:
        usage()
    run(sys.argv[1], sys.argv[2])
Subversion's language bindings unfortunately tend to lack
the level of attention given to the core Subversion modules.
However, there have been significant efforts towards creating
functional bindings for Python, Perl, and Java. Once you have
the SWIG interface files properly configured, generation of
the specific wrappers for all the supported SWIG languages
(which currently include versions of C#, Guile, Java,
MzScheme, OCaml, Perl, PHP, Python, Ruby, and Tcl) should
theoretically be trivial. Still, some extra programming is
required to compensate for complex APIs that SWIG needs some
help generalizing. For more information on SWIG itself, see
the project's website at http://www.swig.org/.