NiBabel

Access a cacophony of neuro-imaging file formats

Table Of Contents

Previous topic

Relationship between images and io implementations

Next topic

A guide to making a nibabel release

This Page

Reggie -- the one

Design of data packages for the nipy suite

See Principles of data package for a more general discussion of design issues.

When developing or using nipy, many data files can be useful. We divide the data files nipy uses into at least 3 categories

  1. test data - data files required for routine code testing
  2. template data - data files required for algorithms to function, such as templates or atlases
  3. example data - data files for running examples, or optional tests

Files used for routine testing are typically very small data files. They are shipped with the software, and live in the code repository. For example, in the case of nipy itself, there are some test files that live in the module path nipy.testing.data.

template data and example data are example of data packages. What follows is a discussion of the design and use of data packages.

Use cases for data packages

Using the data package

The programmer will want to use the data something like this:

from nibabel.data import make_datasource

templates = make_datasource('nipy', 'templates')
fname = templates.get_filename('ICBM152', '2mm', 'T1.nii.gz')

where fname will be the absolute path to the template image ICBM152/2mm/T1.nii.gz.

The programmer can insist on a particular version of a datasource:

if templates.version < '0.4':
   raise ValueError('Need datasource version at least 0.4')

If the repository cannot find the data, then:

>>> make_datasource('nipy', 'implausible')
Traceback
 ...
nibabel.data.DataError

where DataError gives a helpful warning about why the data was not found, and how it should be installed.

Warnings during installation

The example data and template data may be important, and it would be useful to warn the user if NIPY cannot find either of the two sets of data when installing the package. Thus:

python setup.py install

will import nipy after installation to check whether these raise an error:

>>> from nibabel.data import make_datasource
>>> template = make_datasource('nipy', 'templates')
>>> example_data = make_datasource('nipy', 'data')

and warn the user accordingly, with some basic instructions for how to install the data.

Finding the data

The routine make_datasource will need to be able to find the data that has been installed. For the following call:

>>> templates = make_datasource('nipy', 'templates')

We propose to:

  1. Get a list of paths where data is known to be stored with nipy.data.get_data_path()
  2. For each of these paths, search for directory nipy/templates. If found, and of the correct format (see below), return a datasource, otherwise raise an Exception

The paths collected by nipy.data.get_data_paths() will be constructed from ‘:’ (Unix) or ‘;’ separated strings. The source of the strings (in the order in which they will be used in the search above) are:

  1. The value of the NIPY_DATA_PATH environment variable, if set
  2. A section = DATA, parameter = path entry in a config.ini file in nipy_dir where nipy_dir is $HOME/.nipy or equivalent.
  3. Section = DATA, parameter = path entries in configuration .ini files, where the .ini files are found by glob.glob(os.path.join(etc_dir, '*.ini') and etc_dir is /etc/nipy on Unix, and some suitable equivalent on Windows.
  4. The result of os.path.join(sys.prefix, 'share', 'nipy')
  5. If sys.prefix is /usr, we add /usr/local/share/nipy. We need this because Python 2.6 in Debian / Ubuntu does default installs to /usr/local.
  6. The result of get_nipy_user_dir()

Requirements for a data package

To be a valid NIPY project data package, you need to satisfy:

  1. The installer installs the data in some place that can be found using the method defined in Finding the data.

We recommend that:

  1. By default, you install data in a standard location such as <prefix>/share/nipy where <prefix> is the standard Python prefix obtained by >>> import sys; print sys.prefix

Remember that there is a distinction between the NIPY project - the umbrella of neuroimaging in python - and the NIPY package - the main code package in the NIPY project. Thus, if you want to install data under the NIPY package umbrella, your data might go to /usr/share/nipy/nipy/packagename (on Unix). Note nipy twice - once for the project, once for the pacakge. If you want to install data under - say - the `pbrain package umbrella, that would go in /usr/share/nipy/pbrain/packagename.

Data package format

The following tree is an example of the kind of pattern we would expect in a data directory, where the nipy-data and nipy-templates packages have been installed:

<ROOT>
`-- nipy
    |-- data
    |   |-- config.ini
    |   `-- placeholder.txt
    `-- templates
        |-- ICBM152
        |   `-- 2mm
        |       `-- T1.nii.gz
        |-- colin27
        |   `-- 2mm
        |       `-- T1.nii.gz
        `-- config.ini

The <ROOT> directory is the directory that will appear somewhere in the list from nipy.data.get_data_path(). The nipy subdirectory signifies data for the nipy package (as opposed to other NIPY-related packages such as pbrain). The data subdirectory of nipy contains files from the nipy-data package. In the nipy/data or nipy/templates directories, there is a config.ini file, that has at least an entry like this:

[DEFAULT]
version = 0.2

giving the version of the data package.

Installing the data

We will use python distutils to install data packages, and the data_files mechanism to install the data. On Unix, with the following command:

python setup.py install --prefix=/my/prefix

data will go to:

/my/prefix/share/nipy

For the example above this will result in these subdirectories:

/my/prefix/share/nipy/nipy/data
/my/prefix/share/nipy/nipy/templates

because nipy is both the project, and the package to which the data relates.

If you install to a particular location, you will need to add that location to the output of nipy.data.get_data_path() using one of the mechanisms above, for example, in your system configuration:

export NIPY_DATA_PATH=/my/prefix/share/nipy

Packaging for distributions

For a particular data package - say nipy-templates - distributions will want to:

  1. Install the data in set location. The default from python setup.py install for the data packages will be /usr/share/nipy on Unix.
  2. Point a system installation of NIPY to these data.

For the latter, the most obvious route is to copy an .ini file named for the data package into the NIPY etc_dir. In this case, on Unix, we will want a file called /etc/nipy/nipy_templates.ini with contents:

[DATA]
path = /usr/share/nipy

Current implementation

This section describes how we (the nipy community) implement data packages at the moment.

The data in the data packages will not usually be under source control. This is because images don’t compress very well, and any change in the data will result in a large extra storage cost in the repository. If you’re pretty clear that the data files aren’t going to change, then a repository could work OK.

The data packages will be available at a central release location. For now this will be: http://nipy.sourceforge.net/data-packages/ .

A package, such as nipy-templates-0.2.tar.gz will have the following sort of structure:

<ROOT>
  |-- setup.py
  |-- README.txt
  |-- MANIFEST.in
  `-- templates
      |-- ICBM152
      |   |-- 1mm
      |   |   `-- T1_brain.nii.gz
      |   `-- 2mm
      |       `-- T1.nii.gz
      |-- colin27
      |   `-- 2mm
      |       `-- T1.nii.gz
      `-- config.ini

There should be only one nipy/packagename directory delivered by a particular package. For example, this package installs nipy/templates, but does not contain nipy/data.

Making a new package tarball is simply:

  1. Downloading and unpacking e.g nipy-templates-0.1.tar.gz to form the directory structure above.
  2. Making any changes to the directory
  3. Running setup.py sdist to recreate the package.

The process of making a release should be:

  1. Increment the major or minor version number in the config.ini file
  2. Make a package tarball as above
  3. Upload to distribution site

There is an example nipy data package nipy-examplepkg in the examples directory of the NIPY repository.

The machinery for creating and maintaining data packages is available at http://github.com/nipy/data-packaging

See the README.txt file there for more information.