NetCDF 4.9.3
Loading...
Searching...
No Matches
Access to Metadata Objects

The original internal representations of metadata in memory relied on linear searching of lists to locate various objects by name or by numeric id: by varid or by grpid for example.

In recent years, the flaws in that approach have become obvious as users create files with extremely large numbers of objects: groups, variables, attributes, and dimensions. One case has 14 megabytes of metadata. Creating and (especially) later opening such files was exceedingly slow.

This problem was partially alleviated in both netcdf-3 (libsrc) and netcdf-4 (libsrc4) by adding name hashing tables. However, and especially for netcdf-4, linear search still prevailed.

A pervasive change has been made to try to change searches by name or by id from O(n) to O(1). This uses hashing for name-based search and vectors for numeric id-based search. All other cases were left as O(n) searches.

This document describes the architecture and details of the netCDF internal object lookup mechanisms now in place.

Indexed Searches

There are, as a rule, two searches that are used to locate metadata objects: (1) search by name and (2) search by externally visible id (e.g. dimid or varid).

It is currently the case that after all the metadata is read or created, hashing is used for locating objects by name. In all other cases – apparently – lookup is by linear search of a of linked list.

It is relevant that, once created, no metadata object – except attributes – can be deleted. They can be renamed, but that does not change the associated structure or id. Deletion only occurs when an error occurs in creating an object or on invoking "nc_close()".

The numeric identifiers for dimensions, types, and groups are all globally unique across a file. But note that variable id's are not globally unique (IMO a bad design decision) but are only unique within the containing group. Thus, in order to provide a unique id for a variable it must be composed of the containing group id plus the variable id.

Note also that names are unique only within a group and with respect to some kind of metadata. That is a group cannot have e.g. two dimensions with the same name. But it can have a variable and a dimension with the same name (as with coordinate variables).

Finally, attribute names are unique only with respect to each other and with respect to the containing object (a variable or a group).

Basic Data Structures

The basic data structures used by the new lookup mechanisms are described in the following sections.

NClist

With rare exceptions, vectors of objects are maintained as instances of NClist, which provides a dynamically extendible vector of pointers: pointers to metadata objects in this case. It is possible to append new objects or insert at a specific vector offset, or overwrite an existing pointer at a specific offset.

The NClist structure definition is as follows.

typedef struct NClist {
size_t alloc;
size_t length;
void** content;
} NClist;

NCexhashmap

The NCexhashmap type is a hash table mapping a key to a data item. As a rule, the data item is a pointer to a metadata object. This hash map uses Fagin's extendible hashing algorithm. The current implementation supports table expansion as described for that algorithm.

The hashtable definition is as follows.

typedef struct NCexhashmap {
int depth; /* Global depth */
int nactive; /* # of active entries in whole table */
NCexleaf** directory; /* |directory| == 2^depth */
...
} NCexhashmap;

where the directory (of leaf pointers) has (1<<depth) pointers, and where nactive is the total number of entries. There are other fields not relevant to this discussion.

An entry in a leaf has this form:

typedef struct NCexentry {
ncexhashkey_t hashkey; /* Hash id */
uintptr_t data;
} NCexentry;

The data field is of type uintptr_t. Note that we assume that sizeof(unintptr_t) == sizeof(void*). This is very important. It is supposed to be the case that a value of type uintptr_t is an integer of sufficient size to hold a void* pointer.

This means that the data field can hold an unsigned integer or a void* pointer. As a pointer, it often points to an instance of a variable, or dimension, or other object.

The hashkey field is a CRC64 hash of the key. It is assumed that 64 bits is sufficient to guarantee that the hash of the key (a string typically) is always unique.

This means that comparing hashkeys should be sufficient to ensure that the corresponding keys are the same because hash collisions will not occur. If a violation of this assumption is detected, then a netcdf library internal error code is returned.

Finally, a leaf holds a vector of entries, and keeps them in hashkey sorted order. Lookup within the leaf uses binary search. This differs form Fagin's original algorithm, which used secondary hashing within the leaf.

NCxindex

An "index" (aka instance of type "NCxindex") is a combination of one NClist instance plus one NCexhashmap instance. The hashmap maps a name to the position of the correspondingly named object in the NClist part of the NCindex.

An index is used to provide several kinds of lookup with respect to a specific list of metadata objects. For example, the subgroups of a group are stored using a vector of pointers to the subgroup objects. This also provides information about creation order, which is sometimes important. However, we often need fast access to that vector by name so an NCindex object provides these capabilities. The NCindex object contains:

  1. A vector into which the object pointers can be stored and iterated over.
  2. A map from name to the corresponding object index in the vector.

Note that currently, NCxindex is only used in libsrc4 and libhdf5 and libnczarr. But if performance issues warrant, it will eventually also be used in libsrc.

Note also that alternative implementations are feasible that do not use a hash table for name indexing, but rather keep a list sorted by name and use binary search to do name-based lookup. If this alternative were implemented, then it is probable that we could get rid of using the NCexhashmap structure altogether for netcdf-4. There is a performance cost since binary search is O(log n). In practice, it is probable that this is of negligible effect. The advantage is that rename operations become considerably simpler.

Global Object Access

As mentioned, dimension, group, and type external id's (dimid, grpid, typeid) are unique across the whole file. It is therefore convenient to store in memory a per-file vector for each object type such that the external id of the object is the same as the position of that object in the corresponding per-file vector. This makes lookup by external id very efficient. Note that this was already the case for netcdf-3 (libsrc) so this is a change for libsrc4 only.

The global set of dimensions, types, and groups is maintained by three instances of NClist (within an index object) in the NC_FILE_INFO structure: namely alldims, alltypes, and allgroups. The position of the object within the corresponding list determines the object's external id. Thus, a position of a dimension object within the "alldims" field of the file structure determines its dimid. Similarly for types and groups. Note that alldims is an NClist rather than an NCxindex because the dimension names are not unique unless they are full names (i.e. /g1/g2/dim).

Per-Group Object Access

Each group object (NC_GRP_INFO_T) contains five instances of NCxindex. One is for dimensions, one is for types, one is for subgroups, one is for variables, and one is for attributes. An index is used for two reasons. First, it allows name-based lookup for these items. Second, the declaration order is maintained by the list within the index's vector. Note that the position of an object in a group index vector has no necessary relationship to the position of that object within the global vectors.

Note however that the index vector for variables does define the variable id, which is unique only within a group. In this special case, the external id for the variable is the same as its offset in the index's vector for the group.

A note about typeids. Since user defined types have an external id starting at NC_FIRSTUSERTYPEID, we leave the global type vector entries 0..NC_FIRSTUSERTYPEID-1 empty.

Metadata Object Header

Each metadata object (e.g. NC_DIM_INFO_T, NC_VAR_INFO_T) now has what is called a "hdr" object as its first field. This provides a form of pseudo-inheritance for these objects because they can all be cast to "NC_OBJ" to get common information. The structure of the header is as follows.

typedef struct NC_OBJ {
NC_SORT sort;
char* name; /* assumed to be null terminated */
size_t id;
ncexhashkey_t hashkey;
} NC_OBJ;

The sort is one of the values NCVAR, NCDIM, NCATT, NCTYP, or NCGRP. The name is assumed to be nul terminated. The id is the assigned id for the object. The hashkey is the same hash value as used in NCexhashmap.

Programming cliches

Lookup an Object by Name

In the original code, the following cliche (code snippet) was common for looking up an object by name.

NC_GRP_INFO_T* grp = ...;
...
NC_GRP_INFO_T* g;
...
for (g = grp->children; g; g = g->l.next) {
if(strcmp(name,g->name)==0) {
... code to process matching grp by name
}
...
}

In this case, this loop is iterating across all the subgroups (children) of the grp. It does so by walking the linked list of child groups. It does a name comparison in order to find the group with the desired name.

In the new code, this iteration cliche is replaced by something that looks like this.

NC_GRP_INFO_T* grp = ...;
NC_GRP_INFO_T* g;
...
g = ncxindexlookup(grp->children,name);
if(g != NULL)
... code to process matching grp by name
}

In this case, the iteration is replaced by a hashtable lookup.

Lookup an Object by id

In the original code, the following cliche (code snippet) was common for looking up an object by its id (dimid, varid, etc).

NC_GRP_INFO_T* grp = ...;
...
NC_DIM_INFO_T* d;
...
for (d = grp->dim; d; d = d->l.next) {
if(varid == d->dimid)
... code to process matching dim by index
}
...
}

In this case, this loop is iterating across all the dimension objects of the grp. It does so by walking the linked list of dimensions. It does an id comparison in order to find the group with the desired dimension.

In the new code, this iteration cliche is replaced by something that looks like this.

NC_FILE_INFO_T* h5 = ...;
NC_DIM_INFO_T* d;;
...
d = nclistget(h5->alldims,id);
if(d != NULL)
... code to process matching dim by id
}

This shows how the alldims vector is used to map from a dimid directly to the matching dimension object. In this example, h5 is the NC_FILE_INFO_T file object. This approach works for dimension ids, group ids, and type ids because they are globally unique.

For variables and attributes, we have to use the containing group's Ncxindex, such as grp->vars. In this case, the varid, is mapped using code like this.

NC_GRP_INFO_T* grp = ...;
NC_VAR_INFO_T* v;
...
v = ncxindexith(grp->vars,id);
if(v != NULL)
... code to process matching variable by id
}

Iterating over sets of objects

In the original code, the following cliche (code snippet) was common.

NC_GRP_INFO_T* grp;
...
NC_GRP_INFO_T* g;
...
for (g = grp->children; g; g = g->l.next)
...

In this case, this loop is iterating across all the subgroups (children) of the grp. It does so by walking the linked list of child groups. Similar loops are used to walk a list of dimensions, variables, types, or attributes.

In the new code, this iteration cliche is replaced by something that looks like this.

NC_GRP_INFO_T* grp;
...
for(i=0;i<ncxindexsize(grp->children);i++) {
NC_GRP_INFO_T* g = nclistith(grp->children,i);
...
}

In this case, the iteration is by index into the underlying vector.

Performance

The initial impetus for this change was to improve the performance of netcdf-4 metadata loading by replacing linear searches with O(1) searches.

In fact, this goal was not initially met. It appears to be the case that the metadata loading costs are entirely dominated by the performance of the HDF5 library. The reason for this is that the netcdf-c library used to load all the metadata immediately when a file was opened. This in turn meant that all of the metadata was immediately extracted from the underlying HDF5 file. So, there was no opportunity for lazy loading to be used.

The netcdf library has since been modified to do lazy loading of variables and attributes. A not pursued alternative would have been to store a single metadata object into the file so it could be read at one shot. This object would then be processed in-memory to construct the internal metadata. The costs for this approach are space in the file plus the need to keep it consistent with the actual metadata stored by HDF5.

It should be noted that there is an effect from this change. Using gprof, one can see that in the original code the obj_list_add function was the dominate function called by a large percentage (about 20%). Whereas with the new code, the function call distribution is much more even with no function taking more than 4-5%.

Some other observations:

  1. the utf8 code now shows up as taking about 4%. Given that most names are straight ASCII, it might pay to try to optimize for this to avoid invoking the utf8 processing code.
  2. In the new code, attribute processing appears to take up a lot of the time. This, however might be an artifact of the test cases.
  3. There is a small performance improvement from avoiding walking the linked list. It appears that creating a file is about 10% faster and opening a file is also about 10% faster.

Notes and Warning

  1. Ncxindex is currently not used for enum constants and compound fields. Additionally, it is not used for listing the dimensions associated with a variable. Small size is the reason.
  2. References between meta-data objects (e.g. group parent or containing group) are stored directly and not using any kind of vector or hashtable.
  3. Attribute rename and deletion are still moderately costly operations.
  4. As in the original code, object ids (dimid, etc) are assigned explicitly using counters within the NC_FILE_INFO_T object. When stored into, for example, "alldims", the position of the object is forcibly made to match the value of the assigned id.
  5. The file ncxindex.c has a constant, NCNOHASH, that controls if the index uses that hash table versus just searching the index's vector. This is for experimental purposes.

Contact Information

Author: Dennis Heimbigner
Initial Version: 01/10/2018
Last Revised: 10/30/2020