NetCDF 4.8.0
This document describes the internal workings of the inmemory features of the netcdf-c library. The companion document to this – inmemory.md – describes the "external" operation of the inmemory features.
This document describes how the in-memory operation is implemented both for netcdf-3 files and for netcdf-4 files.
Both the netcdf-3 and netcdf-4 implementations assume that they are initially given a (pointer,size) pair representing a chunk of allocated memory of specified size.
If a file is being created instead of opened, then only the size is needed and the netcdf-c library will internally allocate the corresponding memory chunk.
If NC_DISKLESS is being used, then a chunk of memory is allocated whose size is the same as the length of the file, and the contents of the file are then read into that chunk of memory.
This information is in general represented by the following struct (see include/netcdf_mem.h).
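As a rough illustration of the shape of that (pointer,size) record, the sketch below mirrors it under hypothetical names (`memio_sketch`, `SKETCH_MEMIO_LOCKED`, `memio_sketch_is_locked`) so that it compiles without the netcdf headers. The field layout and the flag value are assumptions modelled on include/netcdf_mem.h; consult that header for the authoritative definition.

```c
#include <stddef.h>

/* Hypothetical mirror of the memory descriptor; not the library's own type. */
typedef struct memio_sketch {
    size_t size;    /* size of the memory chunk in bytes */
    void*  memory;  /* pointer to the chunk; may be NULL when creating a file */
    int    flags;   /* properties and constraints applied to the chunk */
} memio_sketch;

/* Assumed to mirror NC_MEMIO_LOCKED. */
#define SKETCH_MEMIO_LOCKED 1

/* Report whether the descriptor asks for the memory to be locked. */
static int memio_sketch_is_locked(const memio_sketch* m) {
    return (m->flags & SKETCH_MEMIO_LOCKED) != 0;
}
```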
The flags field describes properties and constraints to be applied to the given memory. At the moment, only this one flag is defined.
If this flag (NC_MEMIO_LOCKED) is set, then the netcdf library will ensure that the original allocated memory is locked, which means that it will never be realloc'd nor free'd. Note that this flag is ignored when creating a memory file: it is only relevant when opening a pre-allocated chunk of memory via the nc_open_mem function.
Note that this flag does not prevent the memory from being modified. If there is room, then the memory may be modified in place. If the size of the memory needs to be increased and this flag is set, then the operation will fail.
When the nc_close_memio function is called instead of nc_close, the currently allocated memory (and its size) is returned. If the NC_MEMIO_LOCKED flag is set, then the chunk of memory returned should be the same as the one originally provided. However, the size may be different because it represents the amount of memory that contains meaningful data; this value may be less than the originally provided size. The actual allocated size of the memory chunk is the same as originally provided, so if that value is needed, then the caller must save it somewhere.
Note also that ownership of the memory chunk is given to the caller, and it is the caller's responsibility to free the memory.
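The hand-off at close time can be pictured with the following minimal sketch. The type `sketch_file` and the function `sketch_close_memio` are hypothetical stand-ins, not library code; the point is only that the size reported is the used size and that the caller owns the chunk afterwards.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical internal state of an in-memory file. */
typedef struct sketch_file {
    void*  memory;     /* current chunk */
    size_t allocated;  /* bytes allocated for the chunk */
    size_t used;       /* bytes holding meaningful data (<= allocated) */
} sketch_file;

/* Hand the chunk to the caller and drop the file's reference to it.
   After this call the caller owns *memp and is responsible for freeing it. */
static void sketch_close_memio(sketch_file* f, void** memp, size_t* sizep) {
    *memp = f->memory;
    *sizep = f->used;   /* used size, which may be smaller than allocated */
    f->memory = NULL;   /* the file no longer owns the chunk */
    f->allocated = 0;
    f->used = 0;
}
```

A caller that needs the allocated size as well would have to remember it separately, as noted above.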
The implementation of in-memory support for netcdf-4 files is quite complicated.
The netCDF-4 implementation relies on the HDF5 library. In order to implement in-memory storage of data, the HDF5 core driver is used to manage underlying storage of the netcdf-c file.
An HDF5 driver is an abstract interface that allows different underlying storage implementations. So there is a standard file driver as well as a core driver, which uses memory as the underlying storage.
Generically, the memory is referred to as a file image [1].
The primary API for in-memory operations is in the file libhdf5/nc4mem.c; the functions it defines are described in the next sections.
The signature is:
Basically, this function sets up the necessary state information to use the HDF5 core driver. It obtains the memory chunk and size from the h5->mem.memio field.
Specifically, this function converts the NC_MEMIO_LOCKED flag into the HDF5 image-specific flags H5LT_FILE_IMAGE_DONT_COPY and H5LT_FILE_IMAGE_DONT_RELEASE. It then invokes the NC4_image_init function (in libhdf5/nc4memcb.c) to do the necessary HDF5-specific setup.
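The flag translation can be sketched as follows. The constants are local mirrors with hypothetical names; their values are assumed to match NC_MEMIO_LOCKED in netcdf_mem.h and the H5LT_FILE_IMAGE_* flags in the HDF5 high-level header, and the function `sketch_image_flags` is illustrative only. Any other image flags the library may set are outside the scope of this sketch.

```c
/* Local mirrors of the real constants (values are an assumption of this sketch). */
#define SKETCH_NC_MEMIO_LOCKED          1
#define SKETCH_H5LT_IMAGE_DONT_COPY     0x0002
#define SKETCH_H5LT_IMAGE_DONT_RELEASE  0x0004

/* Translate the netcdf-level lock flag into HDF5 file-image flags. */
static unsigned sketch_image_flags(int memio_flags) {
    unsigned imageflags = 0;
    if (memio_flags & SKETCH_NC_MEMIO_LOCKED) {
        /* Locked memory: HDF5 must neither copy nor release the buffer. */
        imageflags |= (SKETCH_H5LT_IMAGE_DONT_COPY | SKETCH_H5LT_IMAGE_DONT_RELEASE);
    }
    return imageflags;
}
```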
The signature is:
This function sets up the necessary state information to use the HDF5 core driver, but for a newly created file. It initializes the size in the h5->mem.memio field from the initialsize argument and leaves the memory chunk pointer NULL. It ignores the NC_MEMIO_LOCKED flag. It then invokes the NC4_image_init function (in libhdf5/nc4memcb.c) to do the necessary HDF5-specific setup.
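When a file is closed, this function is invoked. As part of its operation, and if the file is an in-memory file, it does one of two things: if the caller asked for the final memory (as with nc_close_memio), the current memory chunk and its size are returned to the caller; otherwise the final memory is discarded.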
The HDF5 core driver uses an abstract interface for managing the allocation and free'ing of memory. This interface is defined as a set of callback functions [2] that implement the functions of this struct.
The udata field at the end defines any extra state needed by the functions. Each function is passed the udata as its last argument. The structure of the udata is arbitrary, and is passed as void* to the functions.
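To illustrate how opaque udata travels through such a callback table, here is a deliberately simplified sketch. The types `sketch_callbacks` and `sketch_udata` and the two counting callbacks are hypothetical; the real HDF5 table has more entries, and its first four callbacks also take an operator argument.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced callback table. */
typedef struct sketch_callbacks {
    void* (*image_malloc)(size_t size, void* udata);
    int   (*image_free)(void* ptr, void* udata);
    void* udata;  /* opaque state, passed to every callback as the last argument */
} sketch_callbacks;

/* Example udata: counts outstanding allocations. */
typedef struct sketch_udata {
    int live;
} sketch_udata;

static void* counting_malloc(size_t size, void* udata) {
    sketch_udata* ud = (sketch_udata*)udata;  /* recover the concrete type */
    void* p = malloc(size);
    if (p != NULL) ud->live++;
    return p;
}

static int counting_free(void* ptr, void* udata) {
    sketch_udata* ud = (sketch_udata*)udata;
    free(ptr);
    ud->live--;
    return 0;
}
```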
The udata structure and callback functions used by the netcdf-c library are defined in the file libhdf5/nc4memcb.c. Setup is defined by the function NC4_image_init in that same file.
The udata structure used by netcdf is as follows.
It is necessary to understand one more point about the callback functions. The first four take an argument of type H5FD_file_image_op_t, the operator. This is an enumeration that indicates additional context about the purpose for which the callback is being invoked. For the purposes of the netcdf-4 implementation, only the following operators are used.
As can be seen, basically the operators indicate if the operation is with respect to an HDF5 property list, or with respect to a file (i.e. a core image in this case). For each callback described below, the per-operator actions will be described. Not all operators are used with all callbacks.
Internally, the HDF5 core driver thinks it is doing the following:
It turns out that for property lists, realloc is never called. However, the HDF5 core driver follows all of the above steps.
The following sections describe the callback function operation.
This function (image_malloc) is called to allocate an internal chunk of memory so that the originally provided memory is no longer needed. In order to implement the netcdf-c semantics, we modify this behavior.
We assume that the property list image info will never need to be modified, so we just copy the incoming buffer info (the app_image fields) into the fapl_image fields.
Basically just return the fapl_image_ptr field, so no actual copying.
Basically just return the fapl_image_ptr field, so no actual copying or malloc needed.
Since we always start by using the original incoming image buffer, we just need to store that pointer and size into the vfd_image fields (recall that the vfd image is the one used by the core driver).
This function (image_memcpy) is supposed to be used to copy the incoming buffer into an internally malloc'd buffer. Since we use the original buffer, no memcpy is actually needed. As a safety check, we do perform a memcpy if, for some reason, the src and dest arguments are different. In practice, this never happens.
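The safety check described here amounts to something like the sketch below. The function name `sketch_image_memcpy` is hypothetical, and the operator and udata arguments of the real callback are omitted for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Copy only when source and destination differ; when they are the same
   pointer, both refer to the caller's original buffer and nothing is done. */
static void* sketch_image_memcpy(void* dest, const void* src, size_t size) {
    if (dest != src) {
        memcpy(dest, src, size);  /* safety path: distinct buffers */
    }
    return dest;
}
```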
Since the property list image is never realloc'd, this function is only called with H5FD_FILE_IMAGE_OP_FILE_RESIZE.
If the memory is not locked (i.e. the NC_MEMIO_LOCKED flag was not used), then we are free to realloc the vfd_ptr. But if the memory is locked, then we cannot realloc and we must fake it as follows:
There is one important complication. It turns out that the image_realloc callback is sometimes called with a ptr argument value of NULL. This assumes that if realloc is called with a NULL buffer pointer, then it acts like malloc. Since we have found that some systems do not implement this, we handle it in our local_image_realloc code and call malloc instead of realloc.
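A minimal sketch of this resize logic follows. The type `sketch_image` and the function `sketch_image_realloc` are hypothetical, and the locked branch is an assumption based on the earlier description of NC_MEMIO_LOCKED (modify in place if there is room, otherwise fail); it is not a transcription of local_image_realloc.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct sketch_image {
    void*  ptr;     /* current image buffer */
    size_t size;    /* allocated size of the buffer */
    int    locked;  /* nonzero when the buffer must not be moved or freed */
} sketch_image;

/* Resize the image to hold `size` bytes; returns the buffer or NULL on failure. */
static void* sketch_image_realloc(sketch_image* img, void* ptr, size_t size) {
    void* p;
    if (ptr == NULL) {
        p = malloc(size);                   /* do not rely on realloc(NULL, n) */
    } else if (img->locked) {
        if (size > img->size) return NULL;  /* locked memory cannot grow */
        return ptr;                         /* enough room: nothing to do */
    } else {
        p = realloc(ptr, size);
    }
    if (p == NULL) return NULL;
    img->ptr = p;
    img->size = size;
    return p;
}
```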
This function is, of course, invoked to deallocate memory. It is only invoked with the H5FD_FILE_IMAGE_OP_PROPERTY_LIST_CLOSE and H5FD_FILE_IMAGE_OP_FILE_CLOSE operators.
Given the way the netcdf library uses it, the fapl pointer should still be the same as the original incoming app_ptr, so we do not need to do anything for this operator.
Since our implementation maintains control of the memory, this case never frees any memory, but it may save a pointer to the current vfd memory so that it can be returned to the original caller, if they want it. Specifically, the vfd_image_ptr and vfd_image_size are always copied to the udata->h5->mem.memio field so they can be referenced by higher level code.
Our version of the udata_copy function only manipulates the reference count.
Our version of the udata_free function only manipulates the reference count.
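Reference-count-only copy and free callbacks can be sketched as below. The names `sketch_udata`, `sketch_udata_copy` and `sketch_udata_free` are hypothetical; in this sketch the storage itself is left to whoever owns the struct, which is an assumption made to keep the example small.

```c
#include <stddef.h>

typedef struct sketch_udata {
    int ref_count;  /* number of holders of this udata */
} sketch_udata;

/* "Copy" by sharing: bump the count and return the same object. */
static void* sketch_udata_copy(void* udata) {
    sketch_udata* ud = (sketch_udata*)udata;
    ud->ref_count++;
    return ud;
}

/* "Free" by dropping one reference; returns -1 if there is none left to drop. */
static int sketch_udata_free(void* udata) {
    sketch_udata* ud = (sketch_udata*)udata;
    if (ud->ref_count <= 0) return -1;
    ud->ref_count--;
    return 0;
}
```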
The netcdf-3 code (in libsrc) has its own internal storage management API, defined in the file libsrc/ncio.h. It implements the API in the form of a set of function pointers as defined in the structure struct ncio. These functions have the following signatures and semantics.
The NC_INMEMORY semantics are implemented by creating an implementation of the above functions specific for handling in-memory support. This is implemented in the file libsrc/memio.c.
Open and close related functions exist in memio.c that are not specifically part of the API. These functions are defined in the following sections.
Signature:
Create a new file. Invoke memio_new to create the ncio instance. If it is intended that the resulting file be persisted to the file system, then verify that writing such a file is possible. Also create an initial in-memory buffer to hold the file data. Otherwise act like e.g. posixio_create.
Signature:
Open an existing file. Invoke memio_new to create the ncio instance. If it is intended that the resulting file be persisted to the file system, then verify that writing such a file is possible. Also create an initial in-memory buffer to hold the file data. Read the contents of the existing file into the allocated memory. Otherwise act like e.g. posixio_open.
Signature:
This function is called as part of the NC3_close function in the event that the user wants the final in-memory chunk returned to them via nc_close_memio. It captures the existing in-memory chunk and returns it. At this point, memio will no longer have access to that memory.
The semantic interaction of the above API and NC_INMEMORY are described in the following sections.
The rel function just unlocks the in-memory chunk.
The get function first guarantees that the requested region exists and, if necessary, reallocs to make it exist. If a realloc is needed and the file is locked, then it fails.
The move function first guarantees that the requested destination region exists and, if necessary, reallocs to make it exist. If a realloc is needed and the file is locked, then it fails.
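The "guarantee that the region exists" step can be sketched as follows. The type `sketch_memio` and the function `sketch_guarantee` are hypothetical and much simpler than the code in libsrc/memio.c; overflow of offset plus extent is not checked in this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct sketch_memio {
    char*  memory;  /* in-memory chunk */
    size_t alloc;   /* allocated bytes */
    size_t size;    /* used bytes */
    int    locked;  /* nonzero: the chunk must not be realloc'd */
} sketch_memio;

/* Ensure bytes [offset, offset+extent) exist; returns 0 on success, -1 on failure. */
static int sketch_guarantee(sketch_memio* m, size_t offset, size_t extent) {
    size_t endpoint = offset + extent;
    if (endpoint > m->alloc) {
        char* p;
        if (m->locked) return -1;  /* would need a realloc, but the chunk is locked */
        p = (char*)(m->memory ? realloc(m->memory, endpoint) : malloc(endpoint));
        if (p == NULL) return -1;
        memset(p + m->alloc, 0, endpoint - m->alloc);  /* zero the new tail */
        m->memory = p;
        m->alloc = endpoint;
    }
    if (endpoint > m->size) m->size = endpoint;  /* the region is now in use */
    return 0;
}
```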
The sync function is a no-op as far as memio is concerned.
The pad_length function may realloc the allocated in-memory buffer to achieve padding rounded up to the pagesize.
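Rounding a length up to a multiple of the page size is simple arithmetic, shown here as a sketch. The helper `sketch_pad_to_page` is hypothetical and takes the page size as a parameter; how the library actually obtains the page size is not covered here.

```c
#include <stddef.h>

/* Round `length` up to the next multiple of `pagesize` (pagesize must be > 0). */
static size_t sketch_pad_to_page(size_t length, size_t pagesize) {
    size_t rem = length % pagesize;
    return rem == 0 ? length : length + (pagesize - rem);
}
```

For example, with a 4096-byte page, a length of 4097 pads to 8192, while a length of exactly 4096 is left unchanged.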
The filesize function just returns the used size of the in-memory chunk. Note that the allocated size might be larger.
On close, if the user wants the contents persisted, then write out the used portion of the in-memory chunk to the target file. Then, if the in-memory chunk is not locked, or for some reason has been modified, go ahead and free that memory.
Author: Dennis Heimbigner
Email: dmh at ucar dot edu
Initial Version: 8/28/2018
Last Revised: 8/28/2018