In the 4.0.1 release, the default chunk sizes were chosen with a different scheme, as demonstrated in the following C code:
/* These are limits for default chunk sizes. (2^16 and 2^20). */
#define NC_LEN_TOO_BIG 65536
#define NC_LEN_WAY_TOO_BIG 1048576

/* Now we must determine the default chunksize. */
if (dim->unlimited)
    chunksize[d] = 1;
else if (dim->len < NC_LEN_TOO_BIG)
    chunksize[d] = dim->len;
else if (dim->len > NC_LEN_TOO_BIG && dim->len <= NC_LEN_WAY_TOO_BIG)
    chunksize[d] = dim->len / 2 + 1;
else
    chunksize[d] = NC_LEN_WAY_TOO_BIG;
As this code shows, the default chunksize is 1 for unlimited dimensions. Otherwise it is the full length of the dimension if that length is under NC_LEN_TOO_BIG; half the length of the dimension plus one (dim->len / 2 + 1) if the length is between NC_LEN_TOO_BIG and NC_LEN_WAY_TOO_BIG; and NC_LEN_WAY_TOO_BIG for anything longer.
Our experience is that these defaults work well for small data sets, but once variable sizes reach the GB range, the user is better off choosing chunk sizes to match their read access patterns.
In particular, the idea of using 1 for the chunksize of an unlimited dimension works well if the data are being read a record at a time. Any other read access patterns will result in slower performance.
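The following is a minimal sketch of how a user might override the library's default chunking with explicit chunk sizes via nc_def_var_chunking(). The file name, dimension names and sizes, and the particular chunk sizes chosen here are purely illustrative; appropriate values depend on the read access pattern of the application.

/* Sketch: choosing explicit chunk sizes for a large 3D variable.
 * Names, dimension lengths, and chunk sizes are illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

/* Abort on any netCDF error. */
#define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
    fprintf(stderr, "netCDF error: %s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    int ncid, time_dimid, y_dimid, x_dimid, varid;
    int dimids[3];
    /* One record per chunk along the unlimited dimension, and 512 x 512
     * tiles in the horizontal: suited to reading spatial subsets of a
     * single record, rather than long time series at a single point. */
    size_t chunks[3] = {1, 512, 512};

    CHECK(nc_create("chunk_example.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
    CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dimid));
    CHECK(nc_def_dim(ncid, "y", 8192, &y_dimid));
    CHECK(nc_def_dim(ncid, "x", 8192, &x_dimid));

    dimids[0] = time_dimid;
    dimids[1] = y_dimid;
    dimids[2] = x_dimid;
    CHECK(nc_def_var(ncid, "data", NC_FLOAT, 3, dimids, &varid));

    /* Replace the default chunk sizes with the explicit ones above. */
    CHECK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks));

    CHECK(nc_close(ncid));
    return 0;
}

If instead the common access pattern were reading one point across many records (a time series), chunks that span many records along the unlimited dimension and cover only a small spatial tile would perform better; the point is to align the chunk shape with the shape of a typical read.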