Summary of HDF5, netCDF-3, netCDF-Java-2.2, and OpenDAP-2.0 Data Models

The following is an attempt to compare the models using a common terminology, at the same time indicating what the concepts are called in each data model. Below, N3 stands for netCDF-3 and NJ22 for netCDF-Java 2.2. I am ignoring creation and writing semantics, as well as storage and implementation details. Feel free to correct and add!

Common model: All start with a collection of Variables, optionally arranged in Groups. Each Variable is a multidimensional array of elements (where a scalar is a 0-dimensional array) of the same datatype. Differences mostly show up in which datatypes are supported. All support reading subsets of a Variable's data using index ranges.
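
As a rough illustration of the common model (not the API of any of the four libraries), the sketch below stores a Variable as a flat, row-major list plus a shape, and reads a subset given one index range per dimension. The class name Variable, the method read, and the sample values are all hypothetical.

```python
from itertools import product


class Variable:
    """Hypothetical multidimensional array of one datatype, stored row-major."""

    def __init__(self, name, shape, data):
        self.name = name
        self.shape = tuple(shape)
        self.data = list(data)
        expected = 1
        for n in self.shape:
            expected *= n
        if len(self.data) != expected:
            raise ValueError("data length does not match shape")

    def _offset(self, index):
        # Row-major layout: the last dimension varies fastest.
        offset = 0
        for i, n in zip(index, self.shape):
            offset = offset * n + i
        return offset

    def read(self, ranges):
        """Return the elements selected by one range per dimension, flattened."""
        if len(ranges) != len(self.shape):
            raise ValueError("need one range per dimension")
        return [self.data[self._offset(idx)] for idx in product(*ranges)]


temperature = Variable("temperature", (2, 3), [10, 11, 12, 20, 21, 22])
subset = temperature.read([range(0, 2), range(1, 3)])  # rows 0-1, columns 1-2

# A scalar is a 0-dimensional array: empty shape, exactly one element.
scalar = Variable("pressure", (), [1013])
scalar_value = scalar.read([])
```

Here subset holds the four elements in columns 1 and 2 of both rows, and the scalar is read with an empty list of ranges.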

File

HDF5: An HDF5 File is a container for an organized collection of objects. It is a local file formatted in HDF5. Multiple physical files can be transparently linked together.

N3: A local File formatted in netCDF.

NJ22: A Dataset is a generalization of a netCDF file. It may be a netCDF file, an HDF5 file, or another file format which can be accessed through the netCDF API. Files may be local or remotely accessible through HTTP, or through the OpenDAP protocol.

OpenDAP: A Data Source is an online resource, accessible via its URL using the OpenDAP protocol.

Group

HDF5: An HDF5 Group contains other Groups, Datasets, and Named Datatypes. Group membership is implemented via Links, and allows an object to be linked multiple times. Groups thus form graphs ("rooted, directed graph"), allowing loops.

N3: none.

NJ22: A Group is a logical collection of Variables and nested Groups. Variables and Groups have a unique parent group, so only trees are allowed.

OpenDAP: none.
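
The difference between graph-shaped and tree-shaped group membership can be sketched with plain dictionaries that map each group to the groups it links to. This is only an illustration: the function names and group paths are hypothetical, and is_tree assumes every group is reachable from the root.

```python
def has_loop(links, root):
    """Return True if following links from root can return to a group on the current path."""
    on_path = set()
    done = set()

    def visit(group):
        if group in on_path:
            return True
        if group in done:
            return False
        on_path.add(group)
        found = any(visit(child) for child in links.get(group, []))
        on_path.remove(group)
        done.add(group)
        return found

    return visit(root)


def is_tree(links, root):
    """Return True if the root has no parent, every other group has exactly
    one parent, and there are no loops (assumes all groups are reachable)."""
    parent_count = {}
    for children in links.values():
        for child in children:
            parent_count[child] = parent_count.get(child, 0) + 1
    single_parent = all(count == 1 for count in parent_count.values())
    return single_parent and root not in parent_count and not has_loop(links, root)


# HDF5-style membership: "/a" links back to the root group, forming a loop.
hdf5_links = {"/": ["/a", "/b"], "/a": ["/"], "/b": []}
# NJ22-style membership: every group has a unique parent.
nj22_links = {"/": ["/a", "/b"], "/a": ["/a/c"], "/b": [], "/a/c": []}
```

With these inputs, the HDF5-style links contain a loop and are not a tree, while the NJ22-style links form a tree.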

Attribute

HDF5: An Attribute is attached to a Group, Variable (aka Dataset), or Named Datatype. It can have any type that a Variable can, including structures and arrays, and is similar to a Variable, except that: 1) it can only be stored compactly, not compressed or chunked; it is stored in the object header and so should be small in size. 2) its data is read all at once, not subsetted. 3) an attribute cannot itself have an attribute.

N3: An Attribute has a name and a value, used for associating arbitrary metadata with a Variable, or the entire File. The value can be a scalar or a one dimensional array of chars or numeric values.

NJ22: An Attribute has a name and a value, attached to any Variable, Group, or the entire Dataset. The value can be a scalar or a one dimensional array of primitives or String.

OpenDAP: An Attribute has a name and a value, attached to a Variable, or the entire Dataset. The value can be a scalar or a one dimensional array of primitives or String. It can also have nested "attribute structures".

Dimension

HDF5: A Dataspace is used to define the array shape of a Variable. These are not named or shared.

N3: Dimensions are used to define the array shape of a Variable. All dimensions are named and globally scoped, and may be shared.

NJ22: Dimensions are used to define the array shape of a Variable. These may be globally scoped and shared among Variables, or may be local to a Variable.

OpenDAP: Array dimensions are not shared, except for Grids. Dimensions may be named.

Coordinate Systems

HDF5: not supported

N3: Coordinate variables are supported using shared dimensions. Coordinate variables are one dimensional arrays that give each dimension index a coordinate value.

NJ22: Coordinate variables are the same as in N3. A coordinate axis is a generalization of a coordinate variable; it is not restricted to being one dimensional or to being named the same as a dimension. A coordinate system is a collection of coordinate axes used by a Variable. Each coordinate axis may use only a subset of the dimensions of the Variable it describes.

OpenDAP: The Grid datatype explicitly associates an Array and its coordinate maps. Maps are one dimensional.
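
A minimal sketch of the N3-style coordinate variable idea: a one dimensional array with the same name as a shared dimension supplies a coordinate value for each index along that dimension. The dictionaries, the function coordinates_for, and the sample values are hypothetical and do not correspond to any library call.

```python
# Hypothetical shared dimensions, each with a name and a length.
dimensions = {"time": 2, "lat": 3}

# A coordinate variable is named after its dimension: one value per index.
# Here "time" deliberately has no coordinate variable.
coordinate_variables = {"lat": [-30.0, 0.0, 30.0]}

# A data variable whose shape is defined by the shared dimensions (time, lat).
temperature = {
    "dims": ("time", "lat"),
    "data": [[280.0, 300.0, 282.0], [281.0, 301.0, 283.0]],
}


def coordinates_for(variable, index):
    """Map each dimension of the variable to the coordinate value at the given
    index, or None when the dimension has no coordinate variable."""
    result = {}
    for dim_name, i in zip(variable["dims"], index):
        axis = coordinate_variables.get(dim_name)
        result[dim_name] = axis[i] if axis is not None else None
    return result


located = coordinates_for(temperature, (1, 2))
value = temperature["data"][1][2]
```

The element at index (1, 2) is located at latitude 30.0; its time index has no coordinate value because no coordinate variable named "time" exists in this example.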

Variable

HDF5: A Dataset is a multidimensional (rectangular) array of Data Elements. The shape of the array (number of dimensions, size of each dimension) is described by the Dataspace object.

N3: A Variable has a primitive data type, a set of Dimensions that define its array shape, and optionally a set of Attributes.

NJ22: A Variable has a primitive, String, or Structure data type, a set of Dimensions that define its array shape, and optionally a set of Attributes.

OpenDAP: A Data Source is a collection of Variables. Each Variable consists of a name, a type, data values, and a collection of Attributes.

Variable Data Types

HDF5: A Datatype is atomic (primitive) or composite (Array, Enum, Variable Length, Compound). Users can define their own named Datatypes.

N3: primitive types only.

NJ22: primitive, String, or Structure.

OpenDAP: atomic (primitive) and constructor (Array, Structure, Grid, Sequence).

Primitives

HDF5: integer, float, double, opaque, reference, enum, bitfield, date. Integer types can be 1-8 bytes (?), signed or unsigned. An opaque type is an uninterpreted block of storage. A reference type refers to another Dataset in the same File (?). An enum datatype stores a set of strings and refers to them with a short value. A bitfield is an integer stored in a user-specified number of bits (packed storage, however, is currently on byte boundaries ?). A date is an ISO-8601 date/time string or a Unix date (secs since 1970?).

N3: byte, char, short, int, float and double. Unsigned integer types are not supported.

NJ22: boolean, byte, char, short, int, long, float, double. Unsigned integer support has not been decided yet.

OpenDAP: byte, int16, uint16, int32, uint32, float32, float64. The byte type is considered unsigned.
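
The signedness of byte matters when data moves between models. The sketch below uses Python's standard struct module to read the same stored byte both ways. It assumes that an N3 byte is read as a signed 8-bit value, which the list above implies but does not state outright.

```python
import struct

raw = b"\xff"  # one stored byte with all bits set

# Signed 8-bit reading (assumed here for N3, which has no unsigned types).
as_signed = struct.unpack("b", raw)[0]

# Unsigned 8-bit reading, as OpenDAP considers byte to be.
as_unsigned = struct.unpack("B", raw)[0]

# Converting a signed reading to the unsigned interpretation of the same bits.
converted = as_signed & 0xFF
```

The two readings differ (-1 versus 255) even though the stored bits are identical, so a converter between the models has to decide which interpretation to preserve.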

String

HDF5: A fixed or variable length array of bytes interpreted as ASCII chars.

N3: uses fixed length arrays of ANSI C char, i.e. char[].

NJ22: A variable length array of UTF-8 encoded Unicode characters.

OpenDAP: ANSI C notion of a string: a series of US-ASCII characters each represented in a single byte. Also has a "URL" type which is a String representing a URL.

Structure

HDF5: A Compound Datatype is a collection of member Variables of type primitive, array, vlen, or Compound. The member variables cannot have attributes.

N3: none

NJ22: A Structure is a type of Variable that contains other Variables, analogous to a struct in C. In general, a Structure's data are physically stored close together on disk, so that it is efficient to retrieve all of the data in a Structure at the same time.

OpenDAP: A Structure groups variables so that the collection can be manipulated as a single item. The Structure’s member variables may be of any type, including other constructor types. A Grid is a special kind of Structure, which contains an Array and its coordinate maps.
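
To illustrate the "struct in C" analogy, the following sketch packs the members of a Structure contiguously, so that a whole record is retrieved in one read. It uses Python's standard struct module. The member layout (an int32 station id followed by two float32 values) and the helper read_record are hypothetical and do not represent the on-disk format of any of the four models.

```python
import struct

# Hypothetical Structure: station id (int32), latitude and longitude (float32).
# "<" selects little-endian byte order with no padding between members.
record_format = struct.Struct("<iff")

records = [(101, 40.0, -105.25), (102, 39.5, -104.75)]

# Members of each record are stored next to each other, record after record.
packed = b"".join(record_format.pack(*record) for record in records)


def read_record(buffer, index):
    """Read all members of one record with a single contiguous access."""
    return record_format.unpack_from(buffer, index * record_format.size)


second = read_record(packed, 1)
```

Because each record occupies one contiguous run of bytes, fetching every member of a record needs only one offset computation and one read.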

Sequence/ Vlen

HDF5: A vlen datatype is a one dimensional Variable whose length is not known until you actually read the data. It can have elements of any datatype, including Structure.

N3: none

NJ22: A Sequence is a one dimensional Variable whose length is not known until you actually read the data. All other Variable types know what their array lengths are without having to read the data. You can have sequences of sequences, which is equivalent to ragged arrays. A specialization that allows a constraint expression, as in OpenDAP, is under consideration.

OpenDAP: A Sequence is an ordered collection of zero or more Structures. You can think of the instances in a Sequence as rows in a traditional relational table. A Sequence allows selection constraint expressions on its members, similar to relational expressions, so that you can select only those instances that satisfy the constraint. The length of a sequence is not known until it is read.
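
The row-and-constraint view of a Sequence can be sketched with a Python generator: its length is unknown until it has been read to the end, and a selection keeps only the rows that satisfy a predicate. The names station_rows and select and the sample rows are hypothetical, and the predicate is an ordinary Python function rather than OpenDAP constraint expression syntax.

```python
def station_rows():
    """Stand-in for a Sequence: rows are produced one at a time, so the
    length is not known until the data has been read."""
    yield {"station": "A", "depth": 10, "temp": 12.5}
    yield {"station": "B", "depth": 75, "temp": 6.0}
    yield {"station": "C", "depth": 120, "temp": 4.5}


def select(sequence, constraint):
    """Yield only the rows for which the constraint holds."""
    for row in sequence:
        if constraint(row):
            yield row


deep = list(select(station_rows(), lambda row: row["depth"] > 50))
deep_stations = [row["station"] for row in deep]
total_rows = sum(1 for _ in station_rows())  # length known only after reading
```

Only the rows deeper than 50 survive the selection, much as a relational query would return only the matching rows.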

Object Names

HDF5: Any ASCII except "/" or "."

N3: Alphanumeric ASCII plus "_", "-", and "."; must begin with a letter or underscore.

NJ22: same as N3, but disallows ".".

OpenDAP: US-ASCII characters: upper or lower case letters, numbers, or characters from the set _ ! ~ * ' - " . Any other characters MUST be escaped using %xx, where xx is the two-digit hex code corresponding to the US-ASCII character.
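
The naming rules above can be approximated with regular expressions and a small escaping helper. This is a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the period after the OpenDAP character set is read as sentence punctuation rather than as a member of the set, the apostrophe in the set is taken to be the ASCII one, and upper-case hex digits are an arbitrary choice.

```python
import re
import string

# N3: alphanumeric ASCII plus "_", "-", "."; must begin with a letter or underscore.
N3_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
# NJ22: same as N3, but "." is not allowed.
NJ22_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_\-]*")

# OpenDAP: characters that may appear unescaped (assumption: "." is excluded).
OPENDAP_UNESCAPED = set(string.ascii_letters + string.digits + "_!~*'-\"")


def is_valid_n3(name):
    return N3_NAME.fullmatch(name) is not None


def is_valid_nj22(name):
    return NJ22_NAME.fullmatch(name) is not None


def opendap_escape(name):
    """Escape every US-ASCII character outside the allowed set as %xx."""
    return "".join(
        ch if ch in OPENDAP_UNESCAPED else "%{:02X}".format(ord(ch))
        for ch in name
    )


escaped = opendap_escape("air temp/2m")
```

For example, "sea.surface_temp" passes the N3 check but fails the NJ22 check because of the period, and a name starting with a digit fails both.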

Issues

-- JohnCaron - 13 Sep 2004