ILNumerics® HDF5 Interface for .NET

HDF5 is a way of storing structured data in files. From the HDF Group website:

HDF technologies address the problems of how to organize, store, discover, access, analyze, share, and preserve data in the face of enormous growth in size and complexity.

The HDF5 format is very popular among scientific and industrial users, and interfaces exist for many modern and traditional languages. It is used by companies and academic institutions around the world and is a common solution for exchanging data between different systems and platforms, for example Matlab®, the NASA Earth Observing System, netCDF4 and many more.

ILNumerics provides a high-level interface to the HDF5 format on the .NET platform. It supports all common features of HDF5, such as the efficient creation, retrieval and management of structured data and of the metadata attached to it.

The ILNumerics HDF5 API provides access to HDF5 data through a convenient, object-oriented interface that integrates well with .NET conventions: it provides generic .NET collections and generic, yet type-safe, data containers. Data retrieval is integrated into the ILNumerics memory management and returns ILNumerics n-dimensional arrays out of the box.

The ILNumerics.IO.HDF5 namespace contains a number of classes that correspond to the objects an HDF5 file is made of. The names of these classes all begin with 'H5...'. In ILNumerics, HDF5 data are handled simply by combining those objects and using the properties and methods on them.

Getting Started

Install

ILNumerics.IO.HDF5 is installed with ILNumerics Ultimate VS. In your project, add a reference to the 'ILNumerics.IO.HDF5' assembly under "Add Reference -> Extensions". In your code, include the ILNumerics.IO.HDF5 namespace.

Creating and Opening Files

H5File in ILNumerics represents HDF5 files.

It is advisable to use H5File inside a using () { .. } block (C#), so that the data are flushed and the file is closed automatically after use. A new H5File(filename) opens the HDF5 file if it already exists and creates a new one otherwise.
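A minimal sketch of this pattern follows. The file name "myFile.h5" is only a placeholder; the namespace and the H5File(filename) constructor are the ones named in the text.

```csharp
using ILNumerics.IO.HDF5;

class Program {
    static void Main() {
        // Opens "myFile.h5" if it exists, otherwise creates it.
        using (var file = new H5File("myFile.h5")) {
            // work with the file here
        } // leaving the block flushes the data and closes the file
    }
}
```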

Creating Objects

Creating HDF5 files is fun! HDF5 files have some similarities with filesystems: folders (H5Group) may contain other folders or data (H5Dataset). In addition, every object in HDF5 may have a number of attributes attached to it (H5Attribute). These are useful for enriching persisted objects with additional metadata.

One can use the Add() function on any group in order to add arbitrary H5Objects to this group. Attributes are added to the Attributes collection, which is a property of groups, datasets and datatypes. More details about those HDF5 object types are found in the corresponding sections of this ILNumerics.IO.HDF5 online manual.

HDF5 files store complex structured data built from simple objects: groups, datasets and attributes. These work smoothly with C# object and collection initializers, so a non-trivial HDF5 file can be created in a single large C# expression.
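The following sketch shows what such an expression might look like. The class names, Add() and the Attributes collection come from the text. The constructor signatures (a name, optionally followed by data), the initializer form used for attributes, the object names other than '/group1/group2/ds2', and the array sizes are assumptions made for illustration.

```csharp
using ILNumerics;
using ILNumerics.IO.HDF5;

class Program {
    static void Main() {
        using (var file = new H5File("file.h5")) {
            // Assumed layout: /group1/group2/ds2 and /group1/ds1
            file.Add(new H5Group("group1") {
                new H5Group("group2") {
                    // dataset with two numeric attributes (assumed initializer form)
                    new H5Dataset("ds2", ILMath.rand(100, 200)) {
                        Attributes = {
                            { "att1", 1 },
                            { "att2", 2 }
                        }
                    }
                },
                // dataset with one string attribute
                new H5Dataset("ds1", ILMath.ones<float>(1000, 2000)) {
                    Attributes = {
                        { "att1", "this is a string attribute" }
                    }
                }
            });
        }
    }
}
```

The nesting of the initializers mirrors the nesting of groups in the file, which makes the resulting layout easy to read off the code.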

Object initializers are handy for prototyping HDF5 files. However, HDF5 objects can also be fully manipulated in the old fashioned 'imperative' way.

Accessing Objects

Once written to the file, data must be accessible in order to be useful. ILNumerics.IO.HDF5 provides the following options to query and retrieve objects / data from HDF5 files. The following examples are based on the HDF5 file created above.

Objects are retrieved by specifying an absolute or relative path. Relative paths locate an object relative to the current node; absolute paths start with a slash and locate an object relative to the file's root node.
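As a hedged sketch, path-based retrieval could look as follows. It assumes a file that contains '/group1/ds1' and '/group1/group2/ds2', and it assumes that groups and files expose a Get<T>(path) method for retrieving objects; check the groups section of the manual for the exact signature.

```csharp
using ILNumerics.IO.HDF5;

class Program {
    static void Main() {
        using (var file = new H5File("file.h5")) {
            // Relative path, resolved against the current node (here: the file).
            var ds1 = file.Get<H5Dataset>("group1/ds1");

            // Absolute path, resolved against the file's root node.
            var ds2 = file.Get<H5Dataset>("/group1/group2/ds2");

            // Relative path, resolved against a group instead of the file.
            var group1 = file.Get<H5Group>("group1");
            var ds2Again = group1.Get<H5Dataset>("group2/ds2");
        }
    }
}
```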

First<T>() recursively retrieves a specific object from the subtree, filtering by name, by type or by an arbitrary predicate.

Objects are searched for among all the children of the group. When searching directly on the file, all objects in the file are considered. The search is recursive and proceeds in breadth-first (level by level) order: direct children are searched first, then the children of the children, and so on, until a matching object is found in a subgroup.

Note that the path to the object does not need to be fully specified, although relative and absolute paths can be used. However, if the path contains any slashes, the search is performed among the direct children only, i.e. no recursive search takes place.

The "fuzziness" of First<T>() also applies to the name of the object to retrieve: when searching for the phrase 'ds', all objects with a name containing the phrase will match, i.e. the phrase is treated as if it had wildcards around it, like '*ds*'. Be as specific as you need!
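The search rules above can be sketched as follows. First<T>() is named in the text; passing the name as a single string argument is an assumption, as is the file layout ('/group1/ds1' and '/group1/group2/ds2').

```csharp
using ILNumerics.IO.HDF5;

class Program {
    static void Main() {
        using (var file = new H5File("file.h5")) {
            // Recursive search from the root: 'ds2' is found inside /group1/group2.
            var ds2 = file.First<H5Dataset>("ds2");

            // 'ds' is matched like '*ds*'. With the breadth-first order described
            // above, a dataset on a higher level (ds1) should be hit before ds2.
            var firstHit = file.First<H5Dataset>("ds");

            // The name contains a slash: it is treated as a path, without recursion.
            var ds1 = file.First<H5Dataset>("group1/ds1");
        }
    }
}
```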

Retrieving a collection of objects:

Navigating relative to a group:

Read more in the HDF5 groups section.

Data I/O

Datasets and attributes in HDF5 store n-dimensional rectilinear arrays of elements of the same type. This corresponds directly to the ILNumerics Array<T>. Hence, data are transferred to and from such HDF5 objects as ILNumerics arrays. The Get<T>() function is used on datasets and attributes to retrieve data. The Set() function stores ILNumerics array data into HDF5 objects.
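A small round trip might look like this. Get<T>() and Set() are named in the text; the dataset path, the element type double and the Get<T>(path) call used to locate the dataset are assumptions for the sake of the example.

```csharp
using ILNumerics;
using ILNumerics.IO.HDF5;

class Program {
    static void Main() {
        using (var file = new H5File("file.h5")) {
            // Assumed: a dataset of doubles exists at this path.
            var ds2 = file.Get<H5Dataset>("/group1/group2/ds2");

            // Read the complete dataset into an ILNumerics array.
            Array<double> A = ds2.Get<double>();

            // Modify the data in memory and store the result back into the dataset.
            Array<double> B = A * 2;
            ds2.Set(B);
        }
    }
}
```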

If only a part of the data is needed, datasets support partial I/O by means of the HDF5 hyperslab feature.

Visit the HDF5 dataset section for more details on datasets and hyperslabs. See the subarray tutorial to learn more about how to create ILNumerics subarrays. 

Storing data is just as easy. Use the Set() function on a dataset and specify the new data together with the range to modify. Keep in mind that the modification range must be given explicitly for all dimensions of the stored dataset (Get<T>() is more relaxed here).

You can overwrite ranges of an existing dataset in the same way, again describing the full range of modification.

Datasets are always created chunked. The chunk size corresponds to the size of the dataset at the time of creation. By default, datasets can be freely shrunk or expanded afterwards, since ILNumerics sets the maximum dimensions of a dataset to 'unlimited'.

Read the online manual for H5Dataset.

Attributes

The data stored by HDF5 attributes are expected to be rather small. Therefore, HDF5 does not provide partial I/O for them. However, you can always apply the ILNumerics subarray feature to the array returned by H5Attribute.Get<T>() in order to keep only parts of the attribute.
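A sketch of this approach, under stated assumptions: 'coeffs' is a hypothetical attribute holding a vector of at least three doubles, and accessing an attribute by name through an indexer on the Attributes collection is an assumption. Get<T>() on attributes is taken from the text; r() is the ILNumerics range helper for subarrays.

```csharp
using ILNumerics;
using ILNumerics.IO.HDF5;
using static ILNumerics.ILMath;

class Program {
    static void Main() {
        using (var file = new H5File("file.h5")) {
            var ds2 = file.Get<H5Dataset>("/group1/group2/ds2");

            // The attribute is always read as a whole ...
            Array<double> all = ds2.Attributes["coeffs"].Get<double>();

            // ... the subarray then selects elements 0..2 in memory.
            Array<double> firstThree = all[r(0, 2)];
        }
    }
}
```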

When an attribute is altered, the whole attribute will be rewritten. This potentially changes the size / shape and datatype of the attribute. Attributes are handled in detail in the attribute section.

You can create links as aliases to other objects in the file. Hard links always point to an existing object. Soft links may or may not point to an existing object.

'/hardlink' makes the dataset originating from '/group1/group2/ds2' directly available under the alias '/hardlink'. A shortcut exists for creating such hardlinks:

Soft links are similar, but they may also point to an object that does not exist in the file.

Visual Studio Integration

We want to make handling HDF5 in .NET a joyful experience. See how ILNumerics.HDF5 is integrated into Visual Studio and how it facilitates the debugging of applications. A video demonstrating this is available only in the online version of this manual.

Try this for yourself in order to check out all recent improvements not contained in the video yet!