ILNumerics Ultimate VS

H5StringAttribute Constructor

Create a new attribute, storing an array of strings.

[ILNumerics HDF5 Module]

Namespace:  ILNumerics.IO.HDF5
Assembly:  ILNumerics.IO.HDF5 (in ILNumerics.IO.HDF5.dll) Version: 5.5.0.0 (5.5.7503.3146)
Syntax

public H5StringAttribute(
	string name,
	InArray<string> data,
	StringEncoding encoding = StringEncoding.UTF8,
	Nullable<int> baseType = null,
	int length = -1,
	Nullable<StringPadding> padding = null
)

Parameters

name
Type: System.String
The name of the new attribute.
data
Type: ILNumerics.InArray<String>
String array to be stored in the new attribute.
encoding (Optional)
Type: ILNumerics.IO.HDF5.StringEncoding
[Optional] The encoding used to store the strings in the attribute elements. Default: UTF8.
baseType (Optional)
Type: System.Nullable<Int32>
[Optional] The base type of the strings as stored in the HDF5 file. Default: null (derived from vlen string class).
length (Optional)
Type: System.Int32
[Optional] Determines whether the string elements are of fixed or variable length. Default: -1 (variable length).
padding (Optional)
Type: System.Nullable<StringPadding>
[Optional] The padding used for fixed length strings (length is one of -2, 0, 1, ...). Default: null (padding depends on the setting of baseType).
Remarks

By default, new string attributes in ILNumerics.IO.HDF5 are created as variable length strings with UTF8 encoding, similar to C strings. This kind of string corresponds most closely to the common String type in .NET.

The optional parameters encoding, baseType, length, and padding modify these default settings. Use them if the string elements must be stored as fixed length strings, if ASCII encoding must be used, or if more control over the fixed length parameters (length, padding) is needed. A typical situation where this becomes necessary is data exchange with existing, unmanaged applications, other frameworks, or other APIs.
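As an illustration of overriding the defaults, the following sketch calls the constructor with named arguments, using only the signature shown under Syntax. The helper name CreateFortranStyleLabels and the attribute name "labels" are hypothetical. The values -2, ASCII and SPACEPAD are taken from the parameter descriptions on this page. The code requires references to the ILNumerics and ILNumerics.IO.HDF5 assemblies and has not been verified against a particular release.

```csharp
using ILNumerics;
using ILNumerics.IO.HDF5;

// Hypothetical helper: builds a proxy for an attribute that stores its
// elements as fixed length, ASCII encoded, space padded strings.
static H5StringAttribute CreateFortranStyleLabels(InArray<string> data) =>
    new H5StringAttribute(
        "labels",                          // attribute name (arbitrary example)
        data,                              // string array to store
        encoding: StringEncoding.ASCII,    // instead of the default UTF8
        length: -2,                        // fixed length, derived from data
        padding: StringPadding.SPACEPAD);  // pad short elements with spaces
```

Named arguments keep the call readable and allow skipping baseType, which stays at its default of null.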

name identifies the new attribute in the collection of attributes of the hosting object. The name can be any string, including special characters from the whole Unicode character space. If any character in name is not part of the ASCII character set, make sure that HDF5DefaultStringEncoding is set to its default value of UTF8.

data is an n-dimensional ILNumerics array of strings of arbitrary size. Note that the size of an attribute in HDF5 cannot be changed after creation. The size of data therefore determines the size of the attribute in the file. However, one can use the Set(InArray<String>, Nullable<StringEncoding>, Nullable<Int32>, Nullable<Int32>, Nullable<StringPadding>) function on existing attributes in order to change any configuration value of the attribute. Under the hood, ILNumerics removes the attribute and recreates it with the new settings whenever an incompatible parameter setting is detected. Since this may affect performance, one should carefully select the initial size and settings of the attribute's array value to avoid such operations later.

The encoding parameter specifies the content encoding of the attribute's string elements. The default value of UTF8 allows storing any Unicode character and making a full roundtrip by reading it back into ILNumerics string arrays.

If, for some (rare) reason, the elements must be stored as ASCII encoded bytes, the encoding parameter can be set to ASCII.

Note that HDF5 itself uses ASCII as the default encoding! However, since ASCII is a subset of UTF8, any ASCII string stays exactly the same when it is UTF8 encoded. Hence, no compatibility issues are expected when using the default encoding (UTF8) in ILNumerics.IO.HDF5 with ASCII strings: in the file, both versions are byte-compatible. A program that is aware of ASCII encoding only will read such strings with the same result as if they had been stored with ASCII encoding explicitly.
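The byte compatibility described above can be checked with the standard .NET encoders alone. The following minimal sketch does not use ILNumerics. The sample strings are arbitrary, and the last step relies on the default behavior of Encoding.ASCII, which replaces characters it cannot represent with '?'.

```csharp
using System;
using System.Linq;
using System.Text;

// Encode the same ASCII-only text with both encodings.
string asciiOnly = "temperature [K]";
byte[] asAscii = Encoding.ASCII.GetBytes(asciiOnly);
byte[] asUtf8 = Encoding.UTF8.GetBytes(asciiOnly);

// For ASCII-only text both byte sequences are identical.
bool sameBytes = asAscii.SequenceEqual(asUtf8);
Console.WriteLine($"identical bytes: {sameBytes}");

// A string with non-ASCII characters ("Groesse" written with o-umlaut and sharp s).
string nonAscii = "Gr\u00F6\u00DFe";
int utf8Length = Encoding.UTF8.GetByteCount(nonAscii);
Console.WriteLine($"{nonAscii.Length} chars, {utf8Length} UTF-8 bytes");

// Forcing the same text through ASCII loses the non-ASCII characters.
string asciiRoundTrip = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(nonAscii));
Console.WriteLine($"ASCII round trip: {asciiRoundTrip}");
```

Only strings containing non-ASCII characters differ between the two encodings, which is why the UTF8 default is safe for ASCII-only data.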

baseType can optionally be used to specify the HDF5 base type for storing the strings on disk. Two string base types are common here: C_S1 (default) and FORTRAN_S1. While the former corresponds most closely to common variable length, null-terminated C strings, the latter mimics strings as stored by FORTRAN (fixed length, space padded). Any other predefined datatype may be used as the source datatype here. Make sure that the base type is of class STRING.

The length parameter controls the length of individual elements as they are stored in the HDF5 file:

  • -2 - elements are stored as fixed length strings in the file, i.e. the same number of bytes is used for every element. ILNumerics determines the actual number automatically from the string elements provided in data and from the settings of encoding and padding. Note that the length counts bytes, not characters: it is measured after encoding the chars from data and after applying any 0-padding / termination to the encoded bytes. The resulting number may therefore be larger than expected. With UTF8 encoding and non-ASCII characters it will be larger than the number of characters in data's elements. This is the recommended value for storing fixed length strings.
  • -1 - elements are stored as variable length strings, i.e. the number of bytes used for storing individual strings from data may differ. This is the default setting in ILNumerics. It uses the least storage when storing strings of varying lengths.
  • >= 0 - a non-negative number len specifies a fixed number of bytes used to store the strings from data. This corresponds to the fixed length strings created by -2, except that you are responsible for figuring out the optimal number of bytes len to use. If len is too large, storage is wasted in the resulting file. If len is too small to fit all (encoded, null-terminated) strings, the strings will be truncated! It is recommended to let ILNumerics determine the number of bytes required to store all characters of the strings without losing any information.
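The difference between character count and byte count can be made concrete with plain .NET code. The helpers RequiredFixedLength and TruncateToLength below are hypothetical and not part of ILNumerics. They only sketch the arithmetic described above, assuming one extra byte for a 0-terminator. How ILNumerics computes the length internally is not shown on this page.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: smallest fixed length in bytes that holds every
// element after encoding, plus one byte for a 0-terminator.
static int RequiredFixedLength(string[] elements, Encoding encoding) =>
    elements.Max(s => encoding.GetByteCount(s)) + 1;

// Hypothetical helper: the bytes that survive when an element is stored
// with a fixed length that is too small.
static byte[] TruncateToLength(string element, Encoding encoding, int length) =>
    encoding.GetBytes(element).Take(length).ToArray();

// "\u00B5m" is the micro sign followed by 'm': 2 chars, 3 UTF-8 bytes.
string[] data = { "m", "kg", "\u00B5m" };

int required = RequiredFixedLength(data, Encoding.UTF8);
Console.WriteLine($"required bytes per element: {required}"); // prints 4

byte[] truncated = TruncateToLength(data[2], Encoding.UTF8, 2);
Console.WriteLine($"kept {truncated.Length} of {Encoding.UTF8.GetByteCount(data[2])} bytes");
```

The longest element has only two characters, yet four bytes are needed per element. With a manually chosen length of 2, the trailing 'm' of the last element would be lost.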

The padding parameter controls what is stored at the end of strings that are shorter than the fixed length given by length. Commonly this will be set to NULLTERM (default), which terminates every string element with a 0-byte value. Other settings allow creating strings in the same manner as, for example, FORTRAN would: SPACEPAD in combination with a fixed length creates space padded strings of the same length, without any 0-termination. This way it is possible to create any string attribute configuration and to ensure compatibility with external programs with limited capabilities.
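The resulting byte layouts can be pictured with a small sketch in plain .NET. ToFixedLength is a hypothetical helper, not an ILNumerics function. It fills the unused bytes of a fixed length buffer with a single pad byte and does not handle truncation, so it only approximates what ends up in the file.

```csharp
using System;
using System.Text;

// Hypothetical helper: encode one element into a buffer of exactly
// `length` bytes and fill the unused bytes with `padByte`.
static byte[] ToFixedLength(string element, int length, byte padByte)
{
    byte[] encoded = Encoding.ASCII.GetBytes(element);
    byte[] buffer = new byte[length];
    Array.Fill(buffer, padByte);
    Array.Copy(encoded, buffer, Math.Min(encoded.Length, length));
    return buffer;
}

byte[] nullTerminated = ToFixedLength("abc", 6, 0x00); // NULLTERM-like layout
byte[] spacePadded = ToFixedLength("abc", 6, 0x20);    // SPACEPAD-like layout

Console.WriteLine(BitConverter.ToString(nullTerminated)); // prints 61-62-63-00-00-00
Console.WriteLine(BitConverter.ToString(spacePadded));    // prints 61-62-63-20-20-20
```

A reader expecting C strings stops at the first 0 byte, while a reader expecting FORTRAN strings takes all six bytes and trims the trailing spaces.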

For element base types of C_S1 (default) the default padding is NULLTERM. For element base types of FORTRAN_S1 padding will be set to SPACEPAD by default. The default padding for other element types is undefined.

The padding parameter is not used for variable length strings (length = -1).

This constructor only constructs a proxy for lazy attribute creation. The attribute is created in the HDF5 file only when the proxy is added to an existing H5Object.
