Fun with HDF5, ILNumerics and Excel

It is amazing how many complex business processes in major industries today are supported by a tool that shines by its simplicity: Microsoft Excel. ‘Recently’ (with Visual Studio 2010) Microsoft managed to polish the development tools for all Office applications significantly. The whole Office product line is now ready to serve as a convenient, flexible base framework for stunning custom business logic, custom computations and visualizations – with just a little help of tools like ILNumerics.

ExcelWorkbook1In this blog post I am going to show how easy it is to extend the common functionality of Excel. We will enable an Excel Workbook to load arbitrary HDF5 data files, inspect the content of such files and show the data as interactive 2D or 3D plots. HDF5 is an industry standard for the structured storage of technical data and is maintained by the HDF Group. With version 4.0 ILNumerics supports the HDF5 file format with a very convenient, object oriented API. Visualizations have always been a popular feature since the early days of ILNumerics. And Excel can be another convenient GUI tool to marry both.

Prerequisites

Customizations like the one we are going to show are done in Visual Studio and utilize Visual Studio Tools for Office (VSTO). If you don’t own a copy of Visual Studio, you may find it useful that Microsoft gives away their flagship for free under certain conditions. Another prerequisite we will need: Office 2010 or 2013. I use Office 2010 on my computer. For some reason, Visual Studio 2013 did not allow me to create a new Workbook project for the 2010 version of Office and rather requested Office 2013. Therefore, I used Visual Studio 2012 instead. Just make sure to use the version of Office which is supported by your Visual Studio installation.

The last ingredient needed is ILNumerics. Download the free trial – it contains all features and tools you may possibly want and makes them available systemwide.

Setup

Let’s start with a fresh new Workbook project in Visual Studio:

NewProjectWe take all defaults here. This will create a new Visual Studio project with three worksheets, ready for your customizations. Double click on Sheet1.cs to open the designer view.  In the Visual Studio Toolbox find the ILNumerics section and drag a new instance of ILPanel onto your sheet:

Workbook1_VisualStudioDesignerThis will add the reference of ILNumerics to your project and places the ILPanel control on your sheet. You may want to resize and reposition the control. It will be available from now on as ‘iLNumerics_Drawing_ILPanel1′ in your code-behind classes. Feel free to adjust the name – for this demonstration we will leave it as it is.

 

Loading HDF5 Files in Excel

Excel does not support HDF5 file imports directly (yet?). Luckily, ILNumerics bridges the gap very efficiently and conveniently. First, we add a reference to the HDF5 assembly which comes with ILNumerics. In the last step the reference to ILNumerics was added automagically. For HDF5 we have to do this manually. Right click on the References tab in the solution explorer and chose: Add Reference. Now search for “HDF5″ on your system, select the item found and hit “OK”:

AddHDF5ReferenceIf no item was found, make sure you have installed the latest ILNumerics package using our installer and that you selected the Assemblies tab in the Reference Manager window.

Once we have the HDF5 assembly available, we can start coding. The idea is, that the user should be able to load an HDF5 file from disk, inspect the datasets contained and get the option to load and/or visualize their data. So, let’s add some more controls from the toolbox to the workbook sheet: OpenFileDialog and a button to trigger the opening. Drag the button (Common Controls in the Toolbox) and the OpenFileDialog (Dialogs tab in the Toolbox) to the designer surface:

DragButtonOFDDesigner
Now, rename the new button and double click it to open the auto-generated event handler method. This is where we are going to implement the code to open and inspect the HDF5 file:

private void button1_Click(object sender, EventArgs e) {
    var fod = new OpenFileDialog();
    if (fod.ShowDialog() == DialogResult.OK) {

        var filename = fod.FileName;
        // access the HDF5 file for reading
        using (var file = new H5File(filename, accessMode: FileAccess.Read)) {
            int row = 4; // start listing at row 4
            // list all datasets in the file
            foreach (var ds in file.Find<H5Dataset>()) {
                Cells[row,   3].Value = ds.Path;
                Cells[row++, 4].Value = ds.Size.ToString();
            }
            while (row < 100) {
                Cells[row, 3].Value = "";
                Cells[row++, 4].Value = "";
            }
            // display filename
            Cells[2, 4].Value = filename;
        }
    }
}

First, we ask for the filename to open. If the user provided a valid filename we open the file for reading. HDF5 files in ILNumerics are used in ‘using’ blocks. No matter how you leave the block, ILNumerics ensures that the file is not left open. Read here for more details.
Once we have the HDF5 file we start iterating over its datasets. C# foreach constructs make that really easy. Other languages have similar constructs. Inside the foreach loop we simply write out the path of the current dataset and its size to columns of the worksheet.
The while loop afterwards is only needed to clear entries potentially left over from earlier loadings. This expects no more than 100 datasets in a file. In a production code, you will do better…
Finally, the name of the HDF5 file is written into cell[2,4] for user information.

If we now run the workbook (hit F5) and click on our “Load HDF5 Dataset” button a file dialog opens up. Once we select an existing HDF5 file from our disk the file’s datasets are listed on the worksheet:
FirstRun

Loading HDF5 Dataset Content

Next, register a new event handler for the double click event in the worksheet. The worksheet template offers a good place to do so: the Sheet1_Startup event handler is auto generated by Visual Studio. Add the following line in order to allow to react to double click events on the sheet:

this.BeforeDoubleClick += Sheet1_BeforeDoubleClick;

The implementation of the Sheet1_BeforeDoubleClick method does all the work:

void Sheet1_BeforeDoubleClick(Excel.Range Target, ref bool Cancel) {
    // only take cells we are interested in
    if (Target.Value == null || Cells[2, 4].Value == null) return;
    // grab the hdf5 filename from the cell
    var curFilename = Cells[2, 4].Value.ToString();
    // check if this points to an existing file
    if (File.Exists(curFilename)) {
        // grab the dataset name (if the user clicked on it)
        var dsName = ((object)Target.Value).ToString();
        // reasonable?
        if (Target.Count == 1 && !String.IsNullOrEmpty(dsName)) {
            // open the file
            using (var file = new H5File(curFilename, accessMode: FileAccess.Read)) {
                // find the dataset in the file, we provide the full abs. path so we
                // are sure that there is only one such dataset
                var ds = file.First<H5Dataset>(dsName);
                if (ds != null) {
                    // add a new sheet with the name of the dataset
                    var sheet = (Excel.Worksheet)Globals.ThisWorkbook.Sheets.Add();
                    sheet.Name = checkName(dsName);
                    // ... and make it active
                    Globals.ThisWorkbook.Sheets[sheet.Name].Activate();
                    // load data using our extension method (see text)
                    sheet.Set(ds.Get<double>());
                } else {
                    // user has clicked on the size column -> plot the data
                    var size = ParseSize(dsName);
                    if (size != null && Target.Previous.Value != null) {
                        dsName = ((object)Target.Previous.Value).ToString();
                        // read data and render into panel
                        renderDS(file, dsName);
                    }
                }
            }
        }
    }
}

This is all straight forward: we do some very simple error checking here, just to make sure we only react to clicks on interesting columns. In your production code you will do much better error checking! However, here we decide if the user has clicked on a cell with a valid dataset name. If so, the file is opened (fetching the name from the HDF5 filename cell written to earlier), the dataset is located and its content loaded.
The content is written to a new workbook sheet. Attention must be drawn to the naming of the sheet. If a sheet with a similar name exists already, Excel will throw an exception. One easy solution is to add the timestamp to the name, which is left as an exercise. Here, we only do very simple name checking to make sure, no invalid characters enter the name of the sheeet:

string checkName(string name) {
    var ret = name.Replace('/','_');
    if (ret.Length > 31) {
        ret = ret.Substring(-31);
    }
    return ret;
}

The method to actually load the data from an ILArray to the new sheet is found in the following extension method. This is one rather efficient attempt to load large data:

public static void Set(this Worksheet worksheet, ILInArray<double> A, int fromColumn = 1, int fromRow = 1) {
    using (ILScope.Enter(A)) {

        var luCell = worksheet.Cells[fromRow, fromColumn];
        var rbCell = worksheet.Cells[fromRow + A.S[0] - 1, fromColumn + A.S[1] - 1];
        Range range = worksheet.Range[luCell, rbCell];
        range.Value = A.T.ToSystemMatrix();
    }
}

private static System.Array ToSystemMatrix<T>(this ILDenseArray<T> A) {
    using (ILScope.Enter(A)) {
        // some error checking (to be improved...)
        if (object.Equals(A, null)) throw new ArgumentException("A may not be null");
        if (!A.IsMatrix) throw new ArgumentException("Matrix expected");

        // create return array
        System.Array ret = Array.CreateInstance(typeof(T), A.S.ToIntArray().Reverse().ToArray());
        // fetch underlying system array
        T[] workArr = A.GetArrayForRead();
        // copy memory block
        Buffer.BlockCopy(workArr, 0, ret, 0, Marshal.SizeOf(typeof(T)) * A.S.NumberOfElements);
        return ret;
    }
}

Set() creates the range in the sheet to load the data into. The size is computed by the size of the incoming ILArray<T>. In order to load the data, we do not want to iterate over each
individual cell for performance reasons. One option is to set the Value property of the range to a two dimensional System.Array. ToSystemArray() does exactly that conversion. However, you have to
be careful not to get transposed data unexpectedly. The reason is that .NET multidimensional arrays are stored in row major order. ILNumerics on the other hand stores arrays in the same order as Matlab, FORTRAN and other technical tools do. Hence, we need to transpose our data before we assign them to the range. Read more details here.

Now, when we run the application and load a HDF5 file, we can double click on the cell holding a dataset name and have Excel load the dataset contents into a new worksheet – fast. This can be easily adopted by defining ranges (hyperslabs) and only load partial datasets. Also you can adopt the method described here for writing worksheet contents to HDF5 datasets.

Visualizing HDF5 Dataset Contents

Now let’s add another nice feature to our workbook: instead of simply loading the data from a dataset to the worksheet, we add the option of creating interactive, fully configurable and fast visualizations and plots of the data. We’ll use the predefined plot classes of ILNumerics Visualization Engine here.

Back to the double click event handler created earlier, we left out the path which is executed once the user clicked on the size displayed next to each dataset. What happens here is also straightforward.

First we parse the size to see, if it gives something reasonable (again, you’ll add better error checking for a production release). If so, we give the HDF5 file together with the dataset name to the renderDS() method which does the rendering:

private void renderDS(H5File file, string dsName) {
    var ds = file.First<H5Dataset>(dsName);
    if (ds != null) {
        using (ILScope.Enter()) {
            ILArray<float> A = ds.Get<float>();
            if (A.IsVector) {
                iLNumerics_Drawing_ILPanel1.Scene = new ILScene() {
                    new ILPlotCube(twoDMode: true) {
                        new ILLinePlot(A, markerStyle: MarkerStyle.Diamond, lineWidth: 2)
                    }
                };
            } else if (A.IsMatrix) {
                iLNumerics_Drawing_ILPanel1.Scene = new ILScene() {
                    new ILPlotCube(twoDMode: false) {
                        new ILSurface(A, colormap: Colormaps.Hot) {
                            UseLighting = true
                        }
                    }
                };
            }
            iLNumerics_Drawing_ILPanel1.Refresh();
        }
    }
}

This code does not need commenting. It fetches the dataset and loads its content into an ILArray<float>. A new scene replaces the existing one in the worksheet ILPanel. The new scene contains a plot cube and a line plot or a surface plot. Which one is created depends on the shape of the data. Vector sized data create a lineplot, matrices are rendered as a surface plot. In order to have the new scene show up, we must trigger a refresh on the panel.
WorkbooksLinesNow, run the workbook, load a HDF5 file having some vector and/or matrix sized datasets, select a dataset by double clicking on its size cell. The plot is created according to the data. Like all visualizations in ILNumerics we can interact with the data: rotation/ zoom /pan is done with the left/right mouse buttons. And of course, you are free to apply any of the very flexible configuration options in order to customize your plots further. For line plots this includes: markers, dash styles, tex labeling, advanced coloring, logarithmic axes and much more

Deploying ILNumerics Excel Workbooks

Once the workbook is ready for distribution, we build a Release version from Visual Studio and from now on – do not need Visual Studio anymore! In order to make any machine ready for handling HDF5 data and create Excel worksheets with nice visualizations on it, one only needs to run the ILNumerics installer and transfer the workbook to that machine. It will run on any platform, regardsless if 32 or 64 bit!

You can find the full example for downloading on our examples page.