ILNumerics - Technical Computing

Modern High Performance Tools for Technical Computing and Visualization in Industry and Science

ILNumerics Memory Management

This page gives detailed insight into the automatic memory management for computational arrays in ILNumerics. It explains the common function rules and provides guidelines for high performance computing situations.

The big picture

Memory management is one of the key features of ILNumerics. Automatic memory management by garbage collection (GC) is, in turn, one of the key features of the .NET framework, the host the ILNumerics DSL is based upon. Garbage collection is great, but it was designed for common business objects. The memory requirements of "Person" and "Book" records, however, are very different from those of numerical arrays, which store thousands or millions of elements of a common type as rectilinear, multidimensional arrays. In .NET such large objects are stored in a specialized area of the managed heap: the large object heap (LOH).
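
The placement of large arrays on the LOH can be observed with plain .NET calls. The following is a minimal sketch, not part of ILNumerics: the class name LohDemo and its method are made up for illustration. It assumes the default LOH threshold of 85,000 bytes and relies on the runtime reporting LOH objects as generation 2.

```csharp
using System;

// Hypothetical helper for illustration only; not an ILNumerics API.
public static class LohDemo
{
    // Returns the GC generation reported for a freshly allocated double[].
    public static int GenerationOfNewArray(int length)
    {
        var data = new double[length];
        int generation = GC.GetGeneration(data);
        GC.KeepAlive(data); // keep the array reachable until after the query
        return generation;
    }

    public static void Main()
    {
        // 10 doubles = 80 bytes: small object heap, normally starts in generation 0.
        Console.WriteLine(GenerationOfNewArray(10));

        // 100,000 doubles = 800,000 bytes: above the assumed 85,000 byte
        // threshold, so the array is placed on the LOH (reported as generation 2).
        Console.WriteLine(GenerationOfNewArray(100_000));
    }
}
```

Even a modest numerical array of a few hundred kilobytes therefore skips the cheap generation 0 and 1 collections entirely.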

When the garbage collector cleans up the LOH it does not, by default, move surviving objects in order to close the gaps (the LOH is not compacted). Furthermore, LOH collections are always full collections: the most expensive type of collection, which also includes the long-lived objects in generation 2.

The great convenience of the GC aside, a naive memory management approach based purely on garbage collection has the following disadvantages for large computational arrays:

  1. Performance: frequent gen 2 (full) collections are expensive and block other threads and the CPU. 
  2. Fragmentation: LOH collections are not compacting. Long-running algorithms and repeating allocation patterns often create fragmented heaps, reducing the amount of memory usable for computations. 
  3. CLR limitations: the CLR / .NET limits the size of a managed object to ~2 GB, even on 64 bit. Ways exist to get slightly larger objects to work, but they all involve special configuration and quirky APIs. 
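
The first point in the list above can be made visible with the standard GC.CollectionCount API. The sketch below is an illustration under assumptions, not a benchmark: the class Gen2Demo and the chosen sizes are hypothetical, and the number of collections it reports depends on the runtime, the GC mode and the machine.

```csharp
using System;

// Hypothetical helper for illustration only; not an ILNumerics API.
public static class Gen2Demo
{
    // Allocates 'count' short-lived arrays of 'length' doubles and returns
    // how many generation 2 (full) collections occurred in the meantime.
    public static int Gen2CollectionsDuring(int count, int length)
    {
        int before = GC.CollectionCount(2);
        for (int i = 0; i < count; i++)
        {
            var tmp = new double[length]; // garbage after each iteration
            tmp[0] = i;                   // touch the memory
        }
        return GC.CollectionCount(2) - before;
    }

    public static void Main()
    {
        // 200 temporaries of 8 MB each, all of them allocated on the LOH.
        int full = Gen2CollectionsDuring(200, 1_000_000);
        Console.WriteLine("gen 2 collections: " + full);
    }
}
```

Since the LOH is only cleaned during generation 2 collections, every collection counted here was a full one.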

The effects of these disadvantages can become quite significant. When naively relying on the GC for demanding computations, one can easily observe up to 50 % 'time spent in GC' (performance counter results). Heap fragmentation is a very common issue, forcing developers to cache large objects in pools: a fantastic source of hard-to-find bugs and clumsy code. And when working with double precision data, one already hits the 2 GB wall with a 3-dimensional array of size 1024 x 1024 x 256 (268,435,456 elements of 8 bytes each)!
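
The arithmetic behind the 2 GB example is easy to check. The following sketch uses a hypothetical helper, ArraySize.Bytes, which only counts the raw element data and ignores any object header:

```csharp
using System;

// Hypothetical helper for illustration only; not an ILNumerics API.
public static class ArraySize
{
    // Bytes needed for the element data of a dense array. The computation is
    // done in checked 64 bit arithmetic so that an overflow cannot go unnoticed.
    public static long Bytes(int elementSize, params int[] dims)
    {
        long count = 1;
        foreach (int d in dims)
            count = checked(count * d);
        return checked(count * elementSize);
    }

    public static void Main()
    {
        long bytes = Bytes(sizeof(double), 1024, 1024, 256);
        Console.WriteLine(bytes);                // 2147483648
        Console.WriteLine(bytes > int.MaxValue); // True
    }
}
```

The result is exactly 2^31 bytes, one byte more than int.MaxValue (2,147,483,647).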

ILNumerics Memory Model 

In version 5 ILNumerics introduced a new memory model for computational arrays which overcomes the above disadvantages. ILNumerics arrays are ...

  • Allocated from the unmanaged process heap,
  • Deterministically collected in efficient memory pools, 
  • Transparently reused for subsequent computations, 
  • Released from CLR size limitations, 
  • Robustly protected against memory leaks. 
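
To illustrate the general idea behind the first three points, here is a minimal sketch of a size-keyed pool of unmanaged buffers built on Marshal.AllocHGlobal. It is an assumption-laden toy: the class UnmanagedPool is hypothetical, it is not thread safe, and it does not show how ILNumerics implements its memory pools internally.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not the ILNumerics implementation.
public sealed class UnmanagedPool : IDisposable
{
    // Free buffers, grouped by their size in bytes.
    private readonly Dictionary<long, Stack<IntPtr>> _free =
        new Dictionary<long, Stack<IntPtr>>();

    // Hands out a buffer of exactly 'bytes' bytes, reusing a returned one if available.
    public IntPtr Rent(long bytes)
    {
        if (_free.TryGetValue(bytes, out var stack) && stack.Count > 0)
            return stack.Pop();
        return Marshal.AllocHGlobal(new IntPtr(bytes)); // unmanaged process heap
    }

    // Gives a buffer back to the pool immediately; no garbage collector is involved.
    public void Return(IntPtr buffer, long bytes)
    {
        if (!_free.TryGetValue(bytes, out var stack))
            _free[bytes] = stack = new Stack<IntPtr>();
        stack.Push(buffer);
    }

    // Frees all buffers currently held by the pool.
    public void Dispose()
    {
        foreach (var stack in _free.Values)
            while (stack.Count > 0)
                Marshal.FreeHGlobal(stack.Pop());
    }
}

public static class PoolDemo
{
    public static void Main()
    {
        using (var pool = new UnmanagedPool())
        {
            long size = 10L * 10 * 10000 * sizeof(double); // 8,000,000 bytes
            IntPtr first = pool.Rent(size);
            pool.Return(first, size);        // deterministic release into the pool
            IntPtr second = pool.Rent(size); // same size: the buffer is reused
            Console.WriteLine(first == second); // True
            pool.Return(second, size);
        }
    }
}
```

Because such buffers never touch the managed heap, they neither trigger gen 2 collections nor fragment the LOH, and on 64 bit their size is not bound by the ~2 GB limit for managed objects.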

 

ILNumerics Function Rule Details

The common ILNumerics function rules are easy to remember and to follow. An overview was already provided here. But what is the background of these rules? 

[TODO]...

Array Structures

[TODO]  The class model (?), 

Reference Counting

[TODO]

32/64 bit targeting

[TODO]

High Performance Rules 

The common rules are sufficient in the vast majority of cases. For the remaining cases we give the following additional hints. They help to manage memory efficiently even in long-running loops and when handling very large arrays. It must be stressed, though, that using the common rules alone will not produce any errors! The following guidelines are solely intended to improve application performance in demanding situations. 

Consider the following example: 

 

Here, we create a large array of size <Double>[10, 10, 10000]. Slices over the first and second dimension are iteratively replaced with their Cholesky decomposition. We will not look at the mathematical side here. Instead, we look at what is going on in terms of memory management. You can copy the code into a simple console application and should be able to reproduce and follow our steps.