Tag Archives: Memory Management

Large Object Heap Compaction – on Demand ??

With the 4.5.1 update of the .NET Framework, a new feature has been introduced that removes one real annoyance for us: Edit & Continue for 64-bit debugging targets. That is a really nice one! Thanks a million, dear fellows in “the corp”!

Another useful one: One can now investigate the return value of functions during a debug session.

Now, while both features will certainly help to create better applications by getting you through your debug sessions more quickly and conveniently, another new feature deserves a more critical look: there is now an option to explicitly compact the large object heap (LOH) during garbage collections. MSDN says:

If you assign the property a value of GCLargeObjectHeapCompactionMode.CompactOnce, the LOH is compacted during the next full blocking garbage collection, and the property value is reset to GCLargeObjectHeapCompactionMode.Default.

Hm… They state further:

You can compact the LOH immediately by using code like the following:

GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); 
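To watch the reset described in the quote, one can wrap the two lines in a small console program. This is only a sketch: the class and method names are made up for this post, and it relies on GC.Collect() performing a full blocking collection, as the MSDN text implies.

```csharp
using System;
using System.Runtime;

static class LohCompactionDemo
{
    // Requests a one-time LOH compaction, forces a collection and
    // returns the mode the runtime reports afterwards.
    public static GCLargeObjectHeapCompactionMode CompactLohOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(); // collects all generations
        return GCSettings.LargeObjectHeapCompactionMode;
    }

    static void Main()
    {
        // According to the MSDN text quoted above, the mode should be
        // back at Default once the collection has run.
        Console.WriteLine(CompactLohOnce());
    }
}
```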

OK. It looks like there has been quite some demand for ‘a’ solution to a serious problem: LOH fragmentation. This basically happens all the time when large objects are created within your application, released, created again, released again … you get the point: a disadvantageous allocation pattern with ‘large’ objects will almost certainly lead to holes in the heap. The reclaimed objects are gone, but other objects still reside in the corresponding chunk, so the chunk is not given back to the memory manager and OutOfMemoryExceptions are thrown rather early …

If all this sounds new and confusing to you – no wonder! This is probably because you are using ILNumerics :) Its memory management reliably saves you from having to deal with these issues. How? Heap fragmentation is caused by garbage. And the best way to handle garbage is to prevent it, right? This is especially true for large objects and the .NET framework. And how would one prevent garbage? By reusing your plastic bags until they start disintegrating and your eggs are in danger of falling through (and switching to a solid basket afterwards, I guess).

In terms of computers this means: reuse your memory instead of throwing it away! Especially for large objects, throwing memory away puts way too much pressure on the garbage collector, and in the end it doesn’t even help, because the heap still gets fragmented. For ‘reusing’ we must save the memory (i.e. large arrays in our case) somewhere. This directly leads to a pooling strategy: once an ILArray is not used anymore, its storage is kept safe in a pool and used for the next ILArray.

That way, no fragmentation occurs! And just as in real life – keeping the environment clean gives you even more advantages. It helps the caches by presenting recently used memory, and it protects the application from wasting half its execution time in the GC. Luckily, the whole pooling in ILNumerics works completely transparently in the background. There is nothing one needs to do in order to gain all these advantages, except following the simple rules of writing ILNumerics functions. ILNumerics keeps track of the lifetime of the arrays, saves their underlying System.Arrays in the ILNumerics memory pool, and finds and returns any suitable array from there for the next computation.

The pool is smart enough to learn what ‘suitable’ means: if no array with exactly the requested length is available, the next larger array will do just as well:

public ILRetArray<double> CreateSymm(int m, int n) {
    using (ILScope.Enter()) {
        ILArray<double> A = rand(m,n);
        // some very complicated stuff here...
        A = A * A + 2.3; 
        return multiply(A,A.T);
    }
}

// use this function without worrying about your heap!
while (true) {
   dosomethingWithABigMatrix(CreateSymm(1000,2000)); // one can even vary the sizes here!
   // at this point, your heap is clean ! No fragmentation! No GC gen.2 collections ! 
}
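For the curious, the ‘next larger array will do’ idea can be sketched in a few lines of plain C#. Note that this is a hypothetical toy pool written for this post, not the actual ILNumerics implementation: the BestFitArrayPool class, its Rent and Return methods and the linear search are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical best-fit array pool; NOT the ILNumerics memory pool.
sealed class BestFitArrayPool
{
    // Free arrays, grouped by length; keys are enumerated in ascending order.
    private readonly SortedDictionary<int, Stack<double[]>> free =
        new SortedDictionary<int, Stack<double[]>>();

    // Returns a pooled array with at least minLength elements,
    // or allocates a new one if nothing suitable is available.
    public double[] Rent(int minLength)
    {
        int match = -1;
        foreach (int length in free.Keys)
        {
            if (length >= minLength) { match = length; break; } // smallest fit
        }
        if (match < 0) return new double[minLength];

        Stack<double[]> bucket = free[match];
        double[] array = bucket.Pop();
        if (bucket.Count == 0) free.Remove(match);
        return array;
    }

    // Puts an array back into the pool instead of leaving it to the GC.
    public void Return(double[] array)
    {
        Stack<double[]> bucket;
        if (!free.TryGetValue(array.Length, out bucket))
        {
            bucket = new Stack<double[]>();
            free.Add(array.Length, bucket);
        }
        bucket.Push(array);
    }
}

static class PoolDemo
{
    static void Main()
    {
        var pool = new BestFitArrayPool();
        double[] big = pool.Rent(2000000);
        pool.Return(big);

        // No array of exactly 1,500,000 elements is pooled,
        // so the larger one is handed out again.
        double[] again = pool.Rent(1500000);
        Console.WriteLine(object.ReferenceEquals(big, again)); // prints True
        Console.WriteLine(again.Length);                       // prints 2000000
    }
}
```

Since a rented array may be longer than requested, the caller of such a pool has to keep track of the number of elements actually in use.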

Keep in mind: the next time you encounter an unexpected OutOfMemoryException, you can either go out and try to make use of that obscure GCSettings.LargeObjectHeapCompactionMode property, or … simply start using ILNumerics and forget about the problem altogether.

ILNumerics Language Features: Limitations for C#, Part II: Compound operators and ILArray

A while ago I blogged about why the CSharp var keyword cannot be used with local ILNumerics arrays (ILArray<T>, ILCell, ILLogical). This post is about the other one of the two main limitations on C# language features in ILNumerics: the use of compound operators in conjunction with ILArray<T>. In the online documentation we state the rule as follows:

The following features of the C# language are not compatible with the memory management of ILNumerics and its use is not supported:

  • The C# var keyword in conjunction with any ILNumerics array types, and
  • Any compound operator, like +=, -=, /=, *= a.s.o. Exactly spoken, these operators are not allowed in conjunction with the indexer on arrays. So A += 1; is allowed. A[0] += 1; is not!

Let’s take a closer look at the second rule. Most developers think of compound operators as being just syntactic sugar for some common expressions:

int i = 1;
i += 2;

… would simply expand to:

int i = 1;
i  = i + 2; 

For simple types like an integer variable, the actual effect is indistinguishable from that expectation. However, compound operators involve a lot more than that. Back in his time at Microsoft, Eric Lippert blogged about those subtleties. The article is worth reading for a deep understanding of all side effects. In the following, we focus on the single fact that becomes important in conjunction with ILNumerics arrays: when used with a compound operator, i in the example above is evaluated only once! In contrast, in i = i + 2, i is evaluated twice.
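With a plain local variable the difference cannot be observed. It becomes visible as soon as the left-hand side contains an expression with a side effect. The following sketch uses a made-up NextIndex() helper that counts how often the index expression is evaluated:

```csharp
using System;

static class CompoundEvaluationDemo
{
    static int evaluations;

    // Index expression with an observable side effect.
    static int NextIndex()
    {
        evaluations++;
        return 0;
    }

    // Compound form: the index expression is evaluated once.
    public static int CountCompound()
    {
        int[] data = { 1 };
        evaluations = 0;
        data[NextIndex()] += 2;
        return evaluations;
    }

    // Expanded form: the index expression is evaluated twice.
    public static int CountExpanded()
    {
        int[] data = { 1 };
        evaluations = 0;
        data[NextIndex()] = data[NextIndex()] + 2;
        return evaluations;
    }

    static void Main()
    {
        Console.WriteLine(CountCompound()); // prints 1
        Console.WriteLine(CountExpanded()); // prints 2
    }
}
```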

Evaluating an int does not cause any side effects. However, with more complex types, the evaluation may well cause side effects. An expression like the following:

ILArray<double> A = 1;
A += 2;

… evaluates to something similar to this:

ILArray<double> A = 1;
A = (ILArray<double>)(A + 2); 

There is nothing wrong with that! A += 2 will work as expected. Problems arise if we include indexers on A:

ILArray<double> A = ILMath.rand(1,10);
A[0] += 2;
// this transforms to something similar to the following: 
var receiver = A; 
var index = (ILRetArray<double>)0;
receiver[index] = receiver[index] + 2; 

In order to understand what exactly is going on here, we need to take a look at the definition of indexers on ILArray:

public ILRetArray<ElementType> this[params ILBaseArray[] range] { ... 

The indexer expects a variable-length array of ILBaseArray. This gives the most flexibility for defining subarrays in ILNumerics. Indexers accept not only scalars of built-in system types as in our example, but arbitrary ILArray and string definitions. In the expression A[0], 0 is implicitly converted to a scalar ILNumerics array before the indexer is invoked. Thus, a temporary array is created as the argument. Keep in mind: due to the memory management of ILNumerics, all such implicitly created temporary arrays are disposed of immediately after their first use.

Since both the indexing expression 0 and the object on which the indexer is defined (i.e. A) are evaluated only once, we run into a problem: index is needed twice. First, it is used to acquire the subarray at receiver[index]. The indexer get { ... } function is used for that. Once it returns, all input arguments are disposed – an important foundation of ILNumerics’ memory efficiency! Therefore, when we invoke the indexer set function with the same index variable, it finds the array already disposed – and throws an exception.
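The effect can be reproduced without ILNumerics. In the following sketch, OneShotIndex and TinyVector are hypothetical stand-ins invented for this post: the index ‘disposes’ itself after its first use, just like a temporary array would, and the indexer consumes it in both accessors. This is an illustration of the mechanism only, not ILNumerics code.

```csharp
using System;

// Hypothetical stand-in for a temporary index array: usable exactly once.
sealed class OneShotIndex
{
    private readonly int position;
    private bool disposed;

    public OneShotIndex(int position) { this.position = position; }

    // Hands out the position and marks the index as disposed,
    // mimicking a temporary after its first use.
    public int Consume()
    {
        if (disposed) throw new ObjectDisposedException(nameof(OneShotIndex));
        disposed = true;
        return position;
    }
}

// Hypothetical array type whose indexer consumes its argument in both accessors.
sealed class TinyVector
{
    private readonly double[] items;

    public TinyVector(int length) { items = new double[length]; }

    public double this[OneShotIndex index]
    {
        get { return items[index.Consume()]; }
        set { items[index.Consume()] = value; }
    }
}

static class DisposedIndexDemo
{
    // Returns true if the compound assignment fails on the second use of the index.
    public static bool CompoundAssignmentThrows()
    {
        var v = new TinyVector(3);
        try
        {
            v[new OneShotIndex(0)] += 2; // index created once, used by get and then by set
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(CompoundAssignmentThrows()); // prints True
    }
}
```

Writing the expanded form v[new OneShotIndex(0)] = v[new OneShotIndex(0)] + 2; creates two separate index objects and therefore runs through without an exception.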

It would certainly be possible to circumvent that behavior by converting scalar system types to ILArray instead of ILRetArray:

ILArray<double> A = ...;
A[(ILArray<double>)0] += 2;

However, the much less expressive syntax aside, this would not solve our problem in general either. The reason lies in the flexibility required for the indexer arguments. The user would have to ensure manually that all arguments in the indexer argument list are of some non-volatile array type. Casting to ILArray<T> might be an option in some situations. However, in general, compound operators require much more attention due to the efficient memory management in ILNumerics. We considered the risk of failing to provide only non-volatile arguments too high. So we decided not to support compound operators in conjunction with indexers at all.

See: General Rules for ILNumerics, Function Rules, Subarrays

Using ILArray as Class Attributes

A lot of people are confused about how to use ILArray as class member variables. The documentation is really sparse on this topic. So let’s get into it!

Take the following naive approach:

class Test {

    ILArray<double> m_a;

    public Test() {
        using (ILScope.Enter()) {
            m_a = ILMath.rand(100, 100);
        }
    }

    public void Do() {
        System.Diagnostics.Debug.WriteLine("m_a:" + m_a.ToString());
    }

}

If we run this:

    Test t = new Test(); 
    t.Do(); 

… we get … an exception :( Why that?

ILNumerics Arrays as Class Attributes

We start with the rules and explain the reasons later.

  1. If an ILNumerics array is used as class member, it must be a local ILNumerics array: ILArray<T>
  2. Initialization of those types must utilize a special function: ILMath.localMember<T>
  3. Assignments to the local variable must utilize the .a property (.Assign() function in VB)
  4. Classes with local array members should implement the IDisposable interface.
  5. UPDATE: it is recommended to mark all ILArray local members as readonly

By applying rules 1 to 3, the corrected example reads:

class Test {

    ILArray<double> m_a = ILMath.localMember<double>();

    public Test() {
        using (ILScope.Enter()) {
            m_a.a = ILMath.rand(100,100);
        }
    }

    public void Do() {
        System.Diagnostics.Debug.WriteLine("m_a:" + m_a.ToString()); 
    }

}

This time, we get, as expected:

m_a:<Double> [100,100]
   0,50272    0,21398    0,66289    0,75169    0,64011    0,68948    0,67187    0,32454    0,75637    0,07517    0,70919    0,71990    0,90485    0,79115    0,06920    0,21873    0,10221 ...
   0,73964    0,61959    0,60884    0,59152    0,27218    0,31629    0,97323    0,61203    0,31014    0,72146    0,55119    0,43210    0,13197    0,41965    0,48213    0,39704    0,68682 ...   
   0,41224    0,47684    0,33983    0,16917    0,11035    0,19571    0,28410    0,70209    0,36965    0,84124    0,13361    0,39570    0,56504    0,94230    0,70813    0,24816    0,86502 ...   
   0,85803    0,13391    0,87444    0,77514    0,78207    0,42969    0,16267    0,19860    0,32069    0,41191    0,19634    0,14786    0,13823    0,55875    0,87828    0,98742    0,04404 ...   
   0,70365    0,52921    0,22790    0,34812    0,44606    0,96938    0,05116    0,84701    0,89024    0,73485    0,67458    0,26132    0,73829    0,10154    0,26001    0,60780    0,01866 ...
...

If you came to this post while looking for a short solution to an actual problem, you may stop reading here. The scheme will work out fine, if the rules above are blindly followed. However, for the interested user, we’ll dive into the dirty details next.

Some unimportant Details

Now, let’s inspect the reasons behind the rules. They are somewhat complex, and most users can safely ignore them. But here they are:

The first rule is easy. Why should one use anything other than a local array? So let’s move on to rule two:

  • Initialization of those types must utilize a special function: ILMath.localMember<T>

A fundamental mechanism of the ILNumerics memory management is the lifetime associated with certain array types. All functions return temporary arrays (ILRetArray<T>), which live for exactly one use. After the first use, they are disposed of automatically. In order to use such arrays multiple times, one needs to assign them to a local variable. This is the place where they get converted and the underlying storage is taken over by the local, persistent array variable.

At the same time, we need to make sure the array is released once the current ILNumerics scope (using (ILScope.Enter()) { … }) is left. The conversion to a local array is used for that: since we know that a new array comes into existence during the conversion, we track the new array for later disposal in the current scope.

When the scope is left, it does exactly what it promises: it disposes of all arrays created since it was entered. Now, local array members require a different behavior. They commonly live for the lifetime of the class instance – not of the current ILNumerics scope. In order to prevent the member array from being cleaned up when the scope in the constructor body is left, we need something else.
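The tracking idea can be pictured with a small, purely hypothetical sketch. Scope and TrackedArray below are invented for this post and only mimic the behavior described above. They are not the ILNumerics types, and the real bookkeeping is certainly more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an array with pooled storage.
sealed class TrackedArray : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

// Hypothetical scope: disposes everything registered while it was open.
sealed class Scope : IDisposable
{
    [ThreadStatic] private static Stack<Scope> open;
    private readonly List<IDisposable> tracked = new List<IDisposable>();

    public static Scope Enter()
    {
        if (open == null) open = new Stack<Scope>();
        var scope = new Scope();
        open.Push(scope);
        return scope;
    }

    // Would be called whenever a temporary is converted into a local array.
    public static void Register(IDisposable item)
    {
        if (open != null && open.Count > 0) open.Peek().tracked.Add(item);
    }

    public void Dispose()
    {
        foreach (IDisposable item in tracked) item.Dispose();
        tracked.Clear();
        if (open != null && open.Count > 0 && object.ReferenceEquals(open.Peek(), this))
            open.Pop();
    }
}

static class ScopeDemo
{
    static void Main()
    {
        var local = new TrackedArray();  // registered below: cleaned up with the scope
        var member = new TrackedArray(); // never registered: survives the scope

        using (Scope.Enter())
        {
            Scope.Register(local);
        }

        Console.WriteLine(local.IsDisposed);  // prints True
        Console.WriteLine(member.IsDisposed); // prints False
    }
}
```

An array that never gets registered survives the scope, which is exactly the behavior a class member needs.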

The ILMath.localMember<T>() function is the only exception to the rule. It is the only function that does not return a temporary array, but a local array. In fact, the function is more than simple. All it does is create a new ILArray<T> and return it. Since the types on both sides of the assignment match, no conversion is necessary and the new array is not registered in the current scope. Hence it is not disposed of – just what we need!

What if we have to assign the return value of some function to the local array? Here, the next rule comes in:

  • Assignments to the local variable must utilize the .a property (.Assign() function in VB)

Assigning to a local array directly would activate the disposal mechanism described above. Hence, in order to prevent this for a longer living class attribute, one needs to assign to the variable via the .a property. In Visual Basic, the .Assign() function does the same. This will prevent the array from getting registered into the scope.

Example ILNumerics Array Utilization Class

Now that we have managed to prevent our local array attribute from getting disposed of magically, we should – for the sake of completeness – make sure it gets disposed somewhere. The recommended way of disposing of things in .NET is … the IDisposable interface. In fact, for most scenarios, IDisposable is not necessary. The array would be freed once the application is shut down. But we recommend implementing IDisposable, since it makes a lot of things more consistent and less error-prone. However, we provide the IDisposable interface for convenience reasons only – we do not rely on it like we would for the disposal of unmanaged resources. Therefore, a simplified version is sufficient here and we can omit the finalizer method for the class.

Here comes the full test class example, having all rules implemented:

class Test : IDisposable {
    // declare local array attribute as ILArray<T>, 
    // initialize with ILMath.localMember<T>()! 
    readonly ILArray<double> m_a = ILMath.localMember<double>();

    public Test() {
        using (ILScope.Enter()) {
            // assign via .a property only!
            m_a.a = ILMath.rand(100,100);
        }
    }

    public void Do() {
        // assign via .a property only! 
        m_a.a = m_a + 2; 

        System.Diagnostics.Debug.WriteLine("m_a:" + m_a.ToString()); 
    }


    #region IDisposable Members
    // Implement IDisposable so that users of the class can
    // clean up transparently. This is for convenience only:
    // no harm is done by omitting the call to Dispose().
    public void Dispose() {
        // simplified disposal pattern: we allow 
        // calling dispose multiple times or not at all.
        if (!ILMath.isnull(m_a)) {
            m_a.Dispose(); 
        }
    }

    #endregion
}

For the user of your class, this brings one big advantage: she can – without knowing the details – clean up its storage easily.

 
    using (Test t = new Test()) {
        t.Do();
    }

@UPDATE: by declaring your ILArray members as readonly, one gains the convenience that the compiler prevents you from accidentally assigning to the member somewhere in the code. The other rules must still be fulfilled. But when using only readonly ILArray<T> members, the rest is almost automatic.

ILArray, Properties and Lazy Initialization

@UPDATE2: Another common usage pattern for local class attributes is to delay the initialization to the first use. Let’s say, an attribute requires costly computations but is not needed always. One would usually create a property and compute the attribute value only in the get accessor:

class Class {
        
    // attribute, initialization is done in the property get accessor
    Tuple<int> m_a;

    public Tuple<int> A {
        get {
            if (m_a == null) {
                m_a = Tuple.Create(1);  // your costly initialization here
            }
            return m_a; 
        }
        set { m_a = value; }
    }
}

How does this scheme go along with ILNumerics’ ILArray? Pretty well:

class Class1 : ILMath, IDisposable {

    readonly ILArray<double> m_a = localMember<double>();

    public ILRetArray<double> A {
        get {
            if (isempty(m_a)) {
                m_a.a = rand(1000, 2000); // your costly initialization here
            }
            return m_a; // this will only return a lazy copy to the caller!
        }
        set {
            m_a.a = value; 
        }
    }

    public void Dispose() {
        // ... common dispose implementation
    }
}

Instead of checking for null in the get accessor, we simply check for an empty array. Alternatively you may initialize the attribute with some marking value in the constructor. NaN, MinValue, 0 might be good candidates.

Putting on a Good Show with HDF5, ILNumerics, and PowerShell

It is certainly nice to have the option to do all kinds of numeric stuff right in your .NET application layer – without the need for interfacing any unmanaged module. But for some tasks, this still seems overkill.

Let’s say you went to that conference and want to give your new friends some insight into your brand new simulation results. The PC in the internet cafe enables you to fetch the data from your NAS storage at home. But will you be able to do anything with it on that plain Windows PC?

Or you want to locate a certain test data set but cannot remember its rather cryptic name. Or you might want to manage the latest measurement results from today’s atmospheric observation satellite scans. The data are huge but often require some sort of preprocessing. There should be some easy way to filter them by the meta data within the files, right?

Rather than getting the data from some application layer, we now want to interface with plain old file objects. Of course, you store your data in HDF5 format, right? You do so because HDF5 is portable, very efficient and flexible, and you are in good company.

Let’s see. We have a fresh Windows PC and we know every Windows installation nowadays comes with Powershell. Powershell itself is based on the .NET framework and hence efficiently handles any .NET assembly. It should be easy to use ILNumerics with Powershell! All we still need is some way to access the HDF5 files. ILNumerics is natively able to read and write Matlab mat files up to version 6. It currently lacks native HDF5 support.

Luckily, the HDF Group provides a large collection of high quality tools for HDF support. Among them you’ll find a .NET wrapper and … a brand new Powershell module: PSH5X! Together with Gerd Heber, the leading inventor of PSH5X, we did a feasibility study with the goal to investigate the options of utilizing HDF5 and ILNumerics together in Powershell. It can be downloaded here. We were quite impressed by the options this brings.

This blog post describes the steps necessary to set up Powershell for ILNumerics and HDF5.

Getting Started

Basically, the installation process for any Powershell module consists of

  1. Getting the module files and its dependencies from somewhere,
  2. Deploying the module files into a special folder on your machine, and
  3. Importing the module in your session.

The PSH5X homepage gives all information on how to get ready to use the HDF5 Powershell module. Just download the package and follow the three steps on the page. At the end, HDF5 signals a successful installation by displaying its version numbers.

Since ILNumerics depends on several other modules, we provide a small bootstrapper script. Just open up your favorite Powershell IDE (PowerShell_ISE.exe comes with any recent Windows) and copy/paste the following line:

(new-object Net.WebClient).DownloadString('http://ilnumerics.net/media/InstallILNumericsPSM.ps1') | iex

If you are curious what this does – just omit the trailing | iex and the script is not executed but displayed for your inspection.

The installer asks for the installation folder (global under System32/ or local in your user profile), fetches the latest ILNumerics package for the current platform from the official nuget repository and installs it into the selected module folder. In addition, it downloads the TypeAccelerator Powershell module and installs it into the same module directory. Note that the accelerators have been slightly modified in order to make them work with Powershell 3 and hence are fetched from our ILNumerics server. However, credit fully belongs to poshoholic for his great work.

Note that the installation has to be done only once. Afterwards, in the next Powershell session, simply re-import the needed modules by typing, let’s say:

PS> Import-Module ILNumerics 

Go!

If everything was set up correctly, we can now use the full spectrum of the involved modules:

PS> [ilmath]::rand(5,4).ToString()
<Double> [5,4]
   0,72918    0,87547    0,43167    0,94942
   0,58024    0,75562    0,96125    0,83148
   0,22454    0,20583    0,82285    0,83144
   0,13300    0,40047    0,58829    0,87012
   0,50751    0,05496    0,02814    0,48764 

Nice. But what about the MKL? Are the correct binaries really installed as well?

PS> [ilf64] $A = [ilmath]::rand(1000,1000)
PS> Measure-Command { [ilf64]$C = [ilmath]::rank($A) }
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 920
Ticks             : 9202311
TotalDays         : 1,06508229166667E-05
TotalHours        : 0,00025561975
TotalMinutes      : 0,015337185
TotalSeconds      : 0,9202311
TotalMilliseconds : 920,2311

PS> $C.ToString()
1000

We have almost all options from C#:

PS> [ilf64] $part = $A['10:15;993:end']
PS> $part.ToString()
<Double> [6,7]
   0,08522    0,87217    0,59997    0,57363    0,22956    0,02006    0,02359
   0,33479    0,49003    0,65269    0,97772    0,28322    0,69505    0,70372
   0,30072    0,68705    0,47112    0,68627    0,65030    0,40454    0,63026
   0,15639    0,30391    0,22992    0,69310    0,65716    0,51797    0,68110
   0,72854    0,60188    0,50740    0,74499    0,13459    0,88481    0,12445
   0,80525    0,60180    0,69256    0,74825    0,64388    0,16792    0,45266 

Let’s sort the first two rows of $part, keeping track of the original positions:

PS> [ilf64] $indices = 0.0
PS> [ilf64] $sorted = [ilmath]::sort($part['0,1;:'],$indices,0,$false)
PS> $sorted.ToString()
<Double> [2,7]
   0,02006    0,02359    0,08522    0,22956    0,57363    0,59997    0,87217
   0,28322    0,33479    0,49003    0,65269    0,69505    0,70372    0,97772
PS> $indices.ToString()
<Double> [2,7]
         5          6          0          4          3          2          1
         4          0          1          2          5          6          3 

This is all interactive. Of course, we can write complete functions and even complex algorithms that way.
One of the best things: even in Powershell, ILNumerics saves your memory and meets all expectations regarding execution speed. Powershell allows you to consistently use ILNumerics’ typing and scoping rules.

In our feasibility study with Gerd Heber, we show how easy it is to access an HDF5 file, to convert its data to ILNumerics arrays (implicitly), to filter and manipulate them a little and even to create a fully interactive 3D surface graph from them. We demonstrate how to use the type accelerators and how to mimic the using statement for artificial scoping. Take a look and let us know what you think!