Optimizing memory usage

In common scenarios, users of ILNumerics.Net do not have to care about special memory considerations. They just write their algorithms and let the garbage collector clean up after them. However, applications handling large data can sometimes consume too large a part of the system's memory. This may decrease performance for both your application and other applications running concurrently on the system. This section therefore gives some hints for optimizing memory usage in such situations.

Clean up manually

The garbage collector does a good job of cleaning up unused objects without the user having to intervene. Unfortunately, it is not known when the GC runs, nor can the user be sure that the memory of unused objects is available right after the next GC run. The reason is that the GC only triggers the process of re-registering the memory in the pool. The registration itself is done by another thread, which calls the finalization code. To make the whole process more deterministic, one may clean up objects manually, similar to a destructor in traditional unmanaged languages. All ILArray objects therefore have a member Dispose(). Calling this member
  • should be done once no data of this array will be used anymore,
  • detaches the array from all referencing arrays which may still exist,
  • transfers the memory of this object into the ILNumerics.Net memory pool. The memory is available to other requests immediately.
Dispose() must be the last function called on an existing object. It is best called right before the object goes out of scope! Any method called on the object after Dispose() leads to undefined results!
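
The difference between waiting for finalization and disposing manually can be sketched with plain .NET types. The classes ToyPool and PooledBuffer below are hypothetical stand-ins and not ILNumerics.Net code; the sketch only assumes that both the finalizer and Dispose() hand the buffer back to a pool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a memory pool: it only keeps returned buffers.
public static class ToyPool {
    private static readonly Stack<double[]> free = new Stack<double[]>();
    private static readonly object sync = new object();

    // Number of buffers currently parked in the pool.
    public static int Count { get { lock (sync) { return free.Count; } } }

    public static void Return(double[] buffer) {
        lock (sync) { free.Push(buffer); }
    }
}

// Hypothetical array wrapper offering both cleanup paths.
public sealed class PooledBuffer : IDisposable {
    private double[] data;

    public PooledBuffer(int length) { data = new double[length]; }

    // Deterministic path: the buffer is back in the pool when this returns.
    public void Dispose() {
        if (data != null) {
            ToyPool.Return(data);
            data = null;
        }
        // The finalizer has nothing left to do for this object.
        GC.SuppressFinalize(this);
    }

    // Non-deterministic path: runs on the finalizer thread at some point
    // after a garbage collection has found the object unreachable.
    ~PooledBuffer() {
        if (data != null) ToyPool.Return(data);
    }
}
```

In this sketch a second call to Dispose() is harmless, but the buffer must not be touched afterwards, which mirrors the rule stated above.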

Disposing example

Let's get back to the example from the article on memory management for ILNumerics.Net. We repeat the example here:

ILArray<double> A = (ILMath.zeros(200,100,10) + 2.0) / 4.0; 

Remember, the expression above creates three ILArrays, but only one of them, the one returned from ILMath.divide(), is still available afterwards. The memory of the others is reclaimed or registered in the memory pool after the next garbage collector run. The developer obviously cannot influence the moment when this memory becomes available again. An optimal solution would consume only two working copies, which could then be reused alternately.

How can we get around the problem stated there? The solution is to reformulate the expression in a way that prevents loose handles:

ILArray<double> A = ILMath.zeros(200,100,10);
ILArray<double> dummy = A + 2.0;
A.Dispose();
A = dummy / 4.0;
dummy.Dispose();

We use a temporary variable to store and keep the handle of each intermediate result. Through this variable we can call Dispose() on each object once its data is not needed anymore. Calling Dispose() prepares the array for clean up by unregistering any references that may be attached to it and places the underlying System.Array into the ILNumerics.Net memory pool. The good part is that this is done immediately: the memory is available right after the function returns. The code above allocates exactly two times the memory of the result, in contrast to the one-line example above, which allocates three copies of that size.

Semi-automated cleaning up

The more efficient memory usage comes with the drawback of less intuitive code. One has to find the right balance between those two sides of the coin. It should be stated that one does not have to clean up arrays in this very careful way; the additional memory only becomes an issue for very large problems. One can just as well leave the statements in the short form and accept the larger memory needed.

There is yet another way of handling this situation: the garbage collector offers an interface for triggering a collection manually. The following code demonstrates the trade-off and may serve as a simpler yet still intuitive solution:

ILArray<double> A = (ILMath.zeros(200,100,10) + 2.0) / 4.0;
ILNumerics.Misc.ILMemoryPool.Pool.Collect();

This statement again consumes three times the memory of the result. But right after the evaluation has finished, all memory no longer needed is set free. This time, two cleaned arrays enter the memory pool and are available for the next expressions. For most situations it is even sufficient to narrow the garbage collection to the lowest generation by calling the overloaded version:

ILNumerics.Misc.ILMemoryPool.Pool.Collect(0);

Limiting the collection to generation 0 objects leads to faster execution, since fewer objects, namely the youngest ones, have to be collected and finalized. However, this method may introduce a performance hit if called often. Its use should therefore be carefully balanced.
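
What a generation-limited collection means at the level of the .NET runtime can be sketched with System.GC alone. This is not the body of the pool's Collect() member, whose implementation is not shown in this article; the class name GcDemo is hypothetical.

```csharp
using System;

public static class GcDemo {
    // Collects only generation 0 and waits until pending finalizers have run,
    // so that memory released by finalization code is available afterwards.
    public static void CollectYoungest() {
        GC.Collect(0);
        GC.WaitForPendingFinalizers();
    }

    // Returns the generation of a fresh object before and after it has
    // survived one full collection. Survivors are usually promoted, which
    // is why later generation 0 collections no longer have to look at them.
    public static int[] GenerationsBeforeAndAfter() {
        object o = new object();
        int before = GC.GetGeneration(o);
        GC.Collect();
        int after = GC.GetGeneration(o);
        GC.KeepAlive(o); // keep the object reachable until here
        return new int[] { before, after };
    }
}
```

Calling a helper like CollectYoungest() once per loop iteration, as opposed to after every statement, is one way to balance the cost mentioned above.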

Directly accessing the memory pool

The use of the ILNumerics.Net memory pool is not limited to the internal creation of ILArray objects. Sometimes it is useful to prefill the internal data for an ILArray manually before wrapping the ILArray object around it, as shown in the next example:

ILArray<double> A; 
double [] sysArray = new double[10000]; 
// call your function to fill the data - the function expects a system array
myCustomDataFillerFunction(sysArray); 
// after filling data - create the array
A = new ILArray<double>(sysArray,100,10,10); 

There is nothing wrong with the code above. Accessing the system array directly may be the most performant way in some situations. Now consider the code fragment placed inside a loop, so that it is executed repeatedly:

for (int i = 0; i < 100; i++) {
  // reclaim a system array from the memory pool
  double [] sysArray = ILMemoryPool.Pool.New<double>(10000); 
  // call your function to fill the data - the function expects a system array
  myCustomDataFillerFunction(sysArray, i); 
  // after returning from function - create the array
  A = new ILArray<double>(sysArray,100,10,10); 
  // do something else here ...
  A = A * 2.0; 
  // clean up - place the system array back to the pool
  ILMemoryPool.Pool.RegisterObject(sysArray); 
}

Here the system array sysArray is manually registered in and reclaimed from the pool, which prevents it from being garbage collected and reallocated in each iteration.

System arrays reclaimed from the pool this way are not guaranteed to be cleared to default element values! They may still contain data from previous operations! Use the overloaded version New<T>(int length, bool clear, out bool iscleared) to get a cleared object, or to determine quickly whether the elements of the returned array have been cleared by the pool.
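
If the pool reports that a reclaimed array has not been cleared, the caller can zero it with the standard Array.Clear() function. The helper below is a minimal sketch; the names BufferHelper and EnsureCleared are hypothetical, and the flag is assumed to be the value delivered through the iscleared parameter of the overload named above.

```csharp
using System;

public static class BufferHelper {
    // Zeroes the buffer unless it was already reported as cleared.
    // Returns the same array instance for convenient chaining.
    public static double[] EnsureCleared(double[] buffer, bool isCleared) {
        if (!isCleared) {
            Array.Clear(buffer, 0, buffer.Length);
        }
        return buffer;
    }
}
```

Skipping the clear when the flag is already true avoids touching a large array twice, which is the point of asking the pool first.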

Configuring the pool

ILNumerics.Net's memory pool temporarily 'parks' the memory of large objects for later use. This 'parking place' of course consumes memory itself. Managing objects this way also costs some computation. By default the pool is configured the following way:

parameter | remark
MinArrayLength | The minimal length of arrays that may be placed into the pool. Only arrays of this length or larger are cached at all. This prevents the pool from having to organize too many very small objects. By default this value is set to 100 elements.
PoolSizeMB | The overall size the pool may grow to, in megabytes. This is the sum of the sizes of all objects the pool can contain. Keep in mind that this amount of memory might be occupied permanently by your application and is then not available to other processes. The default value is 200 MB.

In order to fine-tune your algorithms, these values can be configured. The pool provides the Reset(int MinArrayLength, int PoolSizeMB) member. It resizes the pool and alters the minimal size of objects allowed to enter the pool. Keep in mind that calling this function will

  • dispose all objects currently held in the pool,
  • change the behavior of the pool globally.
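
How the two settings interact can be illustrated with a toy pool. The class ToyArrayPool below is hypothetical and far simpler than the ILNumerics.Net pool; it only assumes that arrays shorter than MinArrayLength are rejected, that the sum of parked arrays may not exceed PoolSizeMB (taken here as 1024 * 1024 bytes per MB), and that Reset() drops everything that is parked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the two settings, not the ILNumerics.Net pool.
public sealed class ToyArrayPool {
    private readonly List<double[]> parked = new List<double[]>();
    private int minArrayLength;
    private long maxBytes;
    private long usedBytes;

    public ToyArrayPool(int minArrayLength, int poolSizeMB) {
        Reset(minArrayLength, poolSizeMB);
    }

    // Bytes currently occupied by parked arrays.
    public long UsedBytes { get { return usedBytes; } }

    // Drops everything that is parked and applies the new limits.
    public void Reset(int minArrayLength, int poolSizeMB) {
        parked.Clear();
        usedBytes = 0;
        this.minArrayLength = minArrayLength;
        this.maxBytes = poolSizeMB * 1024L * 1024L;
    }

    // Parks an array if it is long enough and still fits into the budget.
    public bool TryPark(double[] array) {
        long bytes = (long)array.Length * sizeof(double);
        if (array.Length < minArrayLength || usedBytes + bytes > maxBytes) {
            return false;
        }
        parked.Add(array);
        usedBytes += bytes;
        return true;
    }

    // Hands out a parked array of exactly the requested length,
    // or allocates a new one if no such array is parked.
    public double[] New(int length) {
        for (int i = 0; i < parked.Count; i++) {
            if (parked[i].Length == length) {
                double[] hit = parked[i];
                parked.RemoveAt(i);
                usedBytes -= (long)hit.Length * sizeof(double);
                return hit;
            }
        }
        return new double[length];
    }
}
```

With a budget that is too small for the working set, TryPark() fails and every request falls through to a fresh allocation, which is the situation the pool size setting is meant to avoid.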

The following example demonstrates configuring the pool to hold not much more memory than needed in the computation:

// clear and configure the pool
ILMemoryPool.Pool.Reset(10000,40); 
ILArray<double> ret = ILMath.zeros(1000,100,10); 
for (int i = 0; i < 100; i++) {
  ILArray<double> A = ILMath.randn(1000,100,10) + ILMath.ones(1000,100,10) * -2.0;
  A = 0.541 * ILMath.exp(-A); 
  // do something else here ...
  ret += A * 2.0; 
  ILMemoryPool.Pool.Collect(0); 
}

The 'algorithm' presented here creates 6 (temporary) ILArray<double> objects of size 1000 x 100 x 10, which consume roughly 40MB. The pool is therefore configured to hold 40MB, and only the largest objects are allowed to enter the pool. After each iteration the pool is instructed to collect all objects left over, making them available for the next iteration. The algorithm therefore reuses all (large) memory objects between iterations. Compare this method to the most memory-saving method using the ILArray.Dispose() function. Here a good compromise is found between the careful but slightly inconvenient Dispose() syntax and the higher memory consumption which the "short" syntax of unoptimized code can bring. Optimizations were needed only at the beginning and the end of each loop iteration. Nevertheless, the memory the code fragment needs can be controlled to a good extent.

Let's take a look at another example from a performance point of view. Here we consider the following simple algorithm, which sums the results of repeated additions of two large arrays. The arrays hold 4000 x 1000 double elements, consuming roughly 30MB each.

ILArray<double> t1,t2; // temporary arrays 
for (int i = 0; i < 100; i++) {
  // sum A and B, sizes: [4000 x 1000]
  t1 = A + B; 
  t2 = C + t1; 
  // free unused memory -> will get transferred to pool
  t1.Dispose(); 
  C.Dispose(); 
  // keep data for next iteration
  C = t2; 
}

The figure demonstrates how the size of the memory pool influences the performance of the loop. For pool sizes smaller than 60MB, the temporary objects are repeatedly created on the managed heap, resulting in multiple garbage collector runs. As soon as the pool is configured large enough to hold both temporary arrays at once, the overall performance of the loop increases drastically. Increasing the pool size further does not increase the execution speed any more for this example.
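
The 60MB threshold follows from the array sizes, as a short calculation shows. The helper below is a sketch with hypothetical names; it assumes dense storage of 8 bytes per double element and counts 1 MB as 1024 * 1024 bytes, which may differ from the unit the pool uses internally.

```csharp
using System;

public static class ArraySize {
    // Size in megabytes (1 MB = 1024 * 1024 bytes) of a dense double array
    // with the given dimension lengths.
    public static double MegaBytes(params int[] dims) {
        long elements = 1;
        foreach (int d in dims) {
            elements *= d;
        }
        return elements * (double)sizeof(double) / (1024.0 * 1024.0);
    }
}
```

Under these assumptions ArraySize.MegaBytes(4000, 1000) gives about 30.5, so the two temporary arrays of the loop need about 61 MB together, which matches the point where the pool starts to pay off.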

By omitting the Dispose() calls in the previous example, the execution time of the loop may increase by some factors, depending on the test computer's hardware. Keep in mind: as soon as the problem size of your algorithm exceeds a certain value, manually controlling the memory management can be of good use.
