Referencing Model - Feature Article

This article introduces some advances in memory management for mathematical objects. ILNumerics.Net implements a sophisticated referencing model which will both:

  • minimize the amount of memory that referencing objects consume while still allowing fast access, and
  • transparently manage creation of and access to these objects in the background.

In practice you won't have to care much about referencing; ILNumerics.Net does all the work in the background. If you want or need to intervene anyway, ILNumerics.Net gives you fine-grained options, which are described at the end of this article.

What is referencing about?

Whenever an object is created by (repeated or partial) use of another object, the element data of the result can be described by "pointing" to the original data. The "pointer" may be seen as a description of the original object, its elements and its address of use. Most of the time such a pointer consumes much less memory than the data it gives access to. Let's construct a simple example:
ILArray<double> A = ILMath.randn(10000,5000);
We created the matrix A of size [10000 x 5000], filled with normally distributed random numbers.

Since A holds double precision floating point data (8 bytes each), this object consumes roughly 380 MB.
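
The figure follows directly from the element count. As a quick check, here is a minimal sketch in plain C#; the class SolidSize and its methods are made up for illustration and are not part of ILNumerics.Net:

```csharp
using System;

// Hypothetical helper, not part of ILNumerics.Net: raw element storage of a
// dense (solid) double matrix, ignoring any per-object overhead.
static class SolidSize
{
    public static long Bytes(long rows, long cols)
    {
        return rows * cols * sizeof(double);   // 8 bytes per element
    }

    public static double MiB(long rows, long cols)
    {
        return Bytes(rows, cols) / (1024.0 * 1024.0);
    }
}
```

SolidSize.Bytes(10000, 5000) yields 400,000,000 bytes, which is about 381 MiB and matches the "roughly 380 MB" quoted above.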

Arrays like that are called 'solid arrays'. Now, in order to compute something (for example a covariance matrix), we need the transposed version of A:

ILArray<double> AT = A.T;
Not surprisingly, AT is of size [5000 x 10000] now. But how much memory does AT consume? The answer is: less than 60 kilobytes! In this case only the description "A.T" must be stored. Roughly speaking, in order to access elements of AT, we instead access the transposed version of A. Arrays like AT here are called 'referencing arrays'.
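
To make the idea concrete, the following is a minimal sketch of a transposed view that stores nothing but a pointer to its source and swaps the indices on every access. The class TransposedView is hypothetical and only illustrates the principle; it is not how ILNumerics.Net implements A.T.

```csharp
using System;

// Hypothetical illustration, not ILNumerics.Net's implementation: a read-only
// transposed view that keeps only a pointer to the source and swaps indices.
sealed class TransposedView
{
    private readonly double[,] source;   // shared with the source, never copied

    public TransposedView(double[,] source)
    {
        this.source = source;
    }

    public int Rows { get { return source.GetLength(1); } }
    public int Cols { get { return source.GetLength(0); } }

    // Element (i, j) of the view is element (j, i) of the source.
    public double this[int i, int j]
    {
        get { return source[j, i]; }
    }
}
```

The view's own footprint does not depend on the number of elements in the source, which is why a description like "A.T" can be so much smaller than the data it describes.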

This kind of referencing is called 'full referencing', because only full dimensions of the source construct the resulting array. But referencing works just as well for partial dimensions, as shown in the following example.

ILArray<double> A = ILMath.randn(10000,5000); 
ILArray<double> AP = A[":;0:1999"];
AP is of size [10000 x 2000] now. A solid version would consume about 152 MB of memory. But AP itself consumes less than 47 kB!

The following table lists some common examples where referencing occurs. As shown there, it is not limited to regularly spaced dimension addressing as in the examples above. You may just as well create subarrays from arbitrary (even repeated) parts of the source. Nested subarray creation is also supported and creates references as well.

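expression (A = rand(5,4,3); AR = ...)   | creating reference | regular spaced
-----------------------------------------|--------------------|---------------
A[":;:;:"]                               | yes                | yes
A["0:end;2,3;0,2"]                       | yes                | yes
A[":;:;2,1,0"]                           | yes                | yes
A[":;:;1,2,0"]                           | yes                | no
A["3,2,4,1,1,0,1; 3,3,3,3,3; 1,2,0"]     | yes                | no
A[":;:"]                                 | yes                | yes
A[":;:"].T                               | yes                | yes
A[":;:"][":;:"][":;:"].T                 | yes                | yes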
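
One way to read the "regular spaced" column is that the indices of every dimension advance by a constant step. The sketch below checks a single index list under that assumption; the helper IndexSpacing is hypothetical, and the exact rule used by ILNumerics.Net may differ.

```csharp
using System;

// Hypothetical helper: an index list counts as "regularly spaced" here when
// consecutive entries differ by one constant step (which may be negative).
static class IndexSpacing
{
    public static bool IsRegular(int[] indices)
    {
        if (indices.Length < 3) return true;      // 0, 1 or 2 entries are always regular
        int step = indices[1] - indices[0];
        for (int k = 2; k < indices.Length; k++)
        {
            if (indices[k] - indices[k - 1] != step) return false;
        }
        return true;
    }
}
```

Under this reading "2,1,0" is regular (step -1) while "1,2,0" is not, in line with the table. An expression as a whole would be regular only if the index list of each dimension is.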

When does referencing occur?

By default referencing occurs in the following situations:

  • subarray creation (general case, see above)
  • array replication (repmat)
     A.R;     -> full reference
     repmat(A,3,2)  -> creates repeated reference of A
  • dimension shifts
    A.T; A.ShiftDimensions(2); A[1,":;:;:"]; ->  all create references
  • concatenation (both arrays are the same)
    horzcat(A,A); A.Concat(A,1); ->  create references 
  • removal of partial dimensions
    A["1;:;:"] = null;   
        -> A is a reference afterwards, with the 2nd index removed from the first dimension
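
Replication is a good example of why a reference can stay small: the result can be described by wrapping indices around the source instead of copying it. The class ReplicatedView below is a hypothetical sketch of that idea, not the library's repmat implementation.

```csharp
using System;

// Hypothetical sketch, not the library's code: a replicated matrix that keeps
// one copy of the source and wraps indices around it.
sealed class ReplicatedView
{
    private readonly double[,] source;
    private readonly int rowReps, colReps;

    public ReplicatedView(double[,] source, int rowReps, int colReps)
    {
        this.source = source;
        this.rowReps = rowReps;
        this.colReps = colReps;
    }

    public int Rows { get { return source.GetLength(0) * rowReps; } }
    public int Cols { get { return source.GetLength(1) * colReps; } }

    // Element (i, j) of the result is found by wrapping both indices.
    public double this[int i, int j]
    {
        get
        {
            if (i < 0 || i >= Rows || j < 0 || j >= Cols)
                throw new IndexOutOfRangeException();
            return source[i % source.GetLength(0), j % source.GetLength(1)];
        }
    }
}
```

A view built with 3 row repetitions and 2 column repetitions reports 3 times the rows and 2 times the columns of its source, yet holds only the one original storage.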

When does referencing not work ?

Referencing does not work, or is disabled, for

  • concatenation of different arrays,
  • subarray creation by sequential indices,
  • scalar and vector sized results,
  • complex conjugate transposes.

The following table lists some negative examples: valid expressions that do not build references.

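expression (A = rand(5,4,3); AR = ...)   | creates reference | reason
-----------------------------------------|-------------------|------------------------
A[round(rand(20,30)*59)]                 | no                | sequential index access
A["38,23,11,13,11,29,32,0,22,2,1,0"]     | no                | vector sized result
A[":,:"]                                 | no                | vector sized result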

How are references deserialized and serialized?

Another pleasant side effect of referencing is compressed storage when serializing. An array of ILArrays that consists of at least one solid array and any number of references to that solid array keeps its referencing relations through serialization and deserialization. The amount of storage space needed is conveniently small, and therefore the time needed for serialization is small too.

References and write access

The scheme above applies only when reading from arrays. Since the elements queried are essentially the same in source and destination, they only have to exist once. Writing to elements is different.

As soon as elements are about to change, the reference needs to be detached from the source. Also, if a solid array has references built from it, it must detach from them before writing to it is allowed. This is automatically done by the underlying reference counting of ILNumerics.Net. The system even covers cases like the following:

  1. A reference B is created from a solid array A.
  2. The solid array A runs out of scope and is garbage collected.
    Its underlying memory is preserved, since it is still needed for B.
  3. Since B is now the only pointer to the memory (formerly belonging to A), attempts to alter elements of B are carried out directly on that memory. No detaching occurs!
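
The scenario above can be sketched as copy-on-write with a reference count. Everything in the block is an assumption made for illustration: the class names SharedStorage and CowArray are hypothetical, and the explicit Release() call merely stands in for an array going out of scope. ILNumerics.Net does this bookkeeping internally.

```csharp
using System;

// Hypothetical sketch of copy-on-write with reference counting. The class
// names and the explicit Release() call are assumptions for illustration.
sealed class SharedStorage
{
    public readonly double[] Data;
    public int RefCount;                 // number of arrays using Data

    public SharedStorage(double[] data) { Data = data; RefCount = 1; }
}

sealed class CowArray
{
    private SharedStorage storage;

    public CowArray(double[] data) { storage = new SharedStorage(data); }

    private CowArray(SharedStorage shared)
    {
        storage = shared;
        storage.RefCount++;              // one more user of the same memory
    }

    // Creating a reference copies nothing.
    public CowArray CreateReference() { return new CowArray(storage); }

    public bool SharesStorageWith(CowArray other)
    {
        return ReferenceEquals(storage, other.storage);
    }

    public double this[int i]
    {
        get { return storage.Data[i]; }
        set
        {
            if (storage.RefCount > 1)    // still shared: detach before writing
            {
                storage.RefCount--;
                storage = new SharedStorage((double[])storage.Data.Clone());
            }
            storage.Data[i] = value;     // sole owner: write in place
        }
    }

    // Stands in for the array going out of scope.
    public void Release() { storage.RefCount--; }
}
```

If A is released before B is written to, the count drops to one and the write to B happens in place, as in step 3. If both are still alive, the writer first takes a private copy and the other array keeps its values.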

Reference creation control / configuration

In the most common situations the user won't have to care about references at all. Since all functions of ILNumerics.Net that receive ILArray<>'s automatically handle both solid and referencing arrays, the user won't even have to know whether an array is a reference or not.
Nevertheless, ILNumerics.Net gives advanced users fine-grained control over the way references are handled. The first table below shows which settings, members and properties exist; the second explains where to use them.

DetachReferences
This property controls how ILArray objects act on write access. Possible values are the members of the enum ILNumerics.DetachingBehavior:
  • DetachNever: ILArrays never detach automatically. Altering the elements of one array can then also change the elements of another array that references the same solid array elements.
  • DetachOnWrite: Referencing arrays automatically detach themselves before altering any values. To the outside world such arrays behave as if they were all backed by solid storage, but internally they save memory by not creating real copies of arrays until it is absolutely necessary.
  • DetachAlways: Attempts to create a reference to an existing array copy the values instead. This is how other (native) mathematical engines usually handle their storage. It consumes more memory but sometimes increases performance for large computations, since physical storage is optimized for faster element access.
  • DetachSave: Acts like 'DetachOnWrite', except that a storage is not detached if it is the only reference to the underlying physical storage.
MinimumRefDimensions
The minimum number of non-singleton dimensions an array must have in order to be a reference. This value is checked before a reference is created. It must be >= 2.
Note: The memory benefit of references increases with the number of dimensions!
IsReference
(ILArray<> instance member)
true for reference arrays, false for solid arrays
Detach()
(ILArray<> instance member)
This function may be used at any time to manually detach a reference from its underlying solid array. The array is a solid array afterwards.
This function has no effect on solid arrays.

 

property | namespace location / class | default / initial value | remarks
---------|----------------------------|-------------------------|--------
DetachReferences | ILNumerics.Settings.ILSettings | DetachSave | global (static) setting; controls the default for all ILArrays (all types) created in the future
DetachReferences | ILNumerics.ILArray<T> | current value of global ILSettings.DetachReferences | global (static) setting for all ILArray<T>; controls the behaviour of all ILArray<T> instances
MinimumRefDimensions | ILNumerics.Settings.ILSettings | 2 | global setting; supplies the default value for all instances of ILArray<> created in the future
MinimumRefDimensions | member for objects of type ILNumerics.Net.ILArray<T> | current value of global ILSettings.MinimumRefDimensions | controls an individual ILArray<T> instance

Performance considerations for reference arrays

The memory savings come at the price of a small performance disadvantage compared to solid arrays. Every access to a single element must first be mapped to the underlying solid array's memory. No general estimate of that penalty can be given: it depends on the number of dimensions, the regularity of the reference and the function accessing the array. As a rule of thumb, an access may take between 1*s and d*s, where d is the number of dimensions and s is the time spent on a solid array of the same size.

For example the expression

A + B
may take twice as long if A and B are non-regularly spaced references. But the expression
ILMath.multiply(A,A.T)
with A being a solid matrix or a regular reference won't have any disadvantage over pure solid matrices, since the function is carried out by BLAS functions that directly support most regularly spaced reference matrices. In this case, which is the most common one, references even speed up the computation because the transpose comes almost for free. (Remember: building a reference usually takes less time than copying the whole array.)

If you nevertheless need to disable references, this is possible:

  • globally, by setting the static ILNumerics.Settings.ILSettings.DetachReferences to 'DetachAlways', or
  • globally for all ILArray<T> of the inner type T, by setting the static member ILArray<T>.DetachReferences to 'DetachAlways', or
  • locally for a specific array object, by setting the instance member MinimumRefDimensions to a large value.

In this article we learned what referencing for ILArray types is about, how and when it occurs and how we can control it. We also got a feeling for the memory savings it can bring.

This is basically all you need to know about references. ILNumerics.Net manages the whole model in the background most of the time. If you are interested in a closer look inside, read on.

References and proxies

Strictly speaking, every ILArray is a proxy that enables the user to access elements of data in a source. On the most basic level, those sources are general System.Arrays of a dedicated type. The ILArray manages creation, access and destruction of those data arrays while serving as a proxy to the user.

For solid arrays, access to the underlying storage is easy. ILArray simply maps requests for elements onto the corresponding position in the System.Array. Indices into the dimensions of the ILArray are translated directly into the position in the System.Array.
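
As a rough sketch of such a direct translation, the helper below computes the position of an element in a flat array. It assumes column-major order (the first index varies fastest), which is an assumption of this example rather than something stated above; the class SolidIndex is hypothetical.

```csharp
using System;

// Hypothetical helper: position of element (i0, i1, ..., in) in a flat array,
// assuming column-major order (first index varies fastest).
static class SolidIndex
{
    public static int ToLinear(int[] dims, int[] index)
    {
        if (dims.Length != index.Length)
            throw new ArgumentException("dims and index must have the same length");
        int position = 0;
        int stride = 1;                      // distance between neighbours in dimension d
        for (int d = 0; d < dims.Length; d++)
        {
            if (index[d] < 0 || index[d] >= dims[d])
                throw new IndexOutOfRangeException();
            position += index[d] * stride;
            stride *= dims[d];
        }
        return position;
    }
}
```

For an array of size [5 x 4 x 3], the last element (4, 3, 2) lands at position 59, the largest sequential index of its 60 elements.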

Now, how do references relate to this? References actually work exactly the same way; only the mapping of their elements is a little more complicated. Requests for indices are first translated through the mapping of the reference, which was determined at creation time, and afterwards into the position in the System.Array. Internally both mappings are compiled into one step. The mechanism is carried out by a class called ILIndexOffset.
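
One way two such mappings can collapse into a single step is to precompute, at creation time, the storage offset that each view index contributes. The class OffsetView below is a hypothetical two-dimensional sketch under a column-major storage assumption; it is not the actual ILIndexOffset class, whose internals are not described here.

```csharp
using System;

// Hypothetical sketch, not the actual ILIndexOffset class: a 2-D reference
// whose per-dimension offsets into the flat source are computed once.
sealed class OffsetView
{
    private readonly double[] source;      // flat, column-major storage of the solid array
    private readonly int[] rowOffsets;     // storage offset contributed by each view row
    private readonly int[] colOffsets;     // storage offset contributed by each view column

    public OffsetView(double[] source, int sourceRows, int[] rows, int[] cols)
    {
        this.source = source;
        rowOffsets = (int[])rows.Clone();  // a row index moves by 1 element in storage
        colOffsets = new int[cols.Length];
        for (int j = 0; j < cols.Length; j++)
            colOffsets[j] = cols[j] * sourceRows;   // a column index moves by one full column
    }

    // Reference mapping and storage mapping collapse into one addition.
    public double this[int i, int j]
    {
        get { return source[rowOffsets[i] + colOffsets[j]]; }
    }
}
```

The index lists may be irregular or contain repeated entries; the cost per access stays at two table lookups and one addition.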

