Referencing Model - Feature Article
This article introduces some advances in memory management for mathematical objects. ILNumerics.Net implements a sophisticated referencing model which both:
- minimizes the amount of memory that referencing objects consume while still enabling fast access, and
- transparently manages creation of and access to these objects in the background.
What is referencing about?
Whenever an object is created by (repeated or partial) use of another object, the element data of the result can be described by "pointing" to the original data. The "pointer" may be seen as a description of the original object, its elements and its address of use. Most of the time such a pointer consumes much less memory than the data it enables access to. Let's construct a simple example:
ILArray<double> A = ILMath.randn(10000,5000);
We created the matrix A of size [10000 x 5000], filled with normally distributed random numbers. Since A holds double precision floating point data (8 bytes each), this object consumes roughly 380 MB.
Arrays like that are called 'solid arrays'. Now, in order to compute something (e.g. a covariance matrix), we need the transposed version of A:
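As a quick check of that figure, counting only the element data and reading "MB" as binary megabytes (an assumption about the unit used here):

$$10000 \times 5000 \times 8\ \text{bytes} = 4 \times 10^{8}\ \text{bytes} \approx 381\ \text{MiB}$$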
ILArray<double> AT = A.T;
Not surprisingly, AT is of size [5000 x 10000] now. But how much memory does AT consume? The answer is: less than 60 kilobytes!
In this case only the description "A.T" must be stored. Roughly speaking, in order to access elements of AT, we instead
access the transposed version of A. Arrays like AT here are called 'referencing arrays'.
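The idea of storing only the description "A.T" can be sketched in plain C#. This is a minimal illustration, not the ILNumerics.Net implementation: the class `TransposedView`, its members and the column-major storage layout are all assumptions made for the example.

```csharp
using System;

// Hypothetical illustration only; not part of ILNumerics.Net.
// A read-only transposed view over a column-major matrix. No element is copied.
public sealed class TransposedView
{
    private readonly double[] _data;   // storage shared with the source matrix
    private readonly int _srcRows;     // number of rows of the source matrix

    public int Rows { get; }           // rows of the view = columns of the source
    public int Cols { get; }           // columns of the view = rows of the source

    public TransposedView(double[] data, int srcRows, int srcCols)
    {
        if (data.Length != srcRows * srcCols)
            throw new ArgumentException("data length must equal srcRows * srcCols");
        _data = data;
        _srcRows = srcRows;
        Rows = srcCols;
        Cols = srcRows;
    }

    // Element (i, j) of the view is element (j, i) of the source.
    // In column-major order, source element (r, c) is stored at r + c * srcRows.
    public double this[int i, int j] => _data[j + i * _srcRows];
}

public static class TransposedViewDemo
{
    public static void Run()
    {
        // 2 x 3 source in column-major order: [1 3 5; 2 4 6]
        double[] a = { 1, 2, 3, 4, 5, 6 };
        var at = new TransposedView(a, 2, 3);   // 3 x 2 view on the same storage
        Console.WriteLine(at[2, 0]);            // reads source element (0, 2)
        Console.WriteLine(at[0, 1]);            // reads source element (1, 0)
    }
}
```

The view holds only a reference to the existing storage plus a few integers, which is why its memory footprint is independent of the number of elements in this sketch.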
This kind of referencing is called 'full referencing', because only full dimensions of the source construct the resulting array. But referencing works just as well for partial dimensions, as shown in the following example.
ILArray<double> A = ILMath.randn(10000,5000);
ILArray<double> AP = A[":;0:1999"];
AP is of size [10000 x 2000] now. A solid version would consume about 152 MB of memory. But AP consumes less than 47 kB! The following table lists some common examples where referencing occurs. As shown there, it is not limited to regularly spaced dimension addressing like in the examples above. You might as well create subarrays from arbitrary (even repeated) parts of the source. Nesting subarray creation is also supported and creates references as well.
| expression | creating reference | regular spaced |
|---|---|---|
| A[":;:;:"] | yes | yes |
| A["0:end;2,3;0,2"] | yes | yes |
| A[":;:;2,1,0"] | yes | yes |
| A[":;:;1,2,0"] | yes | no |
| A["3,2,4,1,1,0,1; 3,3,3,3,3; 1,2,0"] | yes | no |
| A[":;:"] | yes | yes |
| A[":;:"].T | yes | yes |
| A[":;:"][":;:"][":;:"].T | yes | yes |
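How a reference can cover arbitrary, repeated and nested index selections without copying data can be sketched as follows. This is a simplified two-dimensional illustration under stated assumptions: `IndexedView`, `Sub` and the column-major layout are hypothetical names and choices for this example, not the ILNumerics.Net API.

```csharp
using System;

// Hypothetical illustration only; not part of ILNumerics.Net.
// A read-only view that selects arbitrary (possibly repeated) rows and columns
// of a column-major matrix by storing one index list per dimension.
public sealed class IndexedView
{
    private readonly double[] _data;   // storage shared with the source
    private readonly int _srcRows;     // number of rows of the source matrix
    private readonly int[] _rowIdx;    // source row for each row of the view
    private readonly int[] _colIdx;    // source column for each column of the view

    public IndexedView(double[] data, int srcRows, int[] rowIdx, int[] colIdx)
    {
        _data = data;
        _srcRows = srcRows;
        _rowIdx = rowIdx;
        _colIdx = colIdx;
    }

    public int Rows => _rowIdx.Length;
    public int Cols => _colIdx.Length;

    // Translate view indices to source indices, then to the storage position.
    public double this[int i, int j] => _data[_rowIdx[i] + _colIdx[j] * _srcRows];

    // Nested subarray: compose the index lists; the result is again a view.
    public IndexedView Sub(int[] rowIdx, int[] colIdx)
    {
        int[] rows = Array.ConvertAll(rowIdx, r => _rowIdx[r]);
        int[] cols = Array.ConvertAll(colIdx, c => _colIdx[c]);
        return new IndexedView(_data, _srcRows, rows, cols);
    }
}

public static class IndexedViewDemo
{
    public static void Run()
    {
        // 3 x 3 source in column-major order: [1 4 7; 2 5 8; 3 6 9]
        double[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        var v = new IndexedView(a, 3, new[] { 2, 0, 0 }, new[] { 1, 1 });
        Console.WriteLine(v[0, 0]);                         // source element (2, 1)
        var nested = v.Sub(new[] { 1 }, new[] { 0 });       // view of a view
        Console.WriteLine(nested[0, 0]);                    // source element (0, 1)
    }
}
```

In this sketch a view costs one integer per selected index in each dimension rather than one double per element. For a [10000 x 2000] view that would be 12000 four-byte integers, about 47 kB, which is in line with the figure quoted above, although the article does not state how ILNumerics.Net stores its references.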
When does referencing occur?
By default referencing occurs in the following situations:
- subarray creation (general case, see above)
- array replication (repmat)
  A.R; -> full reference
  repmat(A,3,2) -> creates repeated reference of A
- dimension shifts
  A.T; A.ShiftDimensions(2); A[1,":;:;:"]; -> all create references
- concatenation (both arrays are the same)
  horzcat(A,A); A.Concat(A,1); -> create references
- removal of partial dimensions
  A["1;:;:"] = null; -> A will be a reference afterwards, having removed the 2nd index from the first dimension
When does referencing not work?
Referencing does not work, or is disabled, for
- concatenation for different arrays,
- subarray creation by sequential indices,
- for scalars and vector sized results,
- complex conjugate transposes.
The following table lists some negative examples of valid expressions that do not build references.
| expression | creates reference | reason |
|---|---|---|
| A[round(rand(20,30)*59)] | no | sequential index access |
| A["38,23,11,13,11,29,32,0,22,2,1,0"] | no | vector sized result |
| A[":,:"] | no | vector sized result |
How are references deserialized and serialized?
Another pleasant side effect of referencing is compact storage when serializing. An array of ILArrays that consists of at least one solid array and any number of references to this solid array will keep its referencing relations after deserialization. The amount of storage space needed is also conveniently small, and therefore so is the time needed for serialization.
References and write access
The above scheme applies only when reading from arrays. Since the elements queried are essentially the same on source and destination, they only have to exist once. Writing to elements is different.
As soon as elements are about to change, the reference needs to be detached from the source. Also, if a solid array has references built from it, it must detach from them before writing to it is allowed. This is done automatically by the underlying reference counting of ILNumerics.Net. The system even covers cases like the following:
- A reference B is created from a solid array A.
- The solid array A runs out of scope and is garbage collected. Its underlying memory is preserved since it is still needed for B.
- Since B is now the only pointer to the memory (formerly belonging to A), attempts to alter elements of B will be carried out directly on it. No detaching occurs!
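The detach-on-write behaviour described above resembles a classic copy-on-write scheme driven by a reference count. The following is a minimal sketch of that general technique, not the ILNumerics.Net implementation: `CowArray`, `CreateReference`, `Release` and `IsShared` are hypothetical names, and `Release` merely stands in for an array going out of scope.

```csharp
using System;

// Hypothetical illustration only; not part of ILNumerics.Net.
// Arrays share one storage object and copy it only when a shared array is written.
public sealed class CowArray
{
    private sealed class Storage
    {
        public double[] Data;
        public int RefCount = 1;        // number of arrays attached to Data
        public Storage(double[] data) { Data = data; }
    }

    private Storage _storage;

    public CowArray(double[] data) { _storage = new Storage(data); }
    private CowArray(Storage shared) { _storage = shared; }

    // Create a second array on the same storage; no element is copied.
    public CowArray CreateReference()
    {
        _storage.RefCount++;
        return new CowArray(_storage);
    }

    // True while at least one other array uses the same storage.
    public bool IsShared => _storage.RefCount > 1;

    public double this[int i]
    {
        get => _storage.Data[i];
        set
        {
            if (IsShared) Detach();     // writing must not affect other arrays
            _storage.Data[i] = value;
        }
    }

    // Give this array a private copy of the data if the storage is shared.
    public void Detach()
    {
        if (!IsShared) return;
        _storage.RefCount--;
        _storage = new Storage((double[])_storage.Data.Clone());
    }

    // Stand-in for this array going out of scope; it must not be used afterwards.
    public void Release() { _storage.RefCount--; }
}

public static class CowArrayDemo
{
    public static void Run()
    {
        double[] raw = { 1, 2, 3 };
        var a = new CowArray(raw);
        var b = a.CreateReference();
        b[0] = 10;                      // b is shared, so it detaches first
        Console.WriteLine(a[0]);        // a still sees the original value

        var c = a.CreateReference();
        a.Release();                    // a "runs out of scope"
        c[1] = 20;                      // c is the only user: written in place
        Console.WriteLine(raw[1]);      // the original storage was modified
    }
}
```

The last two steps mirror the case in the list above: once the source is gone, the remaining reference owns the memory and can be written without a copy.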
Reference creation control / configuration
In the most common situations the user won't have to care about references at all. Since all functions of ILNumerics.Net receiving ILArray<>'s automatically handle both solid and referencing arrays, the user won't even have to know whether an array is a reference or not.
Nevertheless, ILNumerics.Net gives advanced users fine-grained control over the way references are handled.
The first table below shows which settings, members and properties exist; the second table explains where to use them.
| setting / member | description |
|---|---|
| DetachReferences | This property controls how ILArray objects act on write access. Possible values are the enum values of ILNumerics.DetachingBehavior. |
| MinimumRefDimensions | Number of non-singleton dimensions an array must at least contain in order to be a reference. This value is checked before a reference gets created. It must be >= 2. Note: The memory savings of references increase with the number of dimensions! |
| IsReference (ILArray<> instance member) | true for reference arrays, false for solid arrays |
| Detach() (ILArray<> instance member) | This function may be used at any time to manually detach a reference from its underlying solid array. The array will be a solid array afterwards. This function has no effect on solid arrays. |
| property | namespace location / class | default / initial value | remarks |
|---|---|---|---|
| DetachReferences | ILNumerics.Settings.ILSettings | DetachSave | global (static) setting. Controls the default property for all ILArrays (all types) created in the future. |
| DetachReferences | ILNumerics.ILArray<T> | current value of global ILSettings.DetachReferences | global (static) setting for all ILArray<T>. Controls the behaviour of all ILArray<T> instances. |
| MinimumRefDimensions | ILNumerics.Settings.ILSettings | 2 | global setting, supplies the default value for all instances of ILArray<> created in the future |
| MinimumRefDimensions | member for objects of type ILNumerics.Net.ILArray<T> | current value of global ILSettings.MinimumRefDimensions | controls an individual instance of an ILArray<T> object |
Performance considerations for reference arrays
The advantage of saving much memory comes at the price of a small performance disadvantage compared to solid arrays. Every access to a single element must first be mapped to the underlying solid array's memory. It is not possible to give a general estimate of that penalty. It depends on the number of dimensions, the regularity of the reference and the function accessing the array. As a rule of thumb, accesses may take between s and d*s, where d is the number of dimensions and s is the time spent for a solid array of the same size. For example the expression
A + B
may take twice as long if A and B are non-regularly spaced references. But the expression
ILMath.multiply(A,A.T)
with A being a solid matrix or a regular reference won't have any disadvantage over pure solid matrices, since the function is carried out using BLAS functions that directly support most regularly spaced reference matrices. In this case, which is the most common, references will even speed up the computation, due to the performance gain for the transpose. (Remember: building references will usually take less time than copying the whole array.)
However, if you need to disable references, this is possible:
- globally, by setting the static ILNumerics.Settings.ILSettings.DetachReferences to 'DetachAlways', or
- globally for all ILArray<T> of the inner type T, by setting the static member ILArray<T>.DetachReferences to 'DetachAlways', or
- locally for a specific array object, by setting the instance member MinimumRefDimensions to a large value.
In this article we learned what referencing for ILArray types is about, how and when it occurs and how we can control it. We also got a feeling for the memory savings it can bring.
Basically this is all you may want and need to know about references. ILNumerics.Net will manage the whole model in the background most of the time. But if you are interested in a closer insight into the whole thing, read on.
References and proxies
Strictly speaking, every ILArray is a proxy that enables the user to access elements of data in a source. On the most basic level, those sources are general System.Arrays of a dedicated type. The ILArray manages creation, access and destruction of those data arrays while serving as a proxy to the user.
For solid arrays, access to the underlying storage is easy. ILArray simply maps requests for elements onto the corresponding position in the System.Array. Indices into the dimensions of the ILArray are translated directly into the position in the System.Array.
Now, how are references related to this situation? References actually work exactly the same way. Only the mapping of their elements is a little more complicated. This time, requests for indices are first translated into the mapping of the reference, which was determined at creation time, and afterwards into the position in the System.Array. Internally both mappings are compiled into one process. The mechanism is carried out by a class called ILIndexOffset.
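One way to picture two mappings being "compiled into one" is a table of precomputed storage offsets per dimension, so that locating an element costs one lookup and one addition per dimension for solid arrays and references alike. The sketch below illustrates that general idea only. `OffsetTable`, `ForSolid`, `Select` and `Position` are hypothetical names, column-major storage is assumed, and nothing here is taken from the actual ILIndexOffset class.

```csharp
using System;

// Hypothetical illustration only; not the ILIndexOffset class of ILNumerics.Net.
// offsets[d][k] is the storage offset contributed by index k in dimension d.
public sealed class OffsetTable
{
    private readonly int[][] _offsets;

    private OffsetTable(int[][] offsets) { _offsets = offsets; }

    // Solid array: index k in dimension d contributes k * stride(d) (column-major).
    public static OffsetTable ForSolid(int[] size)
    {
        var offsets = new int[size.Length][];
        int stride = 1;
        for (int d = 0; d < size.Length; d++)
        {
            offsets[d] = new int[size[d]];
            for (int k = 0; k < size[d]; k++) offsets[d][k] = k * stride;
            stride *= size[d];
        }
        return new OffsetTable(offsets);
    }

    // Reference: selected[d] lists the indices picked along dimension d.
    // The reference mapping and the storage mapping are merged at creation time.
    public OffsetTable Select(int[][] selected)
    {
        var offsets = new int[_offsets.Length][];
        for (int d = 0; d < _offsets.Length; d++)
        {
            int[] source = _offsets[d];
            offsets[d] = Array.ConvertAll(selected[d], k => source[k]);
        }
        return new OffsetTable(offsets);
    }

    // Position in the underlying storage: one lookup and addition per dimension.
    public int Position(params int[] index)
    {
        int pos = 0;
        for (int d = 0; d < index.Length; d++) pos += _offsets[d][index[d]];
        return pos;
    }
}

public static class OffsetTableDemo
{
    public static void Run()
    {
        var solid = OffsetTable.ForSolid(new[] { 3, 4 });        // 3 x 4 solid array
        Console.WriteLine(solid.Position(2, 1));                 // 2 + 1 * 3

        // Reference picking rows 2,0 and columns 3,1,1 of the solid array.
        var reference = solid.Select(new[] { new[] { 2, 0 }, new[] { 3, 1, 1 } });
        Console.WriteLine(reference.Position(0, 0));             // source element (2, 3)
        Console.WriteLine(reference.Position(1, 2));             // source element (0, 1)
    }
}
```

Because `Select` can be applied to a table that is itself the result of `Select`, nested references resolve to a single table in this sketch and do not add a further translation step at access time.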