Tag Archives: array

N-dim Array Broadcasting Efficiency in ILNumerics 4.10

Next to other great improvements in version 4.10 of ILNumerics Ultimate VS, it is especially one new feature which requires some attention: general broadcasting for n-dimensional ILArrays.

Broadcasting as a concept today is found in many popular mathematical prototyping systems. The most direct correspondence probably exists in the numpy package. Matlab and Octave offer similar functionality by means of the bsxfun function.

The term ‘broadcasting’ refers to a binary operator which is able to apply an elementwise operation to the elements of two n-dimensional arrays. The ‘operation’ is often as simple as a straight addition (ILMath.add, ILMath.divide, a.s.o.). However, what is special about broadcasting is that it allows the operation even for the case where both arrays involved do not have the same number of elements.

Broadcasting in ILNumerics prior Version 4.10

In ILNumerics, broadcasting is available for long already. But prior version 4.10 it was limited to scalars operating on n-dim arrays and vectors operating on matrices. Therefore, we had used the term ‘vector expansion’ instead of broadcasting. Obviously, broadcasting can be seen as a generalization of vector expansion.

Let’s visualize the concept by considering the following matrix A:

A

1  5   9  13  17
2  6  10  14  18
3  7  11  15  19
4  8  12  16  20


Matrix A might represent 5 data points of a 4 dimensional dataset as columns. One common requirement is to apply a certain operation to all datapoints in a similar way. In order to, let’s say, scale/weight the components of each dimension by a certain factor, one would multiply each datapoint with a vector of length 4.


ILArray<double> V = new[] { 0.5, 3.0, 0.5, 1.0 };

0.5
3.0
0.5
1.0


The traditional way of implementing this operation would be to expand the scaling vector by replicating it from a single column to a matrix matching the size of A.

VExp = repmat(V, 1, 5);

0.5  0.5  0.5  0.5  0.5
3.0  3.0  3.0  3.0  3.0
0.5  0.5  0.5  0.5  0.5
1.0  1.0  1.0  1.0  1.0


Afterwards, the result can be operated with A elementwise in the common way.

ILArray<double> Result = VExp * A;

0.5   2.5   4.5   6.5   8.5
6.0  18.0  30.0  42.0  54.0
1.5   3.5   5.5   7.5   9.5
4.0   8.0  12.0  16.0  20.0


The problem with the above approach is that the vector data need to be expanded first. There is little advantage in doing so: a lot of new memory is being used up in order to store completely redundant data. We all know that memory is the biggest bottleneck today. We should prevent from lots of memory allocations whenever possible. This is where vector expansion comes into play. In ILNumerics, for long, one can prevent from the additional replication step and operate the vector on the matrix directly. Internally, the operation is implemented in a very efficient way, without replicating any data, without allocating new memory.

ILArray<double> Result = V * A;

0.5   2.5   4.5   6.5   8.5
6.0  18.0  30.0  42.0  54.0
1.5   3.5   5.5   7.5   9.5
4.0   8.0  12.0  16.0  20.0

Generalizing for n-Dimensions

Representing data as matrices is very popular in scientific computing. However, if the data are stored into arrays of other shapes, having more than two dimensions, one had to fall back to repmatting in order for the binary operation to succeed. This nuissance has been removed in version 4.10.

Now it is possible to apply broadcasting to two arrays of any matching shape – without the need for using repmat. In order for two arrays to ‘match‘ in the binary operation, the following rules must be fullfilled:

1. All corresponding dimensions of both arrays must match.
2. In order for two  corresponding dimensions to match,
• both dimensions must be of the same length, or
• one of the dimensions must be of length 1.

An example of two matching arrays would be a vector running along the 3rd dimension and a 3 dimensional array:

In the above image the vector (green) has the same length as the corresponding dimension of the 3D array (gray). The size of the vector is [1 x 1 x 6]. The size of the 3D array is [4 x 5 x 6]. Hence, any dimension of both, the vector and the 3D array ‘match’ in terms of broadcasting. A broadcasting operation for both, the vector and the array would give the same result as if the vector would be replicated along the 1st and the 2nd dimensions. The first element will serve all elements in the first 4 x 5 slice in the 1-2 plane. This slice is marked red in the next image: Note that all red elements here derive from the same value – from the first element of the green vector.  The same is true for all other vector elements: they fill corresponding slices on the 3D array along the 3rd dimension.

Slowly, a huge performance advantage of broadcasting becomes clear: the amount of memory saved explodes when more, longer dimensions are involved.

In the most general case and if broadcasting is blindly applied, the following special case potentially causes issues. Consider two vectors, one row vector and one column vector being provided as input parameters to a binary operation. In ILNumerics, every array carries at least two dimensions. A column vector of length 4 is considered an array of size [4 x 1]. A row vector of length 5 is considered an array of size [1 x 5]. In fact, any two vectors match according to the general  broadcasting rules.

As a consequence operating a row vector [1 x 5] with a column vector [4 x 1] results in a matrix [4 x 5]. The row vector is getting ‘replicated’ (again, without really executing the replication) four times along the 1st dimension, and the column vector 5 times along the rows.

array(new[] {1.0,2.0,3.0,4.0,5.0}, 1, 5) +array(new[] {1.0,2.0,3.0,4.0}, 4, 1)

<Double> [4,5]
[0]:          2          3          4          5          6
[1]:          3          4          5          6          7
[2]:          4          5          6          7          8
[3]:          5          6          7          8          9


Note, in order for the above code example to work, one needs to apply a certain switch:

Settings.BroadcastCompatibilityMode = false;


The reason is that in the traditional version of ILNumerics (just like in Matlab and Octave) the above code would simply not execute but throw an exception instead. Originally, binary operations on vectors would ignore the fact that vectors are matrices and only take the length of the vectors into account, operating on the corresponding elements if the length of both vectors do match. Now, in order to keep compatibility for existing applications, we kept the former behavior.

The new switch ‘Settings.BroadcastCompatibilityMode’ by default is set to ‘true’. This will cause the Computing Engine to throw an exception when two vectors of inequal length are provided to binary operators. Applying vectors of the same length (regardless of their orientation) will result in a vector of the same length.

If the ‘Settings.BroadcastCompatibilityMode’ switch is set to ‘false’ then general broadcasting is applied in all cases according to the above rules – even on vectors. For the earlier vector example this leads to the resulting matrix as shown above: operating a row on a column vector expands both vectors and gives a matrix of corresponding size.

Further reading: binary operators, online documentation

ILNumerics Language Features: Limitations for C#, Part II: Compound operators and ILArray

A while ago I blogged about why the CSharp var keyword cannot be used with local ILNumerics arrays (ILArray<T>, ILCell, ILLogical). This post is about the other one of the two main limitations on C# language features in ILNumerics: the use of compound operators in conjunction with ILArray<T>. In the online documentation we state the rule as follows:

The following features of the C# language are not compatible with the memory management of ILNumerics and its use is not supported:

• The C# var keyword in conjunction with any ILNumerics array types, and
• Any compound operator, like +=, -=, /=, *= a.s.o. Exactly spoken, these operators are not allowed in conjunction with the indexer on arrays. So A += 1; is allowed. A[0] += 1; is not!

Let’s take a closer look at the second rule. Most developers think of compound operators as being just syntactic sugar for some common expressions:

int i = 1;
i += 2;

… would simply expand to:

int i = 1;
i  = i + 2; 

For such simple types like an integer variable the actual effect will be indistinguishable from that expectation. However, compound operators introduce a lot more than that. Back in his times at Microsoft, Eric Lippert blogged about those subtleties. The article is worth reading for a deep understanding of all side effects. In the following, we will focus on the single fact, which becomes important in conjunction with ILNumerics arrays: when used with a compound operator, i in the example above is only evaluated once! In difference to that, in i = i + 2, i is evaluated twice.

Evaluating an int does not cause any side effects. However, if used on more complex types, the evaluation may does cause side effects. An expression like the following:

ILArray<double> A = 1;
A += 2;

… evaluates to something similiar to this:

ILArray<double> A = 1;
A = (ILArray<double>)(A + 2); 

There is nothing wrong with that! A += 2 will work as expected. Problems arise, if we include indexers on A:

ILArray<double> A = ILMath.rand(1,10);
A[0] += 2;
// this transforms to something similar to the following:
var index = (ILRetArray<double>)0;
receiver[index] = receiver[index] + 2; 

In order to understand what exactly is going on here, we need to take a look at the definition of indexers on ILArray:

public ILRetArray<ElementType> this[params ILBaseArray[] range] { ...

The indexer expects a variable length array of ILBaseArray. This gives most flexibility for defining subarrays in ILNumerics. Indexers allow not only scalars of builtin system types as in our example, but arbitrary ILArray and string definitions. In the expression A[0], 0 is implicitly converted to a scalar ILNumerics array before the indexer is invoked. Thus, a temporary array is created as argument. Keep in mind, due to the memory management of ILNumerics, all such implicitly created temporary arrays are immediately disposed off after the first use.

Since both, the indexing expression 0 and the object where the indexer is defined for (i.e.: A) are evaluated only once, we run into a problem: index is needed twice. At first, it is used to acquire the subarray at receiver[index]. The indexer get { ...}  function is used for that. Once it returns, all input arguments are disposed – an important foundation of ILNumerics memory efficency! Therefore, if we invoke the index setter function with the same index variable, it will find the array being disposed already – and throws an exception.

It would certainly be possible to circumvent that behavior by converting scalar system types to ILArray instead of ILRetArray:

ILArray A = ...;
A[(ILArray)0] += 2;

However, the much less expressive syntax aside, this would not solve our problem in general either. The reason lies in the flexibility required for the indexer arguments. The user must manually ensure, all arguments in the indexer argument list are of some non-volatile array type. Casting to ILArray<T> might be an option in some situations. However, in general, compound operators require much more attention due to the efficient memory management in ILNumerics. We considered the risk of failing to provide only non-volatile arguments too high. So we decided not to support compound operators at all.

See: General Rules for ILNumerics, Function Rules, Subarrays

Microsoft.Numerics, Cloud Numerics for Azure – a short Review

Today I found some time to take a look at the Cloud Numerics project at Microsoft. I started with the overview/ introduction post by Ronnie Hoogerwerf found at the Cloud Numerics blog at msdn.

The project aims at computations on very large distributed data sets and is intended for Azure. Interesting news for me: the library shows quite some similarities to ILNumerics. It provides array classes on top of native wrappers, utilizing MPI, PBLAS and ScaLAPACK. A runtime is deployed with the project binaries: Microsoft.Numerics, which provides all the classes described here.

‘Local’ Arrays in “Cloud Numerics”

The similarity is most obvious when comparing the array implementations: Both, ILNumerics and Cloud Numerics utilize multidimensional generic arrays. Cloud Numerics arrays all derive from Microsoft.Numerics.IArray<T> – not to be confused with ILNumerics local arrays ILArray<T> ;)!

Important properties of arrays in ILNumerics are provided by the concrete array implementation of an array A (A.Size.NumberOfElements, A.Size.NumberOfDimensions, A.Reshape, A.T for the Transpose a.s.o.). On the Cloud Numerics side, those properties are provided by the interface IArray<T>: A.NumberOfDimensions, A.NumberOfElements, A.Reshape(), A.Transpose() a.s.o).

A similar analogy is found in the element types supported by ILArray<T> and Microsoft.Numerics.IArray<T>. Both allow the regular System numeric value types, as System.Int32, System.Double and System.Single. Interestingly – both do not rely on System.Numerics.Complex as the main double precisioin complex data element type but rather implement their own for both: single precision and double precision.

Both array types support vector expansion, at least Cloud Numerics promises to do so in the next release. For now, only scalar binary operations are allowed for arrays. For an explanation of the feature it refers to NumPy rather than ILNumerics though.

Arrays Storage in “Cloud Numerics”

The similarities end when it comes to internal array storage. Both do store multidimensional arrays as one dimensional arrays internally. But Cloud Numerics stores its elements in native one dimensional arrays. They argue with the 2GB limit introduced for .NET objects and further elaborate:

Additionally, the underlying native array types in the “Cloud Numerics” runtime are sufficiently flexible that they can be easily wrapped in environments other than .NET (say Python or R) with very little additional effort.

It is hard follow that view. Out of my experience, .NET arrays are perfectly suitable for such interaction with native libraries, since at the end it is just a pointer to memory passed to thoses libs. Regarding the limit of 2GB: I assume a ‘problem size’ of more than 2GB would hardly be handled on one node. Especially a framework for distributed memory I would have expected to switch over to shared memory about at this limit at least?

In the consequence, interaction between Cloud Numerics and .NET arrays becomes somehow clumsy and – if it comes to really large datasets – with an expected performance hit (disclaimer: untested, of course).

Differences keep coming: indexing features are somehow basic in Cloud Numerics. By now, they support scalar element specification only and restrict the number of dimension specifier to be the same as the number of dimensions in the array. Therefore, subarrays seems to be impossible to work with. I will have an eye on it, if the project will support array features like A[full, end / 2 + 1] in one of the next releases

I wonder, how the memory management is done in Cloud Numerics. The library provides overloaded operators and hence faces the same problems, which have led to the sophisticated memory management in ILNumerics: if executed in tight loops, expression like

A = 0.5 * (A + A') + 0.5 * (A - A')

on ‘large’ arrays A will inevitably lead to memory pollution if run without deterministic disposal! Not to speak about (virtual) memory fragmentation and the problems introduced by heavy unmanaged resources in conjunction with .NET objects and the GC … I could not find the time to really test it live, but I am almost sure, the targeted audience with really large problem sizes somehow contradicts this approach. Unless there is some hidden mechanism in the runtime (which I doubt, because the use of ‘var’ and regular C#, hence without the option to overload the assignment operator), this could evolve to a real nuissance IMO.

Distributed Arrays

This part seems straightforward. It follows the established scheme known from MPI but offers a nicer interface to the user. Also the Cloud Numerics Runtime wraps away the overhead of cluster management, array slicing to a good extend. However, the question of memory management again arises on the distributed side as well. Since the API exposed to the user (obviously?) does not take care of disposal of temporary arrays in a timely fashion, the performance for large arrays will most likely be suffering.

As soon as I find out more details about their internal memeory management I will post them here – hopefully together with some corrections of my assumptions.

Why the ‘var’ keyword is not allowed in ILNumerics

One of the rules related to the new ILNumerics memory management, which potentially causes issues, is not to use the var keyword of C#. In Visual Basic, similar functionality is available by ommiting the explicit type declarations for array types.

ILArray<double> A = rand(5,4,3); // explicit array type declaration required
var B = rand(5,4,3); // NOT ALLOWED!

Lets take a look at the reasons, why this – otherwise convenient – language feature is not allowed in conjunction with array declarations. In order to make this clear, a little survey into the internals of the ILNumerics memory management is needed.

In ILNumerics, local arrays are of one of the types ILArray<T>, ILLogicalArray, ILCell. Those array types are one building block of the memory managemt in ILNumerics. By using one of those array types in a function, one can be sure, to keep the array alive and available as long as the function is not left. On the other hand – as soon as the function was left, the array will be recycled immediately.

Other array types exist in ILNumerics. They serve different purposes regarding their lifetime and mutability. Input arrays like ILInArray<T> f.e., make sure, arrays given as function parameters are unable to get altered.

Another important type is the return type of any function. Every function, property or operator in ILNumerics returns arrays of either ILRetArray<T>, ILRetLogical or ILRetCell. Return arrays are volatile or temporary arrays. Their use is restricted to exactly one time. After the first use, return arrays are disposed immediately. For expressions like the following, this behavior drastically increases memory efficiency:

abs(pow(cos(A*pi/2+t),2)

Assuming A to be a (rather large) array, 7 temporary memory storages of the size of A would be necessary in order to evaluate the whole expression. But if we take the above assumption regarding the lifetime of return arrays into account, that number is reduced to at most 2 temporary arrays. The reason: A * pi needs one storage for the result: Result1. It is than used to compute [Result1] / 2. Here, another storage is needed for the new result: Result2. At the end of the division operation, [Result1] has already been used for the first time. Since it is a Return Type, it is released and its storage recycled. For the next calculation [Result2] + t, the storage from [Result1] is already found in the memory pool of ILNumerics. Therefore, no new storage is needed and both temporary storages are alternatingly used for the subsequent evaluations.

Lets assume, the expression above does only make sense, if we can retrieve the result and use it in subsequent expressions inside our function. The most common case would be to assign the result to a local variable. Now, we get close to the interesting part: If we would allow the var keyword to be used, C# would generate a local variable B of type ILRetArray<double>:

// DO NOT DO THIS!!
var B = abs(pow(cos(A*pi/2+t),2);    // now B is of type ILRetArray<double> !

Console.Out.Write(B.Length);
Console.Out.Write(B.ToString());    //<-- fails! 

Besides the fact, that this would conflict with the function rules of ILNumerics (local array types must be of ILArray<T> or similar), B could only be used for exactly one time! In the example, B.Length does execute normaly. After that statement, B gets disposed. Therefore, the statement B.ToString() will already fail. This is, why var is not permitted.

By explicitely declaring the local array type, the compiler will use implicit type conversions in order to convert a return array to a local array, which is than available for the rest of the function block:

// this code is correct
ILArray<double> B = abs(pow(cos(A*pi/2+t),2);    // now B is of type ILArray<double> !

Console.Out.Write(B.Length);
Console.Out.Write(B.ToString());    // works as expected

See: General Rules for ILNumerics, Function Rules