All posts by haymo

LLVM everywhere

F#News today published some efforts to utilize the impressive power of the LLVM compiler suite from within F#. The attempts are neither mature nor stable yet, but they hint at the potential of multi-level compilation for runtime optimization: use high-level languages to formulate your algorithm and let lower-level optimizations translate it into highly efficient, platform-specific code. The approach demonstrated in the post mentioned above still does not sufficiently hide the internals of LLVM. A truly comfortable library would expose only a single switch to the user: UsePlatformOptimization on/off. It would then be the responsibility of the library to transform the high-level algorithm into suitable input for the optimizing framework.

LLVM is not the only interesting target for such optimization scenarios. Another target is OpenCL. However, most graphics card vendors and Intel (not sure about AMD?) already rely on LLVM for their OpenCL implementations. So it appears there is no way around LLVM …

Quick and dirty tests and misleading results

Today this blog post popped up on my desktop. It describes a quick attempt to outperform the variance function of our friend library MathNet.Numerics by utilizing the Task Parallel Library from within F#. It obviously worked out, by a factor of 10! Since this seemed strange to me, I couldn't resist making my own (very quick and dirty) comparison.

The code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ILNumerics; 
using MathNet; 
using MathNet.Numerics.Statistics; 
using System.Diagnostics; 

namespace ConsoleApplication6 {
    // Deriving from ILMath gives direct access to randn(), var() etc.
    class Program : ILMath {
        static void Main(string[] args) {
            long ms = 0; 
            int n = 10000000;   // 10 million elements
            int it = 100;       // number of repetitions

            ILArray<double> A = randn(1,n);
            Stopwatch sw = new Stopwatch(); 
            for (int i = 0; i < it; i++) {
                sw.Restart();   // time each call separately
                ILArray<double> B = var(A);
                ms += sw.ElapsedMilliseconds; 
            }
            Console.WriteLine("ILNumerics needed: {0} ms",ms/(float)it); 

            // MathNet works on a plain System.Double[]: copy the same data over
            double[] storage = A.ToArray();
            ms = 0; 
            for (int i = 0; i < it; i++) {
                sw.Restart();
                double a = Statistics.Variance(storage); 
                ms += sw.ElapsedMilliseconds;
            }
            Console.WriteLine("Mathnet.Numerics needed: {0} ms", ms / (float)it); 
        }
    }
}

The result:

ILNumerics needed: 101,87 ms
Mathnet.Numerics needed: 433,67 ms

So the performance of the MathNet implementation is actually not too bad. ILNumerics parallelizes the var() function (the test ran on a dual-core machine) and uses unsafe, pointer-optimized code for the iteration, so a factor of 4 over MathNet is reasonable. I suppose the speedup of 10 in the referenced F# blog post is more a measure of memory bandwidth. The tests there were not run repeatedly, so it appears the given implementation happens to be advantageous in terms of memory access. Possibly some prefetching of cache lines or similar is going on.
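To make the parallelization argument concrete, here is a minimal sketch of a two-pass variance split across cores with the Task Parallel Library. It is not the ILNumerics implementation (which, as said, uses unsafe pointer code); VarianceSequential and VarianceParallel are hypothetical names, and the sample variance (dividing by n - 1) is assumed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Sequential two-pass sample variance (divides by n - 1).
static double VarianceSequential(double[] x)
{
    double mean = x.Average();
    double sum = 0.0;
    foreach (double v in x)
        sum += (v - mean) * (v - mean);
    return sum / (x.Length - 1);
}

// Parallel two-pass sample variance: each worker accumulates the squared
// deviations of contiguous index ranges; partial sums are merged under a lock.
static double VarianceParallel(double[] x)
{
    double mean = x.AsParallel().Sum() / x.Length;
    double total = 0.0;
    object gate = new object();
    Parallel.ForEach(
        Partitioner.Create(0, x.Length),   // contiguous ranges keep memory access linear
        () => 0.0,                         // per-worker partial sum
        (range, _, partial) =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                double d = x[i] - mean;
                partial += d * d;
            }
            return partial;
        },
        partial => { lock (gate) total += partial; });
    return total / (x.Length - 1);
}

var rng = new Random(42);
double[] data = Enumerable.Range(0, 1_000_000).Select(_ => rng.NextDouble()).ToArray();
double seq = VarianceSequential(data);
double par = VarianceParallel(data);
Console.WriteLine($"sequential: {seq}, parallel: {par}");
```

Both passes stream through memory linearly, so on large arrays the achievable speedup is bounded by memory bandwidth rather than by the number of cores, which fits the suspicion above.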

ILNumerics and LINQ

As you may have noticed, ILNumerics arrays implement the IEnumerable<T> interface. This makes them compatible with ‘foreach’ loops and all the nice features of LINQ!

Consider the following example (don't forget to add ‘using System.Linq’ and to derive your class from ILNumerics.ILMath!):

ILArray<double> A = vec(0, 10);
Console.WriteLine(String.Join(Environment.NewLine, A));

Console.WriteLine("Evens from A:"); 
var evens = from a in A where a % 2 == 0 select a; 
Console.WriteLine(String.Join(Environment.NewLine, evens));

Some people prefer the extension method syntax:

var evens = A.Where(a => a % 2 == 0);

I personally find both equivalently expressive.
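Both forms are in fact the same thing: for a query with a single where clause, the compiler translates the query expression into a call to the Where extension method. A minimal check, using a plain double[] as a stand-in for ILArray<double> A = vec(0, 10) (assumed here to hold 0 through 10; any IEnumerable<double> behaves the same way):

```csharp
using System;
using System.Linq;

// Stand-in for ILArray<double> A = vec(0, 10): the values 0, 1, ..., 10.
double[] A = Enumerable.Range(0, 11).Select(i => (double)i).ToArray();

// Query syntax ...
var evensQuery = from a in A where a % 2 == 0 select a;
// ... is translated by the compiler into this extension method call.
var evensMethod = A.Where(a => a % 2 == 0);

Console.WriteLine(string.Join(", ", evensQuery));          // 0, 2, 4, 6, 8, 10
Console.WriteLine(evensQuery.SequenceEqual(evensMethod));  // True
```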

Considerations for IEnumerable<T> on ILArray<T>

IEnumerable<T> offers no option to specify a dimension. Since ILNumerics arrays store their elements in column-major order, enumerating an ILNumerics array runs along the first dimension. When used on a matrix, the enumerator therefore walks down the columns, one column after the other:

ILArray<double> A = counter(3,4); 
Console.WriteLine(A + Environment.NewLine); 
foreach (var a in A) 
    Console.Write(a + " ");   // prints the elements in storage order

… will give the following:

<Double> [3,4]
         1          4          7         10
         2          5          8         11
         3          6          9         12

1 2 3 4 5 6 7 8 9 10 11 12
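Why the enumerator runs down the columns follows directly from the storage layout. A minimal sketch with a plain double[] standing in for the internal storage of counter(3,4); LinearIndex is a hypothetical helper, not part of ILNumerics:

```csharp
using System;
using System.Linq;

const int rows = 3, cols = 4;
// Column-major storage of the 3x4 matrix above: column 0 first, then column 1, ...
double[] storage = Enumerable.Range(1, rows * cols).Select(i => (double)i).ToArray();

// Element (i, j) lives at linear index i + j * nRows.
static int LinearIndex(int i, int j, int nRows) => i + j * nRows;

Console.WriteLine(storage[LinearIndex(0, 1, rows)]);   // 4  (row 0, column 1)
Console.WriteLine(storage[LinearIndex(2, 3, rows)]);   // 12 (row 2, column 3)

// Enumerating the flat storage therefore walks down each column in turn.
Console.WriteLine(string.Join(" ", storage));          // 1 2 3 4 5 6 7 8 9 10 11 12
```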


Secondly, as is well known, elements returned from IEnumerable<T> can only be accessed in a read-only manner: the foreach iteration variable cannot be assigned to, and for value types it only holds a copy of the element anyway. In order to alter elements of ILNumerics arrays, use the explicit API provided by our arrays: see SetValue, SetRange, A[..] = .. and GetArrayForWrite().
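The read-only behavior can be reproduced without ILNumerics. In this sketch, the hypothetical iterator Elements stands in for an array enumerator; the values it yields are copies, so only indexed access changes the underlying storage:

```csharp
using System;
using System.Collections.Generic;

double[] storage = { 1.0, 2.0, 3.0 };

// Hypothetical stand-in for an array enumerator: it yields copies of the values.
static IEnumerable<double> Elements(double[] data)
{
    foreach (double value in data)
        yield return value;
}

foreach (double v in Elements(storage))
{
    // v = 0.0;            // would not compile: foreach iteration variables are read-only
    double copy = v;
    copy *= 10.0;          // changes only the local copy
}
Console.WriteLine(string.Join(" ", storage));   // 1 2 3  (unchanged)

// Writing requires explicit indexed access to the storage
// (with ILNumerics: SetValue, SetRange, A[..] = .. or GetArrayForWrite()).
storage[1] = 20.0;
Console.WriteLine(string.Join(" ", storage));   // 1 20 3
```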

Lastly, excessive use of IEnumerable<T> raises performance concerns in situations where high-performance computation is desired. ILNumerics integrates well with IEnumerable<T>, but how well IEnumerable<T> integrates into the memory management of ILNumerics should be investigated with the help of your favorite profiler. I suspect that most everyday scenarios work out pretty well with LINQ, since it chains all expressions and queries and iterates the ILNumerics array only once. Either way, let us know your experiences!
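The single pass comes from deferred execution: LINQ operators are chained iterators, and each element is pulled through the whole pipeline before the next one is read. A small sketch that counts how often the source is enumerated; Counted is a hypothetical wrapper, and a plain double[] stands in for the ILNumerics array:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int passes = 0;

// Wraps a source so that every enumeration of it is counted.
IEnumerable<double> Counted(double[] data)
{
    passes++;
    foreach (double v in data)
        yield return v;
}

double[] values = Enumerable.Range(0, 10).Select(i => (double)i).ToArray();

// Where, Select and Sum are chained lazily: each element flows through
// the whole pipeline before the next one is read from the source.
double result = Counted(values)
    .Where(v => v % 2 == 0)
    .Select(v => v * v)
    .Sum();

Console.WriteLine($"result = {result}, source passes = {passes}");   // result = 120, source passes = 1
```

Note that evaluating the same query object twice (e.g. two calls to Sum()) enumerates the source twice; materializing it once with ToArray() can pay off when a result is reused.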

Microsoft.Numerics, Cloud Numerics for Azure – a short Review

Today I found some time to take a look at the Cloud Numerics project at Microsoft. I started with the overview/introduction post by Ronnie Hoogerwerf on the Cloud Numerics blog at MSDN.

The project targets computations on very large, distributed data sets and is intended for Azure. Interesting news for me: the library shows quite a few similarities to ILNumerics. It provides array classes on top of native wrappers, utilizing MPI, PBLAS and ScaLAPACK. A runtime is deployed with the project binaries: Microsoft.Numerics, which provides all the classes described here.

‘Local’ Arrays in “Cloud Numerics”

The similarity is most obvious when comparing the array implementations: both ILNumerics and Cloud Numerics utilize multidimensional generic arrays. Cloud Numerics arrays all implement Microsoft.Numerics.IArray<T>, not to be confused with the ILNumerics local arrays ILArray<T> ;)!

In ILNumerics, the important properties of an array A are provided by its concrete array implementation (A.Size.NumberOfElements, A.Size.NumberOfDimensions, A.Reshape, A.T for the transpose, and so on). On the Cloud Numerics side, those properties are provided by the interface IArray<T> (A.NumberOfDimensions, A.NumberOfElements, A.Reshape(), A.Transpose(), and so on).

A similar parallel is found in the element types supported by ILArray<T> and Microsoft.Numerics.IArray<T>. Both allow the regular System numeric value types, such as System.Int32, System.Double and System.Single. Interestingly, neither relies on System.Numerics.Complex as its main double-precision complex element type; both implement their own complex types instead, for single and for double precision.

Both array types support vector expansion, or at least Cloud Numerics promises to in its next release; for now, it only expands scalars in binary operations on arrays. To explain the feature, it refers to NumPy rather than ILNumerics, though.
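Vector expansion (what NumPy calls broadcasting) means that a vector is virtually replicated along the other operand's remaining dimension. A minimal sketch in plain C# that adds a column vector to every column of a column-major matrix; AddColumnVector is a hypothetical helper and does not reflect how either library implements the feature:

```csharp
using System;

// Adds a column vector (length rows) to every column of a rows x cols matrix
// stored in column-major order: the vector is "expanded" along dimension 2.
static double[] AddColumnVector(double[] matrix, int rows, int cols, double[] column)
{
    if (column.Length != rows)
        throw new ArgumentException("column length must match the number of rows");
    var result = new double[rows * cols];
    for (int j = 0; j < cols; j++)
        for (int i = 0; i < rows; i++)
            result[i + j * rows] = matrix[i + j * rows] + column[i];
    return result;
}

// 3x2 matrix [1 4; 2 5; 3 6] in column-major order.
double[] M = { 1, 2, 3, 4, 5, 6 };
double[] v = { 10, 20, 30 };
double[] R = AddColumnVector(M, 3, 2, v);
Console.WriteLine(string.Join(" ", R));   // 11 22 33 14 25 36
```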

Array Storage in “Cloud Numerics”

The similarities end when it comes to internal array storage. Both store multidimensional arrays as one-dimensional arrays internally, but Cloud Numerics keeps its elements in native one-dimensional arrays. The authors justify this with the 2GB limit imposed on .NET objects and elaborate further:

Additionally, the underlying native array types in the “Cloud Numerics” runtime are sufficiently flexible that they can be easily wrapped in environments other than .NET (say Python or R) with very little additional effort.

This view is hard to follow. In my experience, .NET arrays are perfectly suitable for such interaction with native libraries, since in the end it is just a pointer to memory that gets passed to those libraries. Regarding the 2GB limit: I assume a ‘problem size’ of more than 2GB would hardly be handled on a single node anyway. Especially from a framework for distributed memory, I would have expected it to switch over to distributing the data across nodes at about this limit at the latest?
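To illustrate that a .NET array can be handed to native code as a plain pointer, here is a minimal sketch that pins a managed double[] with GCHandle and reads it back through the raw address via Marshal.Copy. In real code, ptr would be passed to a native BLAS/LAPACK routine via P/Invoke; that call is omitted here.

```csharp
using System;
using System.Runtime.InteropServices;

double[] managed = { 1.0, 2.0, 3.0 };
var viaPointer = new double[managed.Length];

// Pin the managed array so the GC cannot move it, then take its address.
GCHandle handle = GCHandle.Alloc(managed, GCHandleType.Pinned);
try
{
    IntPtr ptr = handle.AddrOfPinnedObject();   // what a native routine would receive

    // Read the elements back through the raw pointer: it is the very same memory.
    Marshal.Copy(ptr, viaPointer, 0, viaPointer.Length);
}
finally
{
    handle.Free();   // unpin as soon as the native call has returned
}
Console.WriteLine(string.Join(" ", viaPointer));   // 1 2 3
```

For scale: under the 2GB per-object limit, a double[] holds at most roughly 2^31 / 8 ≈ 268 million elements.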

As a consequence, interaction between Cloud Numerics and .NET arrays becomes somewhat clumsy and, for really large datasets, likely comes with a performance hit (disclaimer: untested, of course).

Differences keep coming: indexing features in Cloud Numerics are rather basic. So far, only scalar element access is supported, and the number of dimension specifiers must equal the number of dimensions of the array. Hence, working with subarrays seems to be impossible. I will keep an eye on whether the project supports array features like A[full, end / 2 + 1] in one of its next releases ;)
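What a subarray index such as A[full, end / 2 + 1] has to do under the hood can be sketched for column-major storage: ‘full’ along the first dimension means an entire column, which is one contiguous block of memory. GetColumn is a hypothetical helper; zero-based indices and integer division for end / 2 are assumptions made for this illustration only.

```csharp
using System;

// Returns all rows of column col (a "full" first-dimension index) from an
// nRows x nCols matrix stored in column-major order.
static double[] GetColumn(double[] data, int nRows, int nCols, int col)
{
    if (col < 0 || col >= nCols)
        throw new ArgumentOutOfRangeException(nameof(col));
    var column = new double[nRows];
    Array.Copy(data, col * nRows, column, 0, nRows);   // a column is contiguous
    return column;
}

const int rows = 3, cols = 4;
double[] A = new double[rows * cols];
for (int k = 0; k < A.Length; k++) A[k] = k + 1;   // same values as counter(3,4): 1..12

int end = cols - 1;            // last valid column index (zero-based, assumed)
int colIndex = end / 2 + 1;    // integer division assumed: 3 / 2 + 1 = 2
double[] sub = GetColumn(A, rows, cols, colIndex);
Console.WriteLine(string.Join(" ", sub));   // 7 8 9
```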

I wonder how memory management is done in Cloud Numerics. The library provides overloaded operators and hence faces the same problems that led to the sophisticated memory management in ILNumerics: if executed in tight loops, expressions like

A = 0.5 * (A + A') + 0.5 * (A - A')

on ‘large’ arrays A will inevitably lead to memory pollution if run without deterministic disposal! Not to mention (virtual) memory fragmentation and the problems introduced by heavy unmanaged resources in conjunction with .NET objects and the GC … I could not find the time to test it live, but I am almost sure that this approach conflicts with the needs of the targeted audience with really large problem sizes. Unless there is some hidden mechanism in the runtime (which I doubt, since the API relies on ‘var’ and regular C#, which offers no way to overload the assignment operator), this could evolve into a real nuisance IMO.
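To see how quickly temporaries pile up, here is a sketch that evaluates the expression above operator by operator on plain arrays and counts the intermediate results. Transpose, Add, Sub and Scale are hypothetical helpers that, like overloaded operators, each allocate a fresh result array; this is not how either library evaluates the expression.

```csharp
using System;
using System.Linq;

int allocations = 0;   // counts every array created while evaluating the expression

// Each helper allocates a new result array, just like an overloaded operator would.
double[] NewArray(int length) { allocations++; return new double[length]; }

// Transpose of a dim x dim matrix stored in column-major order.
double[] Transpose(double[] a, int dim)
{
    var r = NewArray(a.Length);
    for (int j = 0; j < dim; j++)
        for (int i = 0; i < dim; i++)
            r[j + i * dim] = a[i + j * dim];
    return r;
}
double[] Add(double[] a, double[] b)
{
    var r = NewArray(a.Length);
    for (int k = 0; k < a.Length; k++) r[k] = a[k] + b[k];
    return r;
}
double[] Sub(double[] a, double[] b)
{
    var r = NewArray(a.Length);
    for (int k = 0; k < a.Length; k++) r[k] = a[k] - b[k];
    return r;
}
double[] Scale(double s, double[] a)
{
    var r = NewArray(a.Length);
    for (int k = 0; k < a.Length; k++) r[k] = s * a[k];
    return r;
}

const int order = 3;
double[] A = Enumerable.Range(1, order * order).Select(i => (double)i).ToArray();

// A = 0.5 * (A + A') + 0.5 * (A - A'), evaluated operator by operator:
double[] result = Add(Scale(0.5, Add(A, Transpose(A, order))),
                      Scale(0.5, Sub(A, Transpose(A, order))));

Console.WriteLine($"temporaries: {allocations}");                  // temporaries: 7
Console.WriteLine($"result equals A: {result.SequenceEqual(A)}");   // result equals A: True
```

So every evaluation creates seven new arrays the size of A, six of which are garbage right away (plus the old A once it is overwritten). In a tight loop over large arrays, that is exactly the pressure deterministic disposal is meant to relieve.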

Distributed Arrays

This part seems straightforward. It follows the established scheme known from MPI but offers a nicer interface to the user. The Cloud Numerics runtime also hides the overhead of cluster management and array slicing to a good extent. However, the question of memory management arises on the distributed side as well. Since the API exposed to the user (obviously?) does not take care of disposing temporary arrays in a timely fashion, performance for large arrays will most likely suffer.
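The MPI-style scheme essentially splits each array into contiguous blocks, one per node. A minimal sketch of a plain block distribution, assuming blocks that differ in size by at most one element; BlockDistribution is hypothetical and not the Cloud Numerics partitioning API:

```csharp
using System;

// Splits n elements into p contiguous blocks whose sizes differ by at most one,
// as in a plain MPI-style block distribution. Returns (Start, Count) per rank.
static (int Start, int Count)[] BlockDistribution(int n, int p)
{
    var blocks = new (int Start, int Count)[p];
    int baseSize = n / p, remainder = n % p, start = 0;
    for (int rank = 0; rank < p; rank++)
    {
        int count = baseSize + (rank < remainder ? 1 : 0);   // first 'remainder' ranks get one extra
        blocks[rank] = (start, count);
        start += count;
    }
    return blocks;
}

var parts = BlockDistribution(10, 4);
foreach (var (blockStart, blockCount) in parts)
    Console.WriteLine($"start {blockStart}, count {blockCount}");
// start 0, count 3
// start 3, count 3
// start 6, count 2
// start 8, count 2
```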

As soon as I find out more details about their internal memory management, I will post them here, hopefully together with some corrections of my assumptions.

SOS.dll with new commands in 4.0

I have always been a great fan of the SOS.dll debugger extension. It is a huge help in so many situations where a deep look into the inner state of the CLR is needed at runtime and the common debugging tools of Visual Studio simply don't go far enough. When working on the new memory management of ILNumerics, SOS lived up to its name many times and provided the final insight needed to make it work. How I wish the processor itself offered similar means of finding out ‘what’s going on’ as the CLR does …!

Tess’ blog post about new commands in SOS for CLR 4.0 therefore raised great expectations here, so I finally took a quick look at it:

!help already lists many more commands than before. In the past I mainly used a small subset: !GCRoot, !DumpHeap, !DumpObj and !DumpClass. Some of the newly added commands sound promising as well: !HeapStat, !ListNearObj, !AnalyzeOOM, !HistInit, !HistObj, !HistObjFind, !HistRoot, !ThreadState, !FindRoots, !GCWhere and !VerifyObj.

Wow! !GCWhere, !FindRoots and the !Hist* commands definitely deserve a closer look in one of our next programming sessions, even if the next GC issue is not very likely to appear for ILNumerics any time soon ;)

I have not been able to find part II of Tess’ blog post, but luckily the MSDN documentation for those commands is still available.

Strangely, among all those reputable commands there is one called ‘!FAQ’. I tried to find out which answers SOS users seek most often, but unfortunately it didn’t work out:

The name 'FAQ' does not exist in the current context

:| ??

@Update: Somehow I completely missed the fact that nowadays everybody seems to be using a new tool superseding SOS.dll: Psscor4. Anyway, I will check out those commands there as well. :)