## Fast. Faster …. Performance Comparison: C# (ILNumerics), FORTRAN, MATLAB and numpy – Part I

“Ehh..! Yet another comparison” you may say. And this is, what it is. Nevertheless, the results might be astonishing…
This post will be split in several parts. Dont know yet, how many. But at least 2. The first part will try to find a little overview or categorization of the conjunction between the terms ‘performance’ and ‘language’.

## High Performance – what does it mean?

Several attempts have already been made to measure the impact the .NET CLR introduces to heavy numerical computations. Naturally, this is hard to generalize, since the final execution speed of any program does depend on so many factors. The initial language for the algorithm being only one of them. Among others are important:

• the set of machine instructions presented to the CPU(s) and how the processor is able to optimize their execution
• how do the compiler(s) used to get the machine code out of the higher level language are able to support the processor(s) in executing the code fast
• how do the compiler(s) and/ or the high level algorithm prepare for the utilization of all available computing resources on the executing machine.

“Fast” or “high performance” languages are able to succeed in those points better than those languages, specialized for other tasks than performance. A positive example is FORTRAN, a language designed for other purposes (yet) may be found in JavaScript. At the end, the language (ie. the “features” of the language) are less important than the compilers used to generate executable code. So this is, where we get to the main difference between virtual language environments (Java, .NET and many others) and so called ‘native’ code (C/C++, FORTRAN being the most popular). While the native fraction does typically utilize one compiler to generate the final machine code in one step, the virtual fraction does introduce at least two steps. A language specific compiler generating “byte code” (‘Intermediate Language’) at development time and a Just in Time (JIT) compiler transforming that byte code into machine instructions right before the execution.

## ‘Virtually High’ Performance

It is a common (marketing?) argument, that the JIT compiler of a virtual machine is able to optimize the final code much better, since at runtime the actual system resources (model and number of CPU(s), cache sizes etc.) are well known. In reality, the speed of programs written for virtual machines suffer from the lack of implemented optimizations. The reason might be found in the intended audience for those environments: enterprise applications simply do not need numberchrunching performance!

Nevertheless, many attempts exist from the scientific community to utilize virtual runtime environments for numerical computing. Due to its maturity most of them aim towards Java [1],[2],[3]. Attempts for .NET just like most attempts for Java aim at providing wrapper classes in order to make the performance of ‘native’ libraries available to the higher level language. And there is nothing dubious about it. Developers writing algorithms in ‘native’ languages (lets say FORTRAN) are well advised to utilize those libraries as well. Everyone would have a hard time to beat their performance by writing – lets say – a faster matrix multiplication than the one provided by the Intel MKL (just to name one). I recently looked into exact that topic and will possibly publish those results in a later post.

So in order to categorize languages after their suitability for scientific computing, we identify the following needs:

• The language should provide a ‘nice’ syntax to the user. This includes things like multidimensional arrays and overloaded operators.
• There must exist tools (like compilers and/or other supporting infrastructure, possibly a library) which support the fast execution on a wide range of machines.

For a very long time, FORTRAN has been the best language according to those goals. But recent changes in the computer architecture show a significant shift in the responsibility to care about performance. While earlier, software developers commonly relied on the processor (or their designers) to run any code as fast as possible, nowadays this is over. There will be no significant improvement in a single processing unit in upcoming design cycles. Major improvements will be made by utilizing thread level performance, ie. the utilization of multicore machines and GPU power. But it will be the responsibility of the software layer now, to utilize those resources.
It wouldn’t be the ILNumerics Developer Blog, if we wouldn’t believe, that .NET and C# have very good potential to fullfill those needs – possibly better than any other major language around. In terms of the syntactic sugar the proove has been done already – simply download and try it yourself if you haven’t done yet! But ILNumerics also fullfills the second requirement! The .NET CLR as well as the mono runtime do offer several options to utilize computing resources better than other virtual machines.

## Ok… but where are the plots?

In the next part I will give the results of speed measure of an every day algorithm: the k-means clustering algorithm. I have seen ‘comparisons’ on the web, where a library, utilizing the MKL compares its own matrix multiplication (carried out in the MKL) with the one from the MKL!?! Dont know, who is going to be fooled that way. We will not compare anything not executed within the virtual machine. No matrix multiplication, no linear algebra. Those parts do use optimized tools and hence run at highest possible speed anyway. K-means on the other side does only utilize array features: distance computations, binary operators, aggregate function (sum, min) etc. – all carried out within the corresponding framework.

We did choose the most popular representatives of scientific computing languages: Matlab, FORTRAN, numpy and ILNumerics and wrote the kmeans algorithm for each language twice: one time while having the best fairness in between the languages in mind and one time with major optimizations the specific language provides. Results coming in the next part…

## Comparison .NET JAVA C++ and FORTRAN

I recently stumbled on a plot provided by indeed.com:

I found it interesting that .NET even seems to be leading in the past and only recently Java had catched up with .NET. Somehow, I always thought it was the other way around. Also, Fortran does not seem to be visible at all. Most likely, because you wouldn’t enter any position because “you know Fortran”. Rather Fortran is somehow a requirement which you would need to learn in case your position requires it, I suppose.

## Why the ‘var’ keyword is not allowed in ILNumerics

One of the rules related to the new ILNumerics memory management, which potentially causes issues, is not to use the var keyword of C#. In Visual Basic, similar functionality is available by ommiting the explicit type declarations for array types.

ILArray<double> A = rand(5,4,3); // explicit array type declaration required
var B = rand(5,4,3); // NOT ALLOWED!

Lets take a look at the reasons, why this – otherwise convenient – language feature is not allowed in conjunction with array declarations. In order to make this clear, a little survey into the internals of the ILNumerics memory management is needed.

In ILNumerics, local arrays are of one of the types ILArray<T>, ILLogicalArray, ILCell. Those array types are one building block of the memory managemt in ILNumerics. By using one of those array types in a function, one can be sure, to keep the array alive and available as long as the function is not left. On the other hand – as soon as the function was left, the array will be recycled immediately.

Other array types exist in ILNumerics. They serve different purposes regarding their lifetime and mutability. Input arrays like ILInArray<T> f.e., make sure, arrays given as function parameters are unable to get altered.

Another important type is the return type of any function. Every function, property or operator in ILNumerics returns arrays of either ILRetArray<T>, ILRetLogical or ILRetCell. Return arrays are volatile or temporary arrays. Their use is restricted to exactly one time. After the first use, return arrays are disposed immediately. For expressions like the following, this behavior drastically increases memory efficiency:

abs(pow(cos(A*pi/2+t),2)

Assuming A to be a (rather large) array, 7 temporary memory storages of the size of A would be necessary in order to evaluate the whole expression. But if we take the above assumption regarding the lifetime of return arrays into account, that number is reduced to at most 2 temporary arrays. The reason: A * pi needs one storage for the result: Result1. It is than used to compute [Result1] / 2. Here, another storage is needed for the new result: Result2. At the end of the division operation, [Result1] has already been used for the first time. Since it is a Return Type, it is released and its storage recycled. For the next calculation [Result2] + t, the storage from [Result1] is already found in the memory pool of ILNumerics. Therefore, no new storage is needed and both temporary storages are alternatingly used for the subsequent evaluations.

Lets assume, the expression above does only make sense, if we can retrieve the result and use it in subsequent expressions inside our function. The most common case would be to assign the result to a local variable. Now, we get close to the interesting part: If we would allow the var keyword to be used, C# would generate a local variable B of type ILRetArray<double>:

// DO NOT DO THIS!!
var B = abs(pow(cos(A*pi/2+t),2);    // now B is of type ILRetArray<double> !

Console.Out.Write(B.Length);
Console.Out.Write(B.ToString());    //<-- fails! 

Besides the fact, that this would conflict with the function rules of ILNumerics (local array types must be of ILArray<T> or similar), B could only be used for exactly one time! In the example, B.Length does execute normaly. After that statement, B gets disposed. Therefore, the statement B.ToString() will already fail. This is, why var is not permitted.

By explicitely declaring the local array type, the compiler will use implicit type conversions in order to convert a return array to a local array, which is than available for the rest of the function block:

// this code is correct
ILArray<double> B = abs(pow(cos(A*pi/2+t),2);    // now B is of type ILArray<double> !

Console.Out.Write(B.Length);
Console.Out.Write(B.ToString());    // works as expected

See: General Rules for ILNumerics, Function Rules