First Look at Julia on Windows

I recently blogged about the upcoming Lang.NEXT 2012 conference in Redmond. Since the videos are not uploaded yet (the talk about Julia is only about to start), I decided to use the time for an early evaluation of the language with the beautiful, suggestive name everyone seems to fall in love with immediately. Since we all know how prone love is to projection, I felt I needed a more rational look at the language. And – as usual – as things get clearer and you get to know each other more and more, butterflies turn into even more beautiful butterflies … or into something completely different ….

Let's start with some motivation. Julia wants to bridge the gap between established, convenient mathematical (prototyping, desktop) systems and high performance (parallel) computing resources. So, basically, it wants to be comfortable and fast. “Huh?” – I hear you say – “this is what ILNumerics does as well!” – and of course you are right. But Julia originates from a very different motivation than ILNumerics. For us, the goal is to provide convenient numeric capabilities with high performance and a comfortable syntax – but to do it directly in a general purpose language. This brings a lot of advantages when it comes to deploying your algorithm, and it is much easier to utilize all those convenient development tools which already exist for C#. Furthermore, the (frequent) transition between business logic and your numerical algorithms – which can otherwise become nasty and error prone – simply goes away.

Julia, on the other hand, has to fight different enemies: dynamic language design. Things like dispatching schemes, type inference and type promotion, the lexer and parser, and certainly a lot more. I really bow to those guys! At first glance, they really did succeed. And at the same time I am glad that Eric Lippert and his colleagues took the hard stuff off our shoulders. But of course: by going through all that pain of language design (ok, it sometimes might be fun as well) you gain the opportunity to optimize your syntax with far fewer limits. A ‘plus’ of convenience.

Let's take a look at some code. Readers of this blog are already familiar with what has turned out to be our favorite algorithm for comparing languages: the kmeans algorithm in all its beauty and simplicity. Here comes the Julia version I managed to run on Windows:

function kmeansclust (X, k, maxIterations)
    nan_ = 0.0 / 0.0;                   # NaN, used to mark empty clusters
    n = size(X,2);                      # number of samples (columns of X)
    classes = zeros(Int32,1,n);         # cluster assignment per sample
    centers = rand(size(X,1),k);        # k random initial centers
    oldCenters = copy(centers);
    while (maxIterations > 0)
        println("iterations left: $maxIterations");
        maxIterations = maxIterations - 1;
        # assignment step: attach each sample to its nearest center
        for i = 1:n
            Xexp = repmat(X[:,i],1,k);
            dists = sum(abs(centers - Xexp),1);
            classes[i] = find(min(dists) == dists)[1];
        end
        # update step: move each center to the mean of its members
        for i = 1:k
            inClass = X[:,find(classes == i)];
            if (isempty(inClass))
                centers[:,i] = nan_;
            else
                centers[:,i] = mean(inClass,2);
            end
        end
        if (all(oldCenters == centers)) # no center moved: converged
            break;
        end
        oldCenters = copy(centers);
    end
    (centers, classes)                  # the last expression is the return value
end
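
For reference, a minimal usage sketch (the data, the cluster count and the iteration cap are made-up values, not part of the original post):

X = rand(2, 100);                             # 100 random points in 2 dimensions
(centers, classes) = kmeansclust(X, 3, 100);  # search 3 clusters, at most 100 iterations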

Did you notice any differences from the MATLAB version? They are subtle:

  • The last expression, (centers, classes), returns the result as a tuple – a return keyword is not required. Moreover, what is returned does not need to be declared in the function definition.
  • Julia implements reference semantics on arrays. This makes the copy() function necessary for assignments of full arrays (hence the two copy(centers) calls above). For function calls this implies that a function can potentially alter its input! Julia establishes the convention of appending a ! to the name of any function that alters its input parameters – see the sketch below.
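
A minimal sketch of both points (sort/sort! merely serves as an example of the naming convention):

A = rand(3, 3);
B = A;             # no data is copied – B references the same array as A
B[1,1] = 0.0;      # ... so this changes A[1,1], too
C = copy(A);       # C owns an independent copy
C[1,1] = 42.0;     # A stays untouched
v = [3.0, 1.0, 2.0];
sort(v);           # returns a sorted copy; v is unchanged
sort!(v);          # the ! signals: v itself is sorted in place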

Besides that, the syntax of Julia can be pretty much compatible with MATLAB® – which is really impressive, IMO. Under the hood, Julia even offers much more than MATLAB® scripts are able to do: type inference and multiple dispatch, comprehensions, closures, and nifty string features like variable expansion within string constants, as known from PHP. Julia utilizes the LLVM compiler suite for JIT compilation.
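
A few lines to give a taste of those features – a sketch only, not taken from any particular Julia release:

# multiple dispatch: the method is chosen by the argument types
describe(x::Int)     = println("an integer: $x");
describe(x::Float64) = println("a float: $x");

adder(n) = x -> x + n                # closure: the returned function captures n
plus2 = adder(2);                    # plus2(40) gives 42

squares = [i*i for i = 1:10];        # comprehension
println("2 + 3 is $(2 + 3)")         # variable expansion within string constants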

Julia is too young to judge, really. I personally find reference semantics for arrays somewhat confusing. But numpy does it as well and has nevertheless found a reasonable number of users.

While the above code ran after some fine tuning, the current shape of the prebuilt Windows binaries prevented a deeper look in terms of performance – the build still needs some quirks and bugs removed. (The Windows version had been made available only a few hours earlier and was the first publicly available version for Windows at all.) As soon as a more stable version comes out, I will provide some numbers – possibly with an optimized version (@bsxfun is not implemented yet, which renders every comparison unfair). According to their own benchmarks, I would expect Julia to run at around the speed of ILNumerics.

Lang.NEXT 2012

I know, it is almost obligatory to show overwhelming excitement about every upcoming ‘trend technology’ conference. But this time it's not an act! Lang.NEXT 2012 in Redmond exhibits a truly interesting list of speakers and projects. One of them being Julia (I talked about ‘her’ in my last post). Others are:

  • IKVM.NET – enabling Java applications to run on .NET
  • Roslyn – a more than promising approach to expose an API to the C# and VB compiler services, making them more attractive for runtime utilization and
  • Dart – ‘A Well Structured Web Programming Language’ by Google

But there are also talks about C++11 and ECMAScript 6 and … just too many to really get to know them all. ‘Luckily’ I never felt much excitement for functional programming, so it's easier for me to concentrate on the ‘rest’. But that's not fair – those F# projects deserve your attention as well. And of course D will be present at Lang.NEXT, too.

Lang.NEXT 2012 starts tomorrow. I will definitely spend some time on the recordings and eventually report back here as well.

Julia, Math .NET M#, FORTRAN .NET, managed LAPACK, MKL and outlook

With the recent advances in the ILNumerics core module, we were able to improve the computational part of our libraries a lot. Not only has execution speed increased by orders of magnitude – by catching up with C++ and FORTRAN, the .NET platform also becomes attractive to an even wider community of scientists, engineers and programmers of numerical applications.

We find ourselves part of a very exciting evolution. A whole bunch of young and not-so-young projects are targeting goals similar to those of ILNumerics: convenience and performance. One interesting example is the Julia language. A language very similar to MATLAB syntax (and hence to ILNumerics’ syntax as well) is combined with a JIT compiler from the LLVM suite (what else?). While the convenience of the language is beyond question, the speed provided by the LLVM JIT is “in the range of 2x C++”. The language is dynamic, which marks an important difference from ILNumerics.

Interestingly enough, one of the developers of Julia has been involved in the creation of M# (according to this blog post):

Jeff [Bezanson] was a principal developer of M#, an implementation of the MATLAB language running on .NET

And this is where it starts getting even more interesting. Consider having a compiler for ‘ILM#’ (an imaginary extension of Julia/MATLAB with type safety), outputting .NET IL code and at the same time incorporating the deterministic disposal patterns of ILNumerics! However, I have not been able to find any working MATLAB-to-.NET compiler yet, nor any M# project. Anyone out there who knows where it lives today?

The idea of being able to convert complete MATLAB code branches into ILNumerics libraries, making them run at the speed of C/FORTRAN, is very appealing indeed. And there is another potential source language for conversion: FORTRAN itself! While a lot of developers value the platform independence and convenience of C# over FORTRAN (especially when it comes to GUI development or even RAD), they arguably will not love the idea of rewriting all their grown-over-the-years FORTRAN algorithms in ILNumerics again. Having the option to automatically convert that code to C#/ILNumerics would not only save them from P/Invoking into native FORTRAN libraries, but would even make that code run on all platforms supported by .NET.

With this in mind, I recently searched for matching projects. The two attempts I found:

  • Lahey/Fujitsu LF Fortran .NET compiler – seems to be discontinued?
  • Silverfrost FTN95: Fortran 95 for Windows

I did some tests with FTN95. With some help from Paul Laider of Salford, I have been able to create a ‘fully managed’ LAPACK version right from the netlib sources, with only very minor modifications to the official FORTRAN code. I say ‘fully managed’ because in the end you get a real .NET assembly. However, the compiler comes with some drawbacks, IMO, which I will write about in a later post.

This brings us closer to one of our goals (and to the last CAPITALIZED buzzword in our headline): not having to rely on MKL anymore. Since we have been able to speed up matrix multiplication to around half the speed of MKL, having all the LAPACK functionality in C# marks the next milestone. When all is finished, the user will have the option to choose between these deployment schemes:

  • ILNumerics, fully managed version. Suitable for Silverlight, Office add-ins, Visual Studio plugins etc.; <8 MB, all platforms supported, no native libs
  • ILNumerics, 32 or 64 bit with native support. Platform specific, around 2 times faster, considerably larger binaries

And this is still without potential improvements on the “half the speed of MKL” issue …

As always: any comments welcome.

ILNumerics Version 2.11 released

Today's release of ILNumerics is a minor bugfix release. It ensures that the var() function stays memory efficient even in certain optimization scenarios. Also, if you were using ILNumerics in conjunction with a free Trial License, you may have encountered exceptions which did not always make clear what the problem was. They were related to the regular ILInvalidLicenseException and thus expected behavior. Nevertheless, it should now be easier to track those exceptions down.

Some users recently experienced problems with our Evaluation Licenses. These should be fixed now. In case you, too, had problems getting an Evaluation License to run: please try again! Everything should run smoothly now.

As always: in case of problems, our forum is here to help you out fast.

LLVM everywhere

F#News today published some efforts to utilize the impressive power of the LLVM compiler suite from within F#. The attempt does not appear mature or stable yet – but it demonstrates the potential of multi-level compilation for runtime optimization: use a high level language to formulate your algorithm and let lower level optimizations translate it into highly efficient (platform specific) code. The attempt demonstrated in the post mentioned above does not yet sufficiently hide the internals of LLVM. A truly comfortable library would offer the user a single switch: UsePlatformOptimization, on/off. It would then be the responsibility of the library to transform the high level algorithm into suitable input for the optimizing framework.

LLVM is not the only interesting target for such an optimization scenario. Another target is OpenCL. However, most graphics card vendors, and Intel too (I don't know about AMD), already rely on LLVM for their OpenCL implementations. So it appears there is no way around LLVM …

Quick and dirty tests and misleading results

Today this blog post popped up on my desktop. It describes a quick attempt to outperform the variance function of the friendly MathNet.Numerics library by utilizing the Task Parallel Library from within F#. It obviously worked out – by a factor of 10! Since this seemed strange to me, I couldn't resist making my own (very quick and dirty) comparison.

The code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ILNumerics; 
using MathNet; 
using MathNet.Numerics.Statistics; 
using System.Diagnostics; 

namespace ConsoleApplication6 {
    class Program : ILMath {
        static void Main(string[] args) {
            long ms = 0;
            int n = 10000000;   // 10 million elements
            int it = 100;       // number of timing iterations

            // ILNumerics: time var() on a 1 x n random vector
            ILArray<double> A = randn(1,n);
            Stopwatch sw = new Stopwatch(); 
            for (int i = 0; i < it; i++) {
                sw.Restart(); 
                ILArray<double> B = var(A);
                sw.Stop(); 
                ms += sw.ElapsedMilliseconds; 
            }
            Console.WriteLine("ILNumerics needed: {0} ms",ms/(float)it); 
            
            // MathNet.Numerics: the same data as a plain System.Array
            double[] storage = A.ToArray();
            ms = 0; 
            for (int i = 0; i < it; i++) {
                sw.Restart();
                double a = Statistics.Variance(storage); 
                sw.Stop();
                ms += sw.ElapsedMilliseconds;
            }
            Console.WriteLine("Mathnet.Numerics needed: {0} ms", ms / (float)it); 
            Console.ReadKey(); 
        }
    }
}

The result:

ILNumerics needed: 101,87 ms
Mathnet.Numerics needed: 433,67 ms

So the performance of the MathNet implementation is actually not too bad. ILNumerics does parallelize the var() function (the test ran on a dual core) and uses unsafe, pointer-optimized code for the iteration – so a factor of 4 over MathNet is reasonable. I suppose the speedup of 10 in the referenced F# blog post is more a measure of memory bandwidth. The tests there were not run repeatedly, so it appears the given implementation is advantageous in terms of memory accesses. Possibly some prefetching of cache lines or similar is going on.
