Tag Archives: F#

LLVM everywhere

F#News today published some efforts to use the impressive power of the LLVM compiler suite from within F#. The attempts turned out to be neither mature nor stable yet, but they show the potential of multi-level compilation for runtime optimization: use a high-level language to formulate your algorithm and let lower-level optimizations translate it into highly efficient (platform-specific) code. The attempt demonstrated in the post mentioned above still does not sufficiently hide the internals of LLVM. A truly comfortable library would offer the user a single switch only: UsePlatformOptimization, on or off. It would then be the responsibility of the library to transform the high-level algorithm into suitable input for the optimizing framework.

LLVM is not the only interesting target for such an optimization scenario. Another target is OpenCL. However, most graphics card vendors and Intel (I don't know about AMD) already rely on LLVM for their OpenCL implementations. So it appears there is no way around LLVM …

Quick and dirty tests and misleading results

Today this blog post popped up on my desktop. It describes a quick attempt to outperform the variance function of our friend library MathNet.Numerics by using the Task Parallel Library from within F#. Apparently it worked out, by a factor of 10! Since this seemed strange to me, I couldn’t resist making my own (very quick and dirty) comparison.

The code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ILNumerics;
using MathNet;
using MathNet.Numerics.Statistics;
using System.Diagnostics;

namespace ConsoleApplication6 {
    class Program : ILMath {
        static void Main(string[] args) {
            long ms = 0;        // accumulated time over all iterations
            int n = 10000000;   // number of elements
            int it = 100;       // number of timed repetitions

            // random test data, shared by both libraries
            ILArray<double> A = randn(1,n);
            Stopwatch sw = new Stopwatch();
            // time ILNumerics var(), one measurement per iteration
            for (int i = 0; i < it; i++) {
                sw.Restart();
                ILArray<double> B = var(A);
                sw.Stop();
                ms += sw.ElapsedMilliseconds;
            }
            // average time per call
            Console.WriteLine("ILNumerics needed: {0} ms",ms/(float)it);

            // copy the same data into a plain array for MathNet
            double[] storage = A.ToArray();
            ms = 0;
            // time MathNet.Numerics Statistics.Variance() the same way
            for (int i = 0; i < it; i++) {
                sw.Restart();
                double a = Statistics.Variance(storage);
                sw.Stop();
                ms += sw.ElapsedMilliseconds;
            }
            Console.WriteLine("Mathnet.Numerics needed: {0} ms", ms / (float)it);
            Console.ReadKey();
        }
    }
}

The result:

ILNumerics needed: 101,87 ms
Mathnet.Numerics needed: 433,67 ms

So the performance of the MathNet implementation is actually not too bad. ILNumerics parallelizes the var() function (the test ran on a dual-core machine) and uses unsafe, pointer-optimized code for the iteration. So a factor of 4 over MathNet is reasonable. I suppose the speedup of 10 in the referenced F# blog post is more a measure of memory bandwidth. The tests there were not run repeatedly, so it appears that the given implementation is advantageous in terms of memory accesses. Possibly some prefetching of cache lines or something similar is going on.
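
To make the point about repeated runs concrete, here is a minimal sketch of a timing harness that does untimed warm-up calls first and then averages over many repetitions. It uses no external library. The names VarianceBenchmark, VarianceSequential, VarianceParallel and MeanMilliseconds are made up for this illustration, the data is uniform random rather than randn, and the parallel variant is only one possible way to use the Task Parallel Library. It is not the code from the F# post, nor what MathNet.Numerics or ILNumerics do internally.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
static class VarianceBenchmark {
    // Two-pass sample variance (divides by n - 1), single threaded.
    public static double VarianceSequential(double[] x) {
        double sum = 0;
        for (int i = 0; i < x.Length; i++) sum += x[i];
        double mean = sum / x.Length;
        double sq = 0;
        for (int i = 0; i < x.Length; i++) {
            double d = x[i] - mean;
            sq += d * d;
        }
        return sq / (x.Length - 1);
    }

    // Sums f(x[i]) over index ranges in parallel, one partial sum per task.
    static double ParallelSum(double[] x, Func<double, double> f) {
        object gate = new object();
        double total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, x.Length),
            () => 0.0,                                   // task-local partial sum
            (range, state, local) => {
                for (int i = range.Item1; i < range.Item2; i++) local += f(x[i]);
                return local;
            },
            local => { lock (gate) total += local; });   // merge partial sums
        return total;
    }

    // The same two passes, each split over the available cores.
    public static double VarianceParallel(double[] x) {
        double mean = ParallelSum(x, v => v) / x.Length;
        double sq = ParallelSum(x, v => (v - mean) * (v - mean));
        return sq / (x.Length - 1);
    }

    // Runs f a few times untimed (JIT compilation, caches), then returns
    // the mean wall-clock time per call over the timed iterations.
    public static double MeanMilliseconds(Func<double> f, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) f();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) f();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}

class Program {
    static void Main() {
        int n = 10000000;
        var rnd = new Random(42);
        double[] data = new double[n];
        for (int i = 0; i < n; i++) data[i] = rnd.NextDouble();

        Console.WriteLine("sequential: {0} ms",
            VarianceBenchmark.MeanMilliseconds(() => VarianceBenchmark.VarianceSequential(data), 3, 20));
        Console.WriteLine("parallel:   {0} ms",
            VarianceBenchmark.MeanMilliseconds(() => VarianceBenchmark.VarianceParallel(data), 3, 20));
    }
}
```

Taking the time from Stopwatch.Elapsed once for the whole loop avoids truncating each iteration to whole milliseconds. A single cold run, by contrast, mixes JIT compilation and first-touch memory effects into the number, which is the kind of distortion suspected above.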