Quick and dirty tests and misleading results

Today this blog post popped up on my desktop. It describes a quick attempt to outperform the variance function of our fellow library MathNet.Numerics by utilizing the Task Parallel Library from within F#. It obviously worked out – by a factor of 10! Since this seemed strange to me, I couldn’t resist making my own (very quick and dirty) comparison.

The code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ILNumerics; 
using MathNet; 
using MathNet.Numerics.Statistics; 
using System.Diagnostics; 

namespace ConsoleApplication6 {
    class Program : ILMath {
        static void Main(string[] args) {
            long ms = 0;        // accumulated elapsed time in milliseconds
            int n = 10000000;   // number of samples
            int it = 100;       // number of timed repetitions

            ILArray<double> A = randn(1,n);
            Stopwatch sw = new Stopwatch(); 
            for (int i = 0; i < it; i++) {
                sw.Restart(); 
                ILArray<double> B = var(A);
                sw.Stop(); 
                ms += sw.ElapsedMilliseconds; 
            }
            Console.WriteLine("ILNumerics needed: {0} ms",ms/(float)it); 
            
            // the same data as a plain double[] for MathNet
            double[] storage = A.ToArray();
            ms = 0; 
            for (int i = 0; i < it; i++) {
                sw.Restart();
                double a = Statistics.Variance(storage); 
                sw.Stop();
                ms += sw.ElapsedMilliseconds;
            }
            Console.WriteLine("Mathnet.Numerics needed: {0} ms", ms / (float)it); 
            Console.ReadKey(); 
        }
    }
}

The result:

ILNumerics needed: 101,87 ms
Mathnet.Numerics needed: 433,67 ms
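
For anyone who wants to repeat such a comparison a little more carefully, here is a minimal sketch of a timing helper. It is not part of the test above: `Bench.AverageMs` and `SumOfSquares` are hypothetical names made up for this illustration. Compared to the loop above, it adds a few untimed warm-up calls (so JIT compilation does not end up in the first measurement) and reads the stopwatch once at the end via `Elapsed.TotalMilliseconds`, instead of summing `ElapsedMilliseconds`, which only reports whole milliseconds.

```csharp
using System;
using System.Diagnostics;

static class Bench {
    // Hypothetical helper: calls `action` a few times untimed (warm-up),
    // then returns the average wall-clock time per call in milliseconds.
    public static double AverageMs(Action action, int warmup = 3, int iterations = 100) {
        for (int i = 0; i < warmup; i++) action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}

class BenchDemo {
    // Stand-in workload for the demo; any function under test could go here.
    static double SumOfSquares(double[] x) {
        double s = 0;
        foreach (double v in x) s += v * v;
        return s;
    }

    static void Main() {
        double[] data = new double[1000000];
        Random rnd = new Random(42);
        for (int i = 0; i < data.Length; i++) data[i] = rnd.NextDouble();

        // Accumulate the results so the work has an observable effect.
        double sink = 0;
        double ms = Bench.AverageMs(() => { sink += SumOfSquares(data); });

        Console.WriteLine("SumOfSquares needed: {0:F3} ms (checksum {1:F3})", ms, sink);
    }
}
```

In the comparison above, the same helper could presumably be used as `Bench.AverageMs(() => Statistics.Variance(storage))`.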

So the performance of the MathNet implementation is actually not too bad. ILNumerics parallelizes the var() function (the test ran on a dual-core machine) and uses unsafe, pointer-optimized code for the iteration, so a factor of about 4 over MathNet is reasonable. I suppose the speedup of 10 in the referenced F# blog post is more a measure of memory bandwidth. The test there was not run repeatedly, so it appears that the given implementation is advantageous in terms of memory accesses. Possibly some prefetching of cache lines or similar is going on.
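
To make the "parallelized" part a bit more concrete, here is a rough sketch of a chunked, two-pass variance on top of the Task Parallel Library. This is explicitly not the ILNumerics implementation (and not the F# code from the referenced post): `VarianceSketch` and `ChunkedSum` are names invented for this example, it uses plain array indexing instead of unsafe pointers, and it assumes the unbiased normalization by n - 1.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class VarianceSketch {
    // Splits [0, n) into index ranges, lets `chunkSum` reduce each range
    // in a tight sequential loop, and adds up the partial results.
    static double ChunkedSum(int n, Func<int, int, double> chunkSum) {
        double total = 0;
        object gate = new object();
        Parallel.ForEach(
            Partitioner.Create(0, n),      // ranges as Tuple<int,int>
            () => 0.0,                     // per-task partial sum
            (range, state, local) => local + chunkSum(range.Item1, range.Item2),
            local => { lock (gate) { total += local; } });
        return total;
    }

    // Two-pass sample variance: first the mean, then the squared deviations.
    public static double Variance(double[] x) {
        int n = x.Length;
        if (n < 2) throw new ArgumentException("At least two samples are required.");

        double mean = ChunkedSum(n, (lo, hi) => {
            double s = 0;
            for (int i = lo; i < hi; i++) s += x[i];
            return s;
        }) / n;

        double ss = ChunkedSum(n, (lo, hi) => {
            double s = 0;
            for (int i = lo; i < hi; i++) { double d = x[i] - mean; s += d * d; }
            return s;
        });

        return ss / (n - 1);
    }
}

class VarianceDemo {
    static void Main() {
        double[] data = { 1.0, 2.0, 3.0, 4.0 };
        Console.WriteLine("variance = {0}", VarianceSketch.Variance(data));
    }
}
```

Each pass streams once over the whole array and does very little arithmetic per element, so for 10 million doubles (about 80 MB) such a loop is likely limited by memory bandwidth rather than by the number of cores. That is in line with the suspicion above that large speedup factors say more about memory access than about the parallelization itself.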