John D. Cook explains on his blog that he likes writing actual mathematical applications best. He favors Python and SciPy, for the same reasons that ASP.NET and C# developers favor ILNumerics. Both approaches take a general purpose programming language and extend it with a mathematical library, so one can do mathematical programming in a general purpose language.
Today's release of ILNumerics is a minor bugfix release. It ensures that the var() function is memory efficient even in certain optimization scenarios. Also, if you were using ILNumerics with a free Trial License, you may have encountered exceptions that did not always make clear what the problem was. They were related to the regular ILInvalidLicenseException and thus were expected behavior. Nevertheless, it should now be easier to track those exceptions down.
Some users recently experienced problems with our Evaluation Licenses. These should be fixed now. If you also had problems getting the Evaluation License to run: please try again! Everything should run smoothly now.
As always: in case of problems, our forum is here to help you out fast.
The project aims at computations on very large distributed data sets and is intended for Azure. Interesting news for me: the library shows quite a few similarities to ILNumerics. It provides array classes on top of native wrappers, utilizing MPI, PBLAS and ScaLAPACK. A runtime is deployed with the project binaries: Microsoft.Numerics, which provides all the classes described here.
‘Local’ Arrays in “Cloud Numerics”
The similarity is most obvious when comparing the array implementations: both ILNumerics and Cloud Numerics use multidimensional generic arrays. Cloud Numerics arrays all derive from Microsoft.Numerics.IArray<T>, not to be confused with the ILNumerics local array ILArray<T> ;)!
Important properties of arrays in ILNumerics are provided by the concrete array implementation of an array A (A.Size.NumberOfElements, A.Size.NumberOfDimensions, A.Reshape, A.T for the transpose, and so on). On the Cloud Numerics side, those properties are provided by the interface IArray<T> (A.NumberOfDimensions, A.NumberOfElements, A.Reshape(), A.Transpose(), and so on).
A similar analogy is found in the element types supported by ILArray<T> and Microsoft.Numerics.IArray<T>. Both allow the regular System numeric value types, such as System.Int32, System.Double and System.Single. Interestingly, neither relies on System.Numerics.Complex as the main double precision complex element type; both implement their own complex types for single and double precision.
Both array types support vector expansion, or at least Cloud Numerics promises to do so in the next release. For now, only scalar binary operations are allowed for its arrays. For an explanation of the feature, its documentation refers to NumPy rather than ILNumerics, though.
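For readers who have not met vector expansion yet, here is a minimal sketch of the idea in plain Python. It is neither the ILNumerics nor the Cloud Numerics implementation; the function name expand_add and the nested-list representation are made up for illustration only.

```python
def expand_add(matrix, vector):
    """Add a column vector to every column of a matrix (a list of rows)."""
    rows = len(matrix)
    if len(vector) != rows:
        raise ValueError("vector length must match the number of rows")
    # The vector is virtually "expanded" along the column dimension,
    # so the caller never has to build a replicated copy of it.
    return [[matrix[r][c] + vector[r] for c in range(len(matrix[r]))]
            for r in range(rows)]


A = [[1, 2, 3],
     [4, 5, 6]]
v = [10, 20]
result = expand_add(A, v)  # v[0] is added to row 0, v[1] to row 1
```

Without vector expansion the user would have to replicate v into a full 2 x 3 array first, which costs both code and memory.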
Array Storage in “Cloud Numerics”
The similarities end when it comes to internal array storage. Both store multidimensional arrays as one dimensional arrays internally, but Cloud Numerics stores its elements in native one dimensional arrays. The authors justify this with the 2GB limit imposed on .NET objects and further elaborate:
Additionally, the underlying native array types in the “Cloud Numerics” runtime are sufficiently flexible that they can be easily wrapped in environments other than .NET (say Python or R) with very little additional effort.
It is hard to follow that view. In my experience, .NET arrays are perfectly suitable for such interaction with native libraries, since in the end it is just a pointer to memory that is passed to those libs. Regarding the limit of 2GB: I assume a ‘problem size’ of more than 2GB would hardly be handled on one node. I would have expected a framework for distributed memory, of all things, to switch over to shared memory at about this limit at the latest?
As a consequence, interaction between Cloud Numerics and .NET arrays becomes somewhat clumsy and, when it comes to really large datasets, comes with an expected performance hit (disclaimer: untested, of course).
Differences keep coming: indexing features are somewhat basic in Cloud Numerics. For now, they support scalar element specification only and require the number of dimension specifiers to equal the number of dimensions in the array. Therefore, subarrays seem to be impossible to work with. I will keep an eye on whether the project supports array features like A[full, end / 2 + 1] in one of the next releases.
I wonder how memory management is done in Cloud Numerics. The library provides overloaded operators and hence faces the same problems that led to the sophisticated memory management in ILNumerics: if executed in tight loops, expressions like
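To put the 2GB figure into perspective, here is a small back-of-the-envelope calculation in Python. It assumes 8 bytes per System.Double and ignores the small object header, so the real numbers are slightly lower; the variable names are my own.

```python
import math

# Rough capacity of a single array under a 2GB object size limit.
LIMIT_BYTES = 2 * 1024**3        # 2GB, object header overhead ignored
BYTES_PER_DOUBLE = 8             # size of a System.Double

max_doubles = LIMIT_BYTES // BYTES_PER_DOUBLE
# Edge length of the largest square matrix of doubles that still fits.
max_square_dim = math.isqrt(max_doubles)

print(max_doubles, max_square_dim)
```

Under these assumptions a single array holds roughly 268 million doubles, which corresponds to a square matrix of about 16384 x 16384 elements.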
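What a subarray expression like A[full, end / 2 + 1] boils down to on flat storage can be sketched in a few lines of Python. This is an illustration only: the column-major layout, the helper names linear_index and get_column, and the reading of end as the index of the last column are assumptions of mine, not taken from either library's sources.

```python
def linear_index(row, col, n_rows):
    """Offset of element (row, col) in column-major flat storage."""
    return col * n_rows + row


def get_column(storage, n_rows, n_cols, col):
    """Return one full column, similar in spirit to A[full, col]."""
    if not 0 <= col < n_cols:
        raise IndexError("column out of range")
    start = linear_index(0, col, n_rows)
    # In column-major order a full column is one contiguous slice.
    return storage[start:start + n_rows]


# A 2 x 4 array stored column by column:
# [[0, 2, 4, 6],
#  [1, 3, 5, 7]]
storage = list(range(8))
n_rows, n_cols = 2, 4
end = n_cols - 1                       # assumed: index of the last column
column = get_column(storage, n_rows, n_cols, end // 2 + 1)
```

Scalar indexing only ever needs linear_index; subarray support additionally has to resolve ranges such as full into whole slices of the storage.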
A = 0.5 * (A + A') + 0.5 * (A - A')
on ‘large’ arrays A will inevitably lead to memory pollution if run without deterministic disposal! Not to mention (virtual) memory fragmentation and the problems introduced by heavy unmanaged resources in conjunction with .NET objects and the GC … I could not find the time to really test it live, but I am almost sure that this approach somehow conflicts with the targeted audience and its really large problem sizes. Unless there is some hidden mechanism in the runtime (which I doubt, because of the use of ‘var’ and regular C#, hence without the option to overload the assignment operator), this could evolve into a real nuisance IMO.
This part seems straightforward. It follows the established scheme known from MPI but offers a nicer interface to the user. The Cloud Numerics runtime also hides the overhead of cluster management and array slicing to a good extent. However, the question of memory management arises again on the distributed side. Since the API exposed to the user (obviously?) does not take care of disposing temporary arrays in a timely fashion, the performance for large arrays will most likely suffer.
As soon as I find out more details about their internal memory management I will post them here, hopefully together with some corrections of my assumptions.
I have always been a great fan of the SOS.dll debugger extension. It provides huge help in so many situations where a deep look into the inner state of the CLR is needed at runtime and the common debugging tools of Visual Studio simply don't go far enough. When working on the new memory management of ILNumerics, SOS lived up to its name many times and provided the final insight needed to make it work. How I wish the processor itself would expose a similar potential to find out ‘what’s going on’ as the CLR does …!
Tess’ blog post about new commands in SOS for CLR 4.0 therefore triggered great expectations here. So I finally managed to take a quick look at it:
!help already seems to come up with a lot more commands than before. In the past I have been using a small subset only, mainly !GCRoot, !DumpHeap, !DumpObj and !DumpClass. Some of the newly appeared commands sound promising as well: !HeapStat, !ListNearObj, !AnalyzeOOM, !HistInit, !HistObj, !HistObjFind, !HistRoot, !ThreadState, !FindRoots, !GCWhere and !VerifyObj.
Wow! GCWhere, FindRoots and those Hist??? commands definitely deserve a closer look in one of our next programming sessions, even if the next GC issue is not very likely to appear for ILNumerics really soon.
I have not been able to find part II of Tess’ blog post, but luckily the MSDN documentation is still there for those commands.
Strange: among all those reputable commands there is one called ‘!FAQ’. I tried to find out which answers the users of SOS might seek most intensively. But unfortunately, it didn’t work out:
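To make the number of temporaries in such an expression tangible, here is a toy model in Python. The class CountingArray is purely hypothetical and has nothing to do with the internals of either library; it only counts how many array instances the overloaded operators create for the expression above, with A.T standing in for A'.

```python
class CountingArray:
    """Toy array that counts how many instances get allocated."""
    allocations = 0

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        CountingArray.allocations += 1

    @property
    def T(self):
        # The transpose allocates a new array as well.
        return CountingArray(zip(*self.rows))

    def _combine(self, other, op):
        return CountingArray(
            [op(a, b) for a, b in zip(ra, rb)]
            for ra, rb in zip(self.rows, other.rows)
        )

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)

    def __rmul__(self, scalar):
        return CountingArray([scalar * a for a in row] for row in self.rows)


A = CountingArray([[1.0, 2.0], [3.0, 4.0]])
CountingArray.allocations = 0            # count only what the expression creates
A = 0.5 * (A + A.T) + 0.5 * (A - A.T)
temporaries = CountingArray.allocations - 1   # everything except the final result
print(CountingArray.allocations, temporaries)
```

In this model one evaluation allocates seven arrays, six of which are garbage right after the assignment. With large arrays inside a tight loop, that garbage piles up until something releases it.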
The name 'FAQ' does not exist in the current context
@Update: Somehow I really missed the fact that nowadays everybody seems to be using a new tool superseding SOS.dll: Psscor4…. Anyway, those commands will nevertheless be checked in there.
This is what we at ILNumerics are watching in our lunch breaks today: http://www.flixxy.com/hubble-ultra-deep-field-3d.htm
And for waking up again: http://www.flixxy.com/golf-ball-slow-motion.htm