ILNumerics: High Performance as Default

ILNumerics releases Version 7.0

The ILNumerics team is proud and excited to announce the availability of a new major release of its ILNumerics Ultimate VS product line. This milestone represents a significant advancement in compiler technology: it introduces the first autonomously parallelizing array compiler, ILNumerics Accelerator.

Array codes, such as numpy and Matlab, are popular within the scientific community. They simplify the description of complex numerical algorithms. However, when these numerical codes are implemented into industrial-grade products, distributed to customers, and maintained by a larger development team for many years, existing tools reveal considerable limitations, with execution speed being possibly the greatest limitation.

ILNumerics Accelerator is here to resolve this issue once and for all. Built on the convenient language of ILNumerics Computing Engine for authoring scientific array codes directly in C# /.NET, our intelligent Accelerator compiler reformulates array codes and executes them in the most efficient manner, unguided and autonomously.

Short Dive: Building an Array Compiler …

The key to optimal hardware utilization today is, of course, the ability to efficiently parallelize the workload. Therefore, ILNumerics Accelerator compiler automatically detects and utilizes parallel potential within subsequent array instructions of your algorithm.

The prevalent approach to automatic parallelization today involves analyzing your code to detect instructions that can execute in parallel. The code is then rewritten to allow these independent instructions to run concurrently. This approach originated at a time when C/C++ and FORTRAN were considered ‘high-level’ languages. While these languages offer some basic array support, numerical algorithms written in them inherently deal with scalar data.

In contrast, the algorithms we handle use n-dimensional arrays as the fundamental data type. When striving for real efficiency, the arrays’ complexity prohibits any compile-time decision about data independence, a suitable execution device, or a low-level implementation using vector instructions, among other factors. Too many variables determine the optimal execution strategy, and their variation is too extensive to identify at development time.

… with a fresh Approach to Parallelization …

ILNumerics takes a data-driven approach to parallelization, making all crucial decisions ‘just in time’ at runtime. This method enables us to implement the low-level internals of array instructions efficiently and eliminates the need for expensive dependency analysis of large program segments. It’s easier to determine at runtime whether an instruction relies on other data and when that data will be available.

However, this increases the compiler’s complexity significantly. Instead of producing fixed code for an array instruction found in the original program, we now generate an intermediate program, often representing several subsequent array instructions. This intermediate program is then transformed at runtime to reflect the original instructions in a manner that respects all influencing factors and enables optimal efficiency.

… with Intelligence and Freedom

These programs all run asynchronously, scheduling themselves onto suitable devices for execution at runtime. They connect with previous and subsequent asynchronous neighbors on the fly, even as they progress in calculating results. They autonomously determine the optimal implementation on the device they select. Significant decision-making power comes with immense responsibility. The final program code builds up asynchronously and autonomously at runtime.

Executing the same code branch twice may result in very different machine instructions. As the creators of the compiler, we can’t predict many details of the final implementation. We can’t even forecast the actual number of asynchronous programs, which we refer to as segments, executing concurrently. The decision now rests with the computer. It has a better understanding of the situation than we do and can process information and make educated decisions much faster and more effectively.

Why another Beta, then?

The first autonomous array compiler has arrived. Are you eager to try it out? Great! Go ahead and test the current beta from nuget.org. Its performance is genuinely impressive.

The majority of our users are building industrial products and have chosen ILNumerics for its reliability and seamless integration into .NET (which means all of .NET, since 2005). The compiler will become a regular part of the ILNumerics Computing Engine once it’s ready. Currently, it’s like a young child, learning new things each day. It’s learning to walk and talk, occasionally stumbling over previously unknown obstacles. Sometimes, it even surprises its creators with unexpected efficiency.

With recent advancements, the Accelerator compiler is now set to ‘always on’ mode. This marks a significant milestone in the development of a compiler. It’s now trusted to be stable and flexible enough to handle all codes, even unseen ones, ensuring it produces correct results and serves its primary purpose: to enhance efficiency.

The compiler will continue to develop until the final release. It will become more stable and gain more real-world exposure. We would greatly appreciate if your feedback contributes to its maturity by then.

All previous modules of ILNumerics have been adapted for the new Accelerator. They are thoroughly tested, have undergone many bugfixes and improvements, and are available for production on nuget.org as of today.


 

Introducing the ILNumerics Accelerator:

[ Video does not show? Download here: https://ilnumerics.net/media/andere/ILNumerics_Segments_2022.mp4 ]

ILNumerics Accelerator – A better Approach to faster Array Codes, Part II

An better Approach to an unsolved Problem

TLDR: Authoring fast numerical array codes on .NET is a complex task. ILNumerics Computing Engine simplifies this task: it brings a convenient syntax, which is fully compatible to numpy and Matlab. And today, we bring back the free lunch: our brand new JIT compiler not only transforms numerical algorithms into highly efficient codes, it also automatically adopts to and parallelizes your workload on any hardware found at runtime. It segments and distributes even small workloads efficiently – more fine grained than any manual configuration could do.

Author: H. Kutschbach (ILNumerics)  Reading time: 12 min

Since its introduction in 2007 ILNumerics has led innovation for technical computing in the .NET world. We have enabled a short, expressive syntax for authors of numerical algorithms. Writing array based algorithms with ILNumerics feels very similar (and is compatible) to well known prototyping systems, namely: Matlab, numpy and all its successors.

Another focus of ILNumerics has always been: performance. In fact, we started the company in 2013 with the goal to build the fastest technology on earth. Today we propose a better, faster approach to array code execution: ILNumerics Accelerator.

This is part 2 of an article series on ILNumerics Accelerator. In the first part we’ve explained why automatic parallelization was too large a problem for the existing compiler landscape. In the following we start by describing the problem ILNumerics Accelerator solves. Then we’ll explain how it works.

The Problem Space

Math drives the world. High tech companies, financial institutions, medical instruments, rocket science … The most innovative projects do all have one thing in common: they are driven by numerical data and complex mathematical algorithms working on these data. Most data can be represented as n-dimensional arrays. They allow to implement great complexity without great effort. Consequently, array based algorithms play a huge role in todays most innovative industries.

Over the years we have seen many different attempts to realize the advantage of numerical arrays for small and large developer teams, working on .NET. Some large teams have even built their own solution. And indeed: building a math library in C# is easy to start with! You’ll see many low hanging fruits. With a few lines of code, you are able to create matrices and to sum them in short expressions, like ‘A + B’! Woooh!! If this is all you need then you can safely skip the rest of this article…

Often enough managers loose enthusiasm when after some months huge device codes built with such trivial ‘solution’ eventually show execution speed far below their requirements.

Optimal execution speed is only realized by efficient utilization of all compute resources.

What may sound simple is actually a problem that has been waiting for a solution for more than two decades. Todays hardware is mostly heterogeneous. Even inside a single CPU there are at least two levels of parallelism. Often, there is also at least one GPU around.

The only working way to utilize heterogeneous computing resources today was to manually distribute manually selected parts of an algorithm and its data to manually selected devices. Many tools exist which aim to help the programming expert to decide, write and fine-tune the required codes at development time. This still requires expert knowledge and intimate insights into the data flow of your algorithms. And it requires time. A lot of time! The large codes created this way need testing and maintenance.

Nevertheless, this demanding approach still cannot harness all performance potential! Needless to say that such results are far from being expressive and easy to understand. ILNumerics Accelerator is here to solve this problem for the large domain of numerical array codes.

The Big Performance Picture

Let’s widen our perspective on the performance topic! Three major factors influence execution speed:

1. The Algorithm

Basically, a program performs a transformation of data from one representation (input) into another representation (result). The transformation is described by the program´s algorithm(s). This high-level, business view on software is ‘lowered’ by the programmer when she writes the concrete program code, enabling a compiler to understand the intent. In subsequent compilation steps the code is then further lowered by one or more compilers into other representations, which can be better understood by the hardware. So, the programmer and the compiler(s) share responsibility to implement the intended transformation in a way that it is carried out on the available hardware and produces the correct result in the shortest time possible.

2. The Hardware

An optimally fast program must recognize and utilize all available hardware resources. The exact configuration and properties of all devices, however, is typically only known at runtime. This implies that a significant amount of transformation can only be done at runtime (-> JIT compiler). One may say, a program could be specialized for one specific set of computing resources. This is of course true, but it only shifts issues to the next time the hardware is to be renewed. Further, some properties of the hardware are inherently only known at runtime. Two examples: resource utilization rate and influences by other programs / threads running concurrently on the computer. If the final executable program fails to keep all parts of the hardware busy with useful instructions the program will require more time to run than necessary.

3. The Data

In general technical data is made out of individual numbers, or ‘scalars’. Traditionally, processor instructions deal with scalar data only. In array algorithms data forms arrays with arbitrary shape, size and dimensionality. This can be seen as an extension to the traditional scalar data model, since n-dimensional arrays include scalar data as a special case.

The array extension introduces great new complexity. Instead of a single number, having a type and a value, an array instruction deals with input data which may have one or many, often several million individual values! But arrays can also be empty…

While there is often exactly one way to execute a scalar instruction, to execute an array instruction efficiently requires careful plan and order and to select and align the execution strategy to the (greatly varying) data and hardware properties. Again, failing to do so (at runtime) will lead to sub-optimal execution times.

Workload & Segmentation

Together with the algorithm data forms another important factor: ‘workload’. It is a measure of the number of computational steps required to perform a certain data transform. Let’s say: the number of clock cycles of a CPU core required to compute the result of the ‘sin(A)’ instruction for a matrix A of a certain size. Mapping the workload onto a concrete hardware device yields the effort or cost (in terms of minimal time or energy) required to complete the transform on this device.

Obviously, throughout a program´s execution there are many intermediate steps, transforming various data with varying properties (-sizes, etc.). The overall workload – in a slightly simplified view – is the aggregation of all workloads of all intermediate steps.

To keep a program´s execution time small, all intermediate steps must be carefully adopted to and execute on the fastest hardware resource(s) currently available, yielding the lowest cost. Compare this to how a programmer today manually selects a certain device for (manually selected) large, predefined parts of a program at development time!

Manual segmentation and device resource selection cannot bring optimal performance for all parts of a program.

Array Codes – done (& run) right

The considerations above should make it obvious that no static compiler is able to create optimal execution performance (letting toy examples aside). Data and hardware properties are subject to change! So does the optimal strategy for lowering the individual parts of a user algorithm to hardware instructions! Lowest execution times will only be achieved, if all parallel potential of an algorithm is identified and used to execute its instructions concurrently, on all available hardware resources, and efficiently.

But let’s skip all reasoning and jump right into the interesting part! Here is how ILNumerics Accelerator achieves exactly this goal:

  1. Array instructions within the user program are identified and merged into ‘segments‘.
  2. Segments limits are optimized to hold chunks of suitable workload whose size can be quickly determined at runtime.
  3. Before executing a segment …
    1. The cost of the segment with the current data is computed for each hardware resource.
    2. The best (i.e.: the fastest) hardware resource is selected.
    3. The segments array instructions are optimized for the selected resource and current data.
  4. The kernel is scheduled on the selected resource for asynchronous execution and cached for later reuse.

Likely, this list requires some explanation.

[ Video does not show? Download here: https://ilnumerics.net/media/andere/ILNumerics_Segments_2022.mp4 ]

ILNumerics Accelerator takes a new approach to efficient array code execution. Instead of attempting global analysis (the top-down approach which failed in so many projects) we optimize the program from the bottom-up. Our compiler recognizes all fundamental array instructions and expressions thereof, merges them into ‘islands’ (segments) of well-known semantics and execution cost. Each segment contains a small JIT compiler, specialized to quickly compile the segments instructions at runtime into highly optimized low-level codes – for the .NET CLR and for any OpenCL device! This step recognizes and adopts to all hardware and data properties found at runtime. And it maintains compatibility with all .NET platforms.

Currently, we support unary, binary and reduction array instructions provided by ILNumerics Computing Engine and all expressions made thereof. No changes or annotations are required to existing codes. The ILNumerics language supports all features of the Matlab and numpy languages. It will be possible to build connectors to other languages and to custom array libraries exposing similar semantics.

During build each array expression is automatically replaced from the user code files with a call to a specialized, auto-generated segment. Next to hosting a JIT for the array instructions a segment also ‘knows’ the workload of its inherent computations. When at runtime the segment is called, it uses this info to compute the cost for executing the segment on each device and to identify the device suggesting earliest completion. Afterwards, it enqueues the optimized kernel to this device for computation.

It then – without waiting for a result – immediately passes on control to the next segment in the algorithm. Now, if subsequent segments do not depend on each other they are scheduled onto different threads or devices and execute in parallel!

All technical details of the new method are found in the patents and patent applications listed at the end of this article.

Advantages

ILNumerics Accelerator automatically executes your algorithm in the fastest possible way:

  • During execution it finds chunks of workload which can be computed concurrently.
  • It identifies the best suited hardware device available at this time.
  • For each segment it adopts the execution strategy to the fastest device, and
  • Computes the workload of many segments in parallel.

Therefore, it scales execution times with the hardware without recompilation or manual code adjustments.

Our compiler …

  • Supports vector registers, multiple CPU cores, and all OpenCL devices.
  • Removes temporary arrays and manages memory between devices transparently.
  • Applies many new and important optimizations, on kernel and on segment level.
  • Fully managed solution: it targets the .NET CLR, thus is compatible with all .NET platforms.

… which brings many benefits:

  • Efficient use of all compute resources.
  • No expert effort required.
  • Shorter time to market.
  • Adjusts to new hardware.
  • Maintains maintainability.

“How much faster is it”?

This is a very popular question! A comprehensive answer deserves its own article, really.

In short: it depends on your hardware. It depends on your algorithm. And it depends on your data! The question should rather be: how much more efficiently will it utilize my hardware? For the majority of real-world algorithms is true: as efficient as possible! Moreover, it does so automatically!

(If you are still longing for a number: on 2015-ish hardware we see speed increase rates between 3 and more than 300. Most of the time optimized code runs faster by at least a magnitude.)

Where can I get it ?

ILNumerics Accelerator has entered the public beta phase. It will be released with ILNumerics Ultimate VS version 7. The pre-release is available on nuget. Start here ! General documentation on the new Accelerator is found here. It will subsequently be completed within the next weeks.

Patents

ILNumerics software is protected by international patents and patent applications. WO2018197695A1, US11144348B2, JP2020518881A, EP3443458A1, DE102017109239A1, CN110383247A, DE102011119404.9, EP22156804. ILNumerics is a registered trademark of ILNumerics GmbH, Berlin, Germany.

ILNumerics Accelerator – A better Approach to faster Array Codes, Part I

A Truth most Programmers won’t tell.

TLDR: It was twenty years ago when computer manufacturers, while trying to boost CPU performance, were confronted with the hard wall imposed by physical limitations. It took a long time for everyone involved to realize that individual processors could no longer become significantly faster in the future. The previous approach made it too simple to deal with the constantly growing demands caused by ever larger data and ever more complex programs. Haymo Kutschbach, founder and CEO of ILNumerics explains the challenges involved for automatic parallelization on today’s hardware and why traditional compilers will not satisfy the strong demand for a feasible solution. A better, working approach is presented.

Reading time: 10 minutes

For many decades, the rule applied: “Your program is too slow? Simply buy the latest hardware!” This was made possible by the availability of (general purpose) programming languages, as well as a stream of innovations in processor architectures and the associated compilers. They allowed the programs to be written largely independently of the execution hardware. The compiler adapted abstract codes to the low-level features of the processors. Everything fit together so conveniently!

And then nature threw a spanner in the works on this easy calculation. We’ve seen new computers continueing to obey Moore’s Law and become more powerful with each generation. But this new computing power is now distributed over many processors! A program can only use the new speed if it can use all processors at the same time. In addition, a significant part of the computing power of today’s computers is found in heterogeneous architectures such as GPUs and other accelerator hardware.

The consequences can be observed today in probably every development department: Programs are written abstractly. Efforts are made to create ‘clean code’ that can be tested and maintained. Unfortunately, it is often executed too slowly, though! So you start to examine the codes for optimization potential. You identify individual bottlenecks, decide manually on a parallelization strategy, implement, test and measure again. Many programmers don’t see this as a problem – their expert knowledge guarantees them a good income for many years to come! Since there are no alternatives, the enormous delays in time to market are accepted. Just like the fact that the next generation of devices will again require a complete rewrite for large parts of the software.

Hardware today is far more diverse and heterogeneous than it was 20 years ago. But why is that actually a problem? Why are we still not able to automatically adapt and run our programs on such hardware again?

If it’s too slow, blame the Compiler!

For one, it’s because the compiler market has traditionally been dominated by processor manufacturers. Their compilers served to make their own hardware more accessible. Even though we’ve seen a recent trend towards vendor-independent compilers, they still continue the traditional approach: compiling (“lowering”) code of a general purpose programming language (like C or Fortran) to hardware instructions that can be executed on a specific (and previously selected) processor architecture. What we actually need, however, is a compiler that translates an abstract program for a “computer” – with all its diverse computing resources.

This task, however, is much more demanding! Why? Here we come to the second reason that has so far prevented a working solution: granularity. In order to be able to use parallel resources, the software must contain additional instructions that take care of the distribution of the individual program parts. This additional overhead makes a certain minimum program size (workload) mandatory. Parallelization only makes sense if the gain in speed through parallel execution exceeds the additional management effort!

Here, the term “granularity” refers to the number of individual low -level instructions that a piece of code requires to calculate its result. Apparently, the granularity increases with the size or complexity of a program part, because more low-level instructions are executed at the end. So, in order to make efficient use of parallelism, it is not enough to consider individual instructions of a program. Rather, a compiler must merge larger program parts (‘grains’) and distribute them to the available computing resources . One of the most important challenges of a parallelizing compiler is to find the optimal size of these “chunks” – and thus the optimal granularity.

This brings us to the third reason: complexity. In general, programs can become arbitrarily complex. Compilers have the task of translating programs into another, useful form. But how does one translate information that can be arbitrarily complex? Traditional compilers achieve this goal by limiting their consideration to individual instructions in the program. They know the (manageable) set of possible translations for each of these instructions. The finished program is then little more than the chain of such individual translations. Nevertheless, traditional compilers are already considered to be extremely complex software!

Unfortunately, this “complexity reduction” trick is in direct contradiction to the necessary, minimal granularity described above. A parallelizing compiler must face the challenge of finding an optimal translation even for program chunks that consist of multiple or many instructions. As always, low hanging fruits exist (see e.g.: tensorflow). However, an optimal solution must not restrict itself to individual, specific instructions just because their translation is particularly simple and therefore easy to implement!

And then there is a fourth aspect that also plays into the area of complexity. In addition to the optimal chunk size (granularity), it is also important to identify those grains of a program that can be executed independently of one another. The program analysis required must also take into account the best possible granularity. A compiler must therefore be able to understand large program sections, form chunks of an optimal grain size and execute them efficiently in parallel on heterogeneous hardware.

Ultimately, it is the complexity of this task that has so far prevented such a compiler from being born. In general, development departments still follow the manual approach described earlier. It requires expert knowledge and enormous effort to manually perform the necessary steps (granularity and independence analysis, parallel implementation) for a special program . But despite the enormous amount of time, despite the enormous maintenance issues associated, this approach allows to scratch the surface only! It can neither address nor solve the real problem.

Are we lost?

Not yet! Let’s wrap it up:

  • Traditional compilers work at too low a level of granularity. They consider too few instructions to cover a workload, suitable for efficient parallelization.
  • Previous attempts to include large/all program parts in an analysis have only been successful in a few special cases. Global analysis for general programs is an unsolved problem and, in the opinion of the author, will remain so for a long time to come.

But now we come to the topic of our headline! For a not too small class of programs ILNumerics succeeded in developing a compiler that fulfills all optimality requirements. It is the class of numeric, array-based programs. They are written by scientists, mathematicians, and engineers to encode the innovations of our modern times. They form the core of big data, machine learning and artificial intelligence, and all codes, making use of them. Numerical algorithms realize innovations – moreover: they are these innovations! They allow to formulate great complexity without great effort. They handle huge data and, therefore, require greatest speed.

A development department, faced with the challenge of accelerating its numerical array codes can only select and use the best possible language and libraries there are. Technical requirements often rule out prototyping languages, like numpy and Matlab. Individual parts may be outsourced to GPUs. Other parts are parallelized using multithreading. All of this slows down the industrial development process significantly. It is like writing GUI programs in assembler…

A Better Approach to faster Array Codes

ILNumerics proposes a new, faster approach to the execution of array codes. ILNumerics Accelerator, automatically finds parallel potential in array-based algorithms (as: numpy, Matlab, ILNumerics) and efficiently distributes its workload to heterogeneous computing resources – dynamically and at runtime, when all important information is available. It again scales your algorithm speed to any heterogeneous hardware without recompilation.

The online documentation describes technical details of the ILNumerics Accelerator Compiler. Make sure to register for our newsletter here and don’t miss any news!

Where can I get it ?

ILNumerics Accelerator has entered the public beta phase. It will be released with ILNumerics Ultimate VS version 7. The pre-release is available on nuget. Start here ! General documentation on the new Accelerator is found here. It will subsequently be completed within the next weeks. Read the getting started guide(s), get your hands dirty and please, let us know what you think!

Patents

ILNumerics software is protected by international patents and patent applications.

DE102011119404.9, WO2018197695A1, US11144348B2, CN110383247A, JP2020518881A, EP3443458A1, EP22156804, US20230259338A1, JP7495028B2, CN116610436A, EP23173406.2, PCT/EP2024/062463

. ILNumerics is a registered trademark of ILNumerics GmbH, Berlin, Germany.

ILNumerics – what comes next?

ILNumerics Version 6 (beta) is out!

In version 6 of ILNumerics Ultimate VS we perform an important step: we split compatibility from performance. The new architecture allows us to bring the established ILNumerics API for computing and visualizations to all platforms supported by .NET. By the time of writing, a public beta of version 6 is out and available on nuget.org.

One great achievement for compatibility was to get over with the old installer. For years and with .NET Framework it was obligatory to install our libraries into the GAC. Further, ILNumerics did depend on many native DLLs.  They came in two flavours: 32bit and 64bit. The Windows way of managing them was (and still is) to store them into special system folders. And then there was our extension to be installed into Visual Studio and the registry…. All this required administrative privileges. Together with our own code, documentation and further tools ILNumerics installer added up to more than 300 MB in size.

It worked pretty well for some time. But the .NET ecosystem changed at breakneck speed. It went open-source. It cut  .NET Framework from further innovations. Packaging moved to nuget.org. Visual Studio became ‘asynch’ and changed the way to load its extensions. Some initial versions of .NET Core ultimately led into .NET 5 – to be the “single .NET” going further … It took us some time to catch up.

One problem which caused us headaches was: compatibility! With .NET Framework there was basically only one platform to support: Windows. In some cases customers used ILNumerics on Linux, too. But for years it was enough to provide a Windows – only version.  And the installer kind of protected anyone from (mis)using ILNumerics on unsupported platforms.

Now, with recent developments, .NET is not Windows – only anymore! For broadest compatibility our libraries ought to target ‘.NET Standard’. For a vendor like us this means, having to support any platform, supporting .NET! Looking at various projects delivering .NET Standard libraries with native dependencies this implication seems not to have found its way onto all developer desks, yet.

“ILNumerics now runs on .NET”

We were looking for a way to enable true compatibility for ILNumerics. A way to no longer depend on native libraries! Let’s take the MKL as an example: for 40-ish years whole teams of very clever guys implement and improve linear algebra FORTRAN routines for decomposition of matrices. These algorithms are indispensable today. They mark the state of the art and are relied upon throughout all industries. Whichever alternative one may come up with – it will most likely not show the same robustness nor precision!

So, re-implementing was no option. Hence, we wrote a compiler, able to transform FORTRAN sources into equally efficient .NET code. It took us almost a year to not only translate all of LAPACK, FFTPACK and MINPACK but also all tests (>5 Mill LOC) and have them succeed.

It paid off: instead of “running on .NET, platform XX” ILNumerics now “runs on .NET”!  Native libraries are only left for platform specific optimizations. They are not required anymore!

Release of version 6 is scheduled for August 2021. It will make your life much easier when it comes to referencing, updates and distributing your ILNumerics apps. And there is more in version 6, including bug fixes and new features. The documentation will be incrementally updated for the new version. The beta is online. It runs with your existing subscription (must refresh your license in Visual Studio) or with a regular trial license.

Where is the Array Visualizer?

The ILNumerics Visual Studio extension is split from the rest of ILNumerics Ultimate VS. It is published in the Marketplace and can be used free of charge. The Array Visualizer has seen a fresh-up and has been unlocked for all features, including visualizing debug memory of C,C++,FORTRAN and .NET arrays as well as plain pointers and arbitrary storage schemes. Further, the extension is required in order to manage and refresh your ILNumerics licenses on your developer seat. It supports Visual Studio from version 2017.

What to expect from Version 7?

Behind the scenes we have been very busy working on new innovations for unseen performance.  And this new performance will be delivered by a purely managed solution with version 7! It will not only catch up with native compiled code on multicore processors, but instead will automatically use all suitable computing hardware found on your system. It will not require you to write any GPU code. You will not even have to decide where a certain piece of code is going to be executed on! All decisions are made only when all information is available: at runtime, automatically, by the ILNumerics Accelerator (patent pending).

Are you interested in giving this new experience a try? Let us know and join us for an early alpha test: accelerator@ilnumerics.net

Visual Studio Debugging with the ILNumerics Array Visualizer: Now free for everyone!

When we released the first ILNumerics Array Visualizer back in 2012 we knew that this would become a rather useful extension to every scientists’ toolbelt. And indeed: the Array Visualizer makes developing array based algorithms so much easier! It brings instant insight into all kind of array data during your Visual Studio debug session by showing  your arrays in a number of compelling visual ways. Hence, less time has to be spent on finding and removing bugs. The extension quickly became one of those tools which no developer wants to miss anymore.

ILNumerics Array Visualizer: Debugging in VisualStudio
Our Array Visualizer in full effect

From version 5 on, the ILNumerics Array Visualizer has been available not only for C# users (with or without using ILNumerics Array<T>), but also supports programmers working with other popular languages, namely: C/C++, FORTRAN, F#, and VisualBasic. It visualizes nearly everything you have a reference for, including ILNumerics arrays, .NET arrays, plain pointers and memory addresses (.NET or C/ C++), and even FORTRAN arrays!

Because its features were so compelling we expected other companies to adopt this idea and offer similar tools. But in fact, as of today, we are not aware of any comparable visualizer extension on the market.

The Array Visualizer is aligned with our mission to support the creation of technical applications in industrial contexts. However, the core advantage of ILNumerics has always been the ability to create demanding numerical algorithms in Matlab and numpy style faster and directly in .NET – and to reach a higher level of performance during execution.

We focus on that goal and decided to release the Array Visualizer to the community. For free. We will continue maintaining and improving the Array Visualizer in the future. This may include to distribute it as a stand alone tool – outside of the ILNumerics distribution package.

For now, you can add the Array Visualizer extension to your machine for free by using our online shop. 

  1. Register for an account: https://ilnumerics.net/account.html
  2. Sign-in to your account and visit our online shop.
  3. Add a [New Seat]. The new seat opens for configuration. Here, you may add any ILNumerics modules needed. Note that the ‘Visual Studio Tools’ module is already selected (price: 0,00 EUR). The Array Visualizer is included here:2020-11-26 20_06_31-ILNumerics Account
  4. Complete your order. If you haven’t selected any other modules besides the Visual Studio Tools item, your price will still be 0,00 EUR and no payment will be necessary (no credit card either).
  5. After finishing your order you will receive your invoice by email. The invoice contains your license key.2020-11-26 20_15_10-ILNumerics Billing
  6. If you haven’t done yet: download ILNumerics. You will find the latest distribution in your account in the [Downloads] tab. Install the package on your machine.
  7. Activate your seat. This can be done right within the Visual Studio options dialog, in the ILNumerics -> Licenses tab.  Further details on activation: https://ilnumerics.net/license-activation.html

That’s it! You are now ready to start visualizing the heck out of your memory! Start here: https://ilnumerics.net/visualstudio-extension.html

Oh, and please don’t forget to send us some screenshots of your Visual Studio debugging sessions with the ILNumerics Array Visualizer. We’ll make a movie out of them and enjoy them during our next team meetup on a huge screen with chips and beer!

Why MINT-students should learn C# instead of Matlab and R

For students who want to prototype and code, there’s a wide range of domain specific programming languages around: R or Matlab – just to name a few – make it quite easy to design new algorithms, analyze data and to plot complex functions. However, bringing innovation from science to enterprise-ready software applications still costs lots of effort, manpower and experience.

Microsoft’s C# has the potential to fully close this gap: in combination with the ILNumerics numerical library, it becomes the perfect environment both for prototyping and creating powerful software. But why is it that the mainstream in science still doesn’t make use of the power of C#, .NET and Visual studio? Well, let’s have a closer look at computing in academia.

Scientific Computing vs. Real-Life Applications

Like in many other areas in academia, the main reason for sticking to out-of-date techniques is “tradition”. That’s why in 2020 there is still a huge gap between mathematical prototyping and industrial-grade production software. Scientists often aim at creating an algorithm solely in order to validate a new model or theory. Once done, they produce a couple of nice looking plots, publish a paper or two and: bye bye.

Unless… there is a greater use to it!

Turning Prototypes into Products

If a model created in science has the potential to earn serious money,  the algorithm created by a mathematician, physicist or microbiologist has to be transformed into something ‘production ready’ that can be used in real life.

This is where many development teams in the biggest enterprises are stuck today: Everytime they want to adapt innovation from science for their own software applications, they have to translate Matlab, numpy or R codes into a modern, managed SW framework. This transformation is where the pain starts: It has to be done! And not only once, but with each update. This causes a whole lot of redundant and boilerplate code – each and every time. This work is expensive: It requires expert knowledge, extensive testing and additional maintenance. And it significantly delays the time to market!

C# + ILNumerics: Perfect Match for High Performance Applications

This pain needs to stop. And this is where modern general purpose languages come into play. C# for example fulfills all the requirements for the design of numerical algorithms! Developer tools such as Visual Studio – especially if complemented with the ILNumerics Array Visualizer und its highly efficient n-dimensional arrays – allow scientists and software developers to design and prototype their algorithms by directly using production-ready tools.

That means: the painful and time consuming transformation from domain specific prototyping code to an industrial software product is no longer necessary, which allows for huge time savings – often more than 50 percent!

Maths, Physics, Engineering: Boost Innovation now!

However, most established scientists in academia today still stick to Matlab, numpy and R – even though it’s not necessary anymore. But fortunately, a new generation of scientists has started to push innovation at universities ahead. They are often familiar with modern programming languages like C#, and they want to use their knowledge to combine academic prototyping with state of the art software development. This new generation will modernize the intersections between science and industry. Their skills will allow companies all over the world to realize all their innovative potential in a much shorter period of time.

That’s why we think any science and engineering student should get familiar with modern programming languages as early as possible: It will help him or her to boost innovation in their field all over the world!

New Major Release: Version 5 is out

Phew… this was a huge effort! But now it’s done: ILNumerics Ultimate VS version 5 is released.

For those of you who have wondered: yes, it became a little quiet around ILNumerics lately. Honestly, we were so busy finishing this release that we simply forgot to throw out some news. Anyways! Now that it’s finally done, let’s turn on the spotlight!

What’s new in ILNumerics Ultimate VS version 5?

  • New, redesigned array classes
  • Nicer class names: ILArray<T> is gone!
  • Breaking the limits: support for big and huge data
  • Support for Matlab® and (new:) numpy array styles
  • Outstanding speed for all array sizes
  • Installs into Visual Studio 2013…2017

Quick links: changelog, conversion guide v4..v5, online docu

New, redesigned Array Classes

ILNumerics Ultimate VS versions 5 focusses on the Computing Engine. The array classes have been redesigned for better execution speed and better compatibility with huge data and numpy arrays. However, the API has been kept mostly the same.

Continue reading New Major Release: Version 5 is out

Release Notes ILNumerics 4.13 (detailed)

See this changelog for a quick list of all changes in version 4.13. Here we dive into some details about specific topics.

Visualization Engine Improvements

The main goal of the GDI renderer is still to provide a fully compatible alternative to the OpenGL renderer. It automatically replaces the OpenGL default renderer if a problem / incompatible hardware etc. was detected at runtime. The focus lays on feature completeness and precision of the rendering result. The quality of the GDI renderer has been further improved to these regards in 4.13. Changes affect the following low level rendering features.

Antialiasing

Lines now recognize the ILLines.Antialiasing property for thick lines (Width > 2). The antialised rendering works in all situations: transparent lines, lines with stipple patterns, inside /outside of plot cubes and for logartithmic / linear axis scales. The quality of the antialiasing implementation is now on par with common OpenGL implementations.

The new default value for ILLines.Antialiasing is now true. However, thin lines would not profit from antialiasing, hence thin lines will continue to ignore the value of the ILLines.Antialiasing property.

Note, that some OpenGL drivers actually refuse to render certain combinations of line properties. We experienced such behavior for stippled, thick line strips having antialiased rendering configured. In such situations you should make sure to have the most recent OpenGL / graphics card drivers installed. Alternatively you may chose either stippled pattern – / or antialiased rendering by explicitly configuring your lines for either one setting. Or use the GDI renderer instead.

Following is a screenshot of some lines examples. The left side shows the OpenGL renderer result. On the right side the GDI version is shown. Use your browser to show the full, unscaled image (right click on the image):

LinesAntialiasingGDIversOpenGL413… and with dotted lines:DottedLinesAntialiasingGDIversOpenGL413

 

Default Color for ILLines & ILLineStrip

Basic, low level line shapes are now created with a default color: black. Higher level objects (as ILLinePlot, ILSplinePlot, Markers, ILSurface etc.) should always provide a color for low-level objects or use vertex based coloring explicitly. If you are assembling your scene with the low level line objects make sure to check that you have done this. This is not a breaking change.

Note that the setting for the shapes Color property overrides the vertex colors buffer (Colors property). In order to use vertex based coloring one must set the Color property to null, enabling the colors information from the Colors buffer.

Auto Color for Spline Plots

ILSplinePlot (derived from ILLinePlot) now adopts the auto-coloring feature from the ILLinePlot class. When no line color was given at the time the plot was created the color gets assigned which comes next in the ILLinePlot.NextColor color enumeration. Use the linecolor constructor argument of ILSplinePlot or the Line.Color property in order to control the color of the spline line explicitly.

Camera Default Depth: 100

In earlier versions the default ILCamera.ZFar value was 1000 which led to depth buffer precision issues in some situations. The new default value of 100 increases the depth buffer precision significantly. However, the new value (just like the old one) does not take into account the actual depth of your scene! If you encounter unwanted clipping in the far clippping area now, set the value for scene.Camera.ZFar explicitly. At best you know the depth of your scene and use this value. Or – more simple – use the old default of 1000:

 scene.Camera.ZFar = 1000; 

Visual Studio 2017 Compatibility

With the new version ILNumerics installs into the great, fresh new Visual Studio 2017 (released March 2017). … well, at least ‘kind of’…  What is true is that with 4.13 you can again use ILNumerics in all recent Visual Studio editions supporting extensions at all (which is all editions except Express Edition – I wonder if anyone is using this, still??), including Visual Studio 2017 Community Edition. This is great and all … but:

As you noticed there was a great deal of changes coming with VS2017. This includes the installer system which is – great attempt! – much more slim now by omitting unneeded stuff from the installation. But VS2017 also requires changes to the manifest files facilitating every Visual Studio Extensibility project. A new version has been introduced: version3. This is the first version which is not compatible with Visual Studio 2010 anymore. As a consequence, by supporting the new extension packaging system we would be required to drop support for Visual Studio 2010. This is not too dramatic since the 6 years VS2010 is out now feel kind of an eternity in our dev-world. However, we wanted to give the remaining VS2010 customers at least one version iteration of deprecating VS2010 in order to jump to a more recent version smoothly.

Additionally, the way the new VS installer works seems to reflect not the last word spoken on that topic (at least this is what we hope). The bottom line: we use an MSI installer, wrapped in an exe bootstrapper. The MSI installs all GAC binaries, registers the development dlls in Visual Studio, maintains the singleton installation directory and triggers the VSIX installer which installs the extension into all supported Visual Studio installations located on the system. This may sound complicated but worked quite reliably over the years.

Now, with the new Visual Studio package system things become odd. Out of subtle reasons, the MSI cannot (reliably) trigger the VSIX package automatically (and quietly). The reason becomes obvious when considering that the new extension potentially needs to trigger the installation of new components which it depends on. This install would not be possible while the hosting MSI is being installed itself still. So, as a consequence, currently, MSI installs of VSIX extensions into VS2017 are simply not working. When you start the ILNumerics installation from the delivered *.exe you will find the extension being installed into VS2010 and upwards – with the exception of Visual Studio 2017 :(

Currently, some smart (WIX) people are attempting a solution to this. But we are not aware of a clean solution released already. Therefore, we will wait for it and/or eventually consider a new deployment scheme for our extensions.

However, luckily there is a simple workaround for now! After the ILNumerics installation was finished, you can easily install our extension into VS2017 manually. Just go to the installation folder (by default: C:\Program Files (x86)\ILNumerics\ILNumerics Ultimate VS\bin) and find the ‘ILNumerics.VSExtension.vsix’ file. Double click on it to start the installation into the remaining Visual Studio instances manually. This should work without problems. Be sure to accept the warning dialog during the install. It originates from the fact that our extension must still support older Visual Studio extension techniques.

Keep in mind that this is a workaround, though! Once installed the extension will work as expected. But some things will not work consistentyl. Deinstallation, for example must be done manually from the “Extensions and Tools” dialog within Visual Studio 2017:

A manually installed extension requires a manual uninstall.
A manually installed extension requires a manual uninstall.

Also, in difference to the machine wide installation done by the (administrative) MSI install the manual VSIX installation changes the local user account only. You may have to repeat the VSIX install for other user accounts. Besides these lowered installation experience we know of no other incompatibilities in Visual Studio 2017.

Fun with Visual Studio regexp search

I only recently realized that the Visual Studio regexp feature in Search & Replace is even able to handle regexp captures. Example: In order to locate and replace all line endings in your code which exist at the end of non-empty lines, exluding lines which end at ‘/’ or with other non-alphanumeric characters one can use:

Search Pattern: ([0-9a-zA-Z)])\r\n
Replace: $1;\r\n

searchandreplacevisualstudioMatches inside () parenthesis are captured. When it comes to replacing the match the new value is created by first inserting the captured value at the position marked by ‘$1′. This way it it possible to, let’s say insert a new char in the middle of a search pattern – if it fulfills certain conditions.

There appears to be an error in the MSDN documentation, unfortunately at the point describing how to reference captures in Visual Studio regexp replace patterns. Anyway, using the ‘$’ char in conjunction with the index is the common way and it works here just fine.

Multiple captures are possible as well. Just refer to the captured content by the index according to the count of ‘()’ captures in the search pattern.

The Visual Studio regexp support is really helpful when translating huge C header files to C# – to name only one situations. It saved me a huge amount of time already. Hope you find this useful too.

Array Visualizer Extension Integration Tests. A Practical Guide.

Overview

Testing Visual Studio extensions against different versions of Visual Studio poses many challenges due to the large array of Visual Studio versions. The following post describes a generally applicable solution on the basis of the example of the ILNumerics Array Visualizer extension. Furthermore, this article will elaborate on how we managed to perform (relatively) stable test integrations into our build scripts.

When implementing new features into your code, today it is a well accepted fact that your code needs to be tested against as much potential scenarios as possible. You need to write unit tests and integration tests to anticipate the behavior of your application and to ensure it is working in the expected way in all possible situations.

So how do you test a Visual Studio extension? First of all, you should define integration test scenarios, to check if your code works as expected. Make sure to include many different scenarios to achieve maximum code coverage.

What do we have to test?

The ILArrayVisualizer is a Visual Studio extension that allows you to visualize large data sets in a number of ways during your debug session. Our current version 4.11 can not only be installed in different versions of Visual Studio but also supports various project languages and data types. To be precise, the full parameter space has the following dimensions:

  • Visual Studio versions: 2010, 2012, 2013, 2015
  • Programming languages: C#, Visual Basic, F#, Fortran, C/C++
  • All common project types: dll, exe
  • Platforms: 32/ 64 bit
  • All supported array types for each individual language: 1D/ n-dim, ILArray<T>, pointers, std::array, a.s.f.
  • All numeric element types: double, float, (f)complex, (u)int32/16/64), bytes, char, …
  • Arrays may also contain special numbers (NaNs, Inf), empty shapes in various forms, uninitialized arrays, or NULL.

The huge parameter space forbids any test strategies that are based on manually performed testing. In order to cover the most important cases only, we’ve identified more than 1000 integration test cases! As a result, automated integration tests are obligatory. Furthermore, the tests need to be integrated into our build system and have to support manual (debug) as well as automated (release) triggers.

One may ask: why not simply testing the array visualizer service in a regular unit test project? Why do we need to perform integration tests inside Visual Studio at all? The answer is that Visual Studio provides a complex environment which needs to fullfill a huge amount of requirements and to support us in a vast number of situations. A complex interaction with the Visual Studio debugger uncovers incompatibilities here and there. Performing automated integration tests over nearly the whole parameter space provides good coverage of all expected usage scenarios and indeed allowed us to identify any incompatibilities – and to work around them.

First: Define a Testing Strategy, Second: Write the Code

To test the ILArrayVisualizer, we had to define a testing strategy first. The testing strategy includes procedures that are performed during each test run and therefore have to be automated in a reliable way:

  • Run the experimental instance of Visual Studio including the installed ILArrayVisualizer extension.
  • Load a predefine test project of a certain programming language as supported by the Array Visualizer.
  • Stop the debugger at a certain, predefined position in the project.
  • Inspect various array instances and check that the Array Visualizer service is able to deliver correct results for it.

Hereby, the main challenge was that integration tests for Visual Studio extensions must run inside another instance of Visual Studio and that we needed a method to ensure this reliably for all supported version of Visual Studio. The test target (our extension project) must be installed in the target Visual Studio version in advance. Consequently, the tests were run in a semi-automatic mode: Our testing framework had to be executed once for each Visual Studio version.

How to Run the Test Code in an Experimental Instance of Visual Studio

Do you know how to write standard unit tests in C#? It is quite simple: While the attribute [TestClass] is applied to each test class object, the attribute [TestMethod] is applied to each test method. You can find more information about all attributes in the official MSDN Article “Anatomy of Unit Tests”.

For the integration tests things become a little more demanding. Let’s take a look at our testing strategy one more time: For each test method we have to create a new experimental instance of Visual Studio. Since we have to run more than 1000 test cases for each Visual Studio version, creating new instances, opening the test project and other actions will take too much time. So let’s change our strategy: We will start the experimental instance only once for all test methods, leveraging the attributes mentioned above.

To begin with, we implemented a [ClassInitialize] method, which runs the experimental instance for our test methods once:

[ClassInitialize]
[HostType("VS IDE")]
[TestProperty("VsHiveName", "14.0Exp")]

public static void TestClassInitialize(TestContext testContext)
{
// test class initialize
}
  • The attribute [ClassInitialize] tells the compiler that the following public method is defined as a constructor for our test class.
  • The attribute [HostType(“VS IDE”)] defines that the following method should be executed in Visual Studio IDE. It’s also possible to create an instance of another host type. You can read more about it in MSDN.
  • The attribute [TestProperty(“VsHiveName”,”14.0Exp”)] defines that the following test method will be executed in Visual Studio Version 14.0(VS 2015) inside the experimental instance.

Why do we actually need an Experimental instance? Because it gives us a way to run, debug and test our extension code – from inside Visual Studio.  The Experimental instance runs parallel to the
‘main’ instance. It allows a clear separation in terms of configuration and installed extensions. During the development of our extension, we can build and run our project by installing the target VSIX extension only inside the experimental instance.

Experimental Instance is launched, what’s next?

As our test method runs in an experimental instance of Visual Studio, we need to obtain an environment DTE. The DTE object is the root of the automation model, which other object models often call “Application”.

ILArrayVisualizer Extension works in the COM model of Visual Studio and as such has its own GUID which is used to identify the service among the list of available serives in Visual Studio. To call the methods to be tested, we also need to obtain a service provider. By using the service provider, we can access the ILArrayVisualizer Service inside the experimental instance and call the methods to be tested.

var m_envdte = VsIdeTestHostContext.Dte;
var m_shellservice = VsIdeTestHostContext.ServiceProvider.GetService(typeof(SVsShell)) as IVsShell;

Guid packageGuid = new Guid("XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX");

m_shellservice.LoadPackage(ref packageGuid, out m_package);

m_serviceProvider = new ServiceProvider(m_envdte as Microsoft.VisualStudio.OLE.Interop. IServiceProvider);

What about the Debugger and the test methods?

To obtain a debugger session descriptor in another instance of Visual Studio, the IDE uses the following code:

var m_debugger = m_envdte.Debugger;

So, this refers to the Debugger object inside the (experimental) instance referenced by the Environment DTE object. This way it is actually possible to control other instances of the VS IDE.

Basically, each integration test simulates the actions a regular user of the Array Visualizer would take: inside a debug session she inspects certain arrays (large variation here!) in the Array Visualizer and receives a number of correct outputs for them. All test projects (i.e. the debug target handled by our virtual users) have to be predefined. Each language comes with its own test projects files, in which we defined a large set of supported array expressions, including a reasonable number of edge cases. The projects are loaded via remote control from our integration tests, are started and halted in debug mode inside the remotely controlled Visual Studio instance. Visual Studio is used to set breakpoints in them, to stop the debugger and perform queries to the Array Visualizer service – one query for each of the predefined test array objects (more than 1000 test arrays exist).

In our case, we just open one of the predefined C++, C#, Visual Basic or FORTRAN project, run it in a new debug session, and iterate all contained array expressions by calling the evaluate() method of our ILArrayVisualizer service on it – all from our testing framework.

To open the existing projects and add those, we use the following method:

EnvDTE.Solution.AddFromFile(FileName);

To build the Solution:

EnvDTE.Solution.SolutionBuild.Build(true); 
// True means, wait until build is ready.

As we get debugger session descriptor, setting the breakpoint is as easy as:

Debugger.Breakpoints.Add(FileName, LineNumber);

All necessary initialization code could be packed into the test class initializer. In this case only one experimental instance will be created for each language and platform and reused to test  many test methods at once.

Passing the Test Code to Visual Studio IDE Experimental Instance

By default, all integration tests are hosted and run in the VSTestHost.exe process. To pass the test code to be executed to the experimental instance (in another environment), we have to use the Test Host Adapter. UIThreadInvoker allows us to implement a delegate method and pass it to our test environment.

UIThreadInvoker.Initialize();
UIThreadInvoker.Invoke((ThreadInvoker)delegate ()
{
// some test cases
}

The integration tests are not only for finding bugs inside your code, but also help you write correct code. To write unit tests, we specified how a class or method should work first. Read more about it here: “Test-Driven Development”. We used this strategy to write a correct expression evaluator for C++ code for the ILArrayVisualizer. We wrote all possible test methods for each possible expression and tested the expected values – for all supported languages.

When implementing a new feature it is crucial to develop a testing strategy to ensure proper functionality and achieve maximum code coverage. In Visual Studio there are testing tools that you can use for this purpose.

The Productivity Machine | A fresh attempt for scientific computing | http://ilnumerics.net