High-performance array code for .NET
Numerical software often forces a trade-off: high-level array code is easier to write and maintain, while low-level implementations are usually written for speed. ILNumerics narrows this gap by combining readable NumPy- and MATLAB-style syntax with automatic runtime optimization and autonomous parallel execution.
Developers can express numerical algorithms as compact array expressions in C# and .NET. At runtime, ILNumerics adapts execution to the actual algorithm, data shape, memory layout, and available hardware.
Readable array code. Runtime-optimized execution.
The magic of autonomous parallelization
Consider a simple numerical task: add two large matrices and calculate the column sums. In a low-level language such as C, this requires explicit loops, manual indexing, and careful attention to memory access patterns.
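A minimal sketch of what such a loop-based version might look like follows. It is not the benchmark's legacy C code: the function name `column_sums_of_sum` and the column-major storage layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Column sums of (A + B) for two rows x cols matrices.
 * Assumption for this sketch: column-major storage, so element (r, c)
 * lives at index r + c * rows. `out` must have room for `cols` values. */
void column_sums_of_sum(const double *a, const double *b,
                        size_t rows, size_t cols, double *out)
{
    assert(a && b && out);
    for (size_t c = 0; c < cols; ++c) {
        const size_t base = c * rows;          /* first element of column c */
        double acc = 0.0;
        for (size_t r = 0; r < rows; ++r) {
            acc += a[base + r] + b[base + r];  /* unit-stride access */
        }
        out[c] = acc;
    }
}
```

Loop order, index arithmetic, and accumulator placement are all fixed by the programmer here, and each of them affects performance.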
In an array language, the same operation can be written directly as:
sum(A + B, 0)
This expression is shorter, easier to inspect, and closer to the mathematical intent of the algorithm. With ILNumerics, it can also become a high-performance execution plan at runtime.
Benchmark example
The original benchmark on this page compares a legacy C implementation with an ILNumerics array expression for repeated sums over a 2000 × 2000 matrix of double-precision values.
For column sums over 1,000 repetitions, the benchmark reports:
- C version: 2,460 ms
- ILNumerics: 551 ms
For row sums over 1,000 repetitions, the benchmark reports:
- C version: 27,159 ms
- ILNumerics: 566 ms
These numbers should be read as a workload-specific example, not as a universal claim that every ILNumerics expression is faster than every hand-written low-level implementation. Performance depends on algorithm, data shape, memory layout, hardware, runtime configuration, and the quality of the comparison implementation.
The more important point is this: high-level ILNumerics array code gives the runtime enough semantic information to optimize whole array operations automatically, so the developer does not have to encode every performance decision manually in low-level loops.
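Memory access order is one factor that often makes hand-written row and column reductions differ in speed, although this page does not state what caused the gap in the figures above. The sketch below assumes column-major storage and uses hypothetical function names. It computes the same row sums with two loop orders.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: column-major storage, element (r, c)
 * at index r + c * rows. `out` must have room for `rows` values. */

/* Row sums with the inner loop walking along a row.
 * Consecutive accesses are `rows` elements apart (strided). */
void row_sums_strided(const double *m, size_t rows, size_t cols, double *out)
{
    assert(m && out);
    for (size_t r = 0; r < rows; ++r) {
        double acc = 0.0;
        for (size_t c = 0; c < cols; ++c) {
            acc += m[r + c * rows];
        }
        out[r] = acc;
    }
}

/* The same row sums with the inner loop walking down a column.
 * Consecutive accesses are adjacent in memory (unit stride). */
void row_sums_contiguous(const double *m, size_t rows, size_t cols, double *out)
{
    assert(m && out);
    for (size_t r = 0; r < rows; ++r) {
        out[r] = 0.0;
    }
    for (size_t c = 0; c < cols; ++c) {
        const double *col = m + c * rows;  /* start of column c */
        for (size_t r = 0; r < rows; ++r) {
            out[r] += col[r];
        }
    }
}
```

Both functions add the elements of each row in the same order and return the same values. Only the traversal of memory differs, and in loop-based code that choice has to be made by hand.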
Why the high-level version can be fast
Traditional compilers can generate very efficient machine code for a specific processor and a specific low-level implementation. But they usually cannot infer the full numerical intent of a high-level array expression once that intent has been flattened into manual loops.
ILNumerics works at the array-expression level. The runtime sees operations such as addition and reduction as whole-array operations, so it can apply memory reuse, temporary elimination, vectorization, cache-aware execution, and dependency-safe parallel execution before the work is committed to hardware.
This allows ILNumerics to optimize not only individual operations, but also the way operations are scheduled and combined at runtime.
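Temporary elimination, one of the optimizations mentioned in this section, can be illustrated by hand in C. The sketch below reduces A + B to a single total in two ways: first by materializing the intermediate array, then with a fused loop that needs no temporary. The function names are hypothetical, and the code shows the general technique, not how the ILNumerics runtime implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Unfused: build T = A + B in a temporary buffer, then reduce T.
 * Returns 0 on success and -1 if the temporary cannot be allocated. */
int total_with_temporary(const double *a, const double *b, size_t n,
                         double *result)
{
    assert(a && b && result);
    if (n == 0) {
        *result = 0.0;
        return 0;
    }
    double *t = malloc(n * sizeof *t);
    if (!t) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        t[i] = a[i] + b[i];      /* first pass: write n temporaries */
    }
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += t[i];             /* second pass: read them back */
    }
    free(t);
    *result = acc;
    return 0;
}

/* Fused: add and reduce in one pass, with no temporary buffer. */
double total_fused(const double *a, const double *b, size_t n)
{
    assert(a && b);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] + b[i];
    }
    return acc;
}
```

The fused version avoids one allocation and one full pass over memory. A runtime that still sees `A + B` and the reduction as array operations is in a position to make this transformation itself.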
Use the hardware you already bought
Modern machines provide multiple CPU cores, vector units, memory hierarchies, and sometimes additional accelerator devices. Using these resources efficiently by hand often requires threading, blocking, vectorization, native libraries, device selection, benchmarking, and hardware-specific code paths.
ILNumerics moves many of these decisions into the execution layer. The Accelerator adapts to the algorithm, data, and hardware found at runtime. It can reduce unnecessary temporary arrays, reuse memory where safe, apply optimized numerical kernels, and move array workloads into autonomous, dependency-safe parallel execution.
The result is not “magic speed” in every case. It is a more productive performance model: developers keep readable numerical code, while the runtime handles many optimization decisions that would otherwise require manual performance engineering.
The unique advantage
ILNumerics is not just a wrapper around native numerical libraries. Its key advantage is that array instructions remain visible to the runtime as high-level numerical operations. This gives ILNumerics the opportunity to remove unnecessary sequencing, overlap independent or partially dependent work, and adapt execution decisions to the actual runtime situation.
In other words: the program stays readable and sequential at the source level, while execution can become concurrent, pipelined, and parallel where correctness allows it.
ILNumerics preserves the local dependencies required for correctness, while removing unnecessary sequencing from array execution. The result: Array Instruction Level Parallelism.
Further reading:
- ILNumerics Autonomous Computing Technology
- Accelerator Compiler technology page
- Speed comparison: ILNumerics, Fortran, NumPy
- Getting started guide I: low level expression optimizations explained
- Exceeding the speed of manual OpenMP / Intel® MKL: a faster FFT
- Comparing array pipelining over manual parallel loops
