ILNumerics: Autonomous Computing

ILNumerics presents novel methods for the autonomous parallelization of sequential array code, as written in prototyping languages such as ILNumerics, Matlab® and NumPy. Central decisions that programmers previously had to make are taken over by the compiler at the point where all relevant data is available: at runtime. This happens automatically and without user intervention.

ILNumerics identifies the parallel potential of a program from consecutive array instructions within the instruction stream. The analysis is therefore independent of the program structure and is not limited to loops, functions or branches.
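The idea of reading parallel potential off a flat instruction stream can be illustrated with a minimal Python sketch. It is not the ILNumerics implementation: the `Instr` record and the `dependency_levels` function are hypothetical names introduced here, and the sketch assumes that every array name is written only once, so only read-after-write dependencies need tracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical record for one array instruction in the stream."""
    out: str     # name of the array this instruction writes
    ins: tuple   # names of the arrays it reads


def dependency_levels(stream):
    """Return, per instruction, the earliest 'wave' in which it could run.

    Instructions sharing a level do not depend on each other and could
    run in parallel, regardless of where they sit in the program text.
    """
    last_writer = {}   # array name -> index of the instruction that wrote it
    levels = []
    for index, instr in enumerate(stream):
        deps = [last_writer[name] for name in instr.ins if name in last_writer]
        # One level after the latest producer; level 0 if there is none.
        levels.append(1 + max((levels[d] for d in deps), default=-1))
        last_writer[instr.out] = index
    return levels


stream = [
    Instr("a", ("x",)),       # a = sin(x)
    Instr("b", ("y",)),       # b = cos(y), independent of a
    Instr("c", ("a", "b")),   # c = a + b, needs both a and b
    Instr("d", ("x",)),       # d = 2 * x, independent of a, b and c
]
levels = dependency_levels(stream)
```

In this example the first, second and fourth instructions land on the same level even though they are not adjacent and no loop is involved, which is the kind of structure-independent potential the paragraph above describes.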

At runtime, the compiler generates optimized low-level code from a variable number of instructions, targeting the most suitable hardware available. Depending on the granularity, SIMD and OpenCL resources are used.

Execution Nets

ILNumerics breaks up the usual sequential execution of a program. Instead of processing the array instructions one after another, each instruction is first integrated at runtime into a highly volatile data structure, the execution net. The execution net contains an (often large) number of upcoming instructions and links them according to their actual dependencies.

Each node of the execution net decides which of the available hardware it uses for its execution, and in what way. All relevant information is taken into account for this decision, because the data as well as the hardware and its state are known at this point.
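A per-node hardware decision of this kind could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `Device` record, the `pick_device` function, the simple cost model (fixed launch overhead plus size divided by throughput) and all numbers are hypothetical and are not taken from ILNumerics.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical description of one hardware resource."""
    name: str
    launch_overhead_us: float        # fixed cost to start work on the device
    throughput_elems_per_us: float   # elements processed per microsecond
    busy: bool = False               # current state, known only at runtime


def pick_device(n_elements, devices):
    """Choose the device with the lowest estimated cost for this node.

    Idle devices are preferred; if all are busy, every device is considered.
    """
    candidates = [d for d in devices if not d.busy] or list(devices)

    def estimated_cost(device):
        return device.launch_overhead_us + n_elements / device.throughput_elems_per_us

    return min(candidates, key=estimated_cost)


# Made-up numbers: a CPU core starts quickly, a GPU has higher throughput.
cpu = Device("cpu-simd", launch_overhead_us=1.0, throughput_elems_per_us=1_000.0)
gpu = Device("opencl-gpu", launch_overhead_us=50.0, throughput_elems_per_us=50_000.0)

small_choice = pick_device(100, [cpu, gpu]).name
large_choice = pick_device(1_000_000, [cpu, gpu]).name
```

The point of the sketch is only that the choice depends on facts that exist at runtime: the actual array size and the current state of each device.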

While the main thread adds new instructions to the execution net, existing nodes are processed. For this purpose, all hardware resources available at runtime are used simultaneously. The parallel potential of the algorithm, the data and the hardware is fully exploited.
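The interplay between a main thread that keeps adding instructions and workers that process existing nodes can be sketched in a few lines of Python. This is a simplified illustration, not the ILNumerics data structure: the `ExecutionNet` class and its methods are hypothetical names, plain lists stand in for arrays, and a standard-library thread pool stands in for the hardware resources.

```python
from concurrent.futures import ThreadPoolExecutor


class ExecutionNet:
    """Toy execution net: nodes are futures linked by their data dependencies."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._nodes = {}   # result name -> Future producing that result

    def add(self, name, func, *input_names):
        """Add an instruction; returns immediately so the caller can continue."""
        # Inputs must already be in the net, so they were submitted earlier.
        inputs = [self._nodes[n] for n in input_names]

        def run():
            # Wait only for this node's real inputs, not for all
            # instructions that happened to come earlier in the program.
            args = [future.result() for future in inputs]
            return func(*args)

        self._nodes[name] = self._pool.submit(run)

    def result(self, name):
        """Block until the named result is available and return it."""
        return self._nodes[name].result()

    def close(self):
        self._pool.shutdown(wait=True)


net = ExecutionNet()
net.add("x", lambda: [1.0, 2.0, 3.0])
net.add("a", lambda x: [v * 2 for v in x], "x")    # needs x
net.add("b", lambda x: [v + 1 for v in x], "x")    # needs x, independent of a
net.add("c", lambda a, b: [p + q for p, q in zip(a, b)], "a", "b")
total = net.result("c")
net.close()
```

Here `a` and `b` may run at the same time because neither depends on the other, while `c` starts only once both are done. The program still reads as four sequential statements and yields the same result as sequential execution.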

Array pipelining to parallelize sequential programs

Parallel execution units have played an important role in processor construction for 60 years. With the help of "pipelines", parts of sequential operations can be carried out in parallel – and thus faster. ILNumerics takes up this basic idea for complex array algorithms.

Independent operations are distributed across different pipelines and run simultaneously. The cores of a processor or any OpenCL-capable hardware resource serve as a "pipe". Dependent operations start executing as soon as parts of their inputs are available. As a result, even programs that previously could not be parallelized at all can benefit.
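How a dependent operation can start on partially completed input is shown in the following Python sketch. It is an illustration under simplifying assumptions, not the ILNumerics mechanism: the functions `chunks`, `stage` and `pipeline` are hypothetical, each stage is element-wise, threads stand in for the "pipes", and arrays are split into fixed-size chunks.

```python
import queue
import threading


def chunks(data, size):
    """Split a list into consecutive pieces of at most `size` elements."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def stage(func, inbox, outbox):
    """Apply an element-wise operation to each chunk as soon as it arrives."""
    while True:
        chunk = inbox.get()
        if chunk is None:        # end-of-stream marker: pass it on and stop
            outbox.put(None)
            return
        outbox.put([func(v) for v in chunk])


def pipeline(data, funcs, chunk_size=4):
    """Run dependent element-wise operations as overlapping pipeline stages."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    for chunk in chunks(data, chunk_size):
        queues[0].put(chunk)
    queues[0].put(None)

    result = []
    while True:
        chunk = queues[-1].get()
        if chunk is None:
            break
        result.extend(chunk)
    for thread in threads:
        thread.join()
    return result


# b = a * a, then c = b + 1: the second operation depends on the first.
piped = pipeline(list(range(10)), [lambda v: v * v, lambda v: v + 1])
```

Although the second operation depends entirely on the first, it begins work on the first chunk while the first operation is still processing later chunks. Each stage runs in a single thread reading from a FIFO queue, so chunk order, and with it the sequential result, is preserved.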

Parallel on many Levels

At every level of granularity, the hardware must be efficiently supplied with data of the appropriate size. At the level of individual array instructions, ILNumerics applies all the optimizations that conventional (C++) compilers use. With the help of so-called "micro-JIT" compilers, each node in the execution net compiles the operations it contains into highly optimized kernels for the selected hardware. In addition, newer methods are used, such as dynamic workload control and latency minimization.

Benefits of ILNumerics Technology

ILNumerics identifies parallel potential at runtime, without manual control and with fine granularity. The results use the hardware more efficiently than manual methods, and the semantics of the program are completely preserved. The methods scale automatically and efficiently with the parallel capacity of the hardware, for both small and large data. Time-consuming manual optimizations of numerical array codes are now a thing of the past.

Because no complex program analysis is required, the approach is both highly robust and efficient.

More details: https://ilnumerics.net/faster-array-codes.html