As a computer engineering group, we work at the interface of hardware and software. The focus of our current research is the efficient provision of computing power. Efficiency matters here because the computing power required in many application domains cannot be provided by standard processors at all, or only at a high energy cost.

As an alternative, we propose adaptive computers, which combine a smaller, energy-efficient standard processor with a highly optimized reconfigurable compute unit. The structure of the latter can be tailored precisely to the demands of the current application, allowing it to deliver peak computing performance at lower energy consumption.

To achieve this goal, we build hardware prototyping platforms for such computer architectures (including the required operating-system adaptations) and then evaluate them on practical applications. Following the very promising results of these studies, we have now turned our attention to improving the programmability of adaptive computers, so that they can also be used by developers without hardware-design expertise. To this end, we are creating a complete compiler flow that automatically partitions a high-level language program across the two compute units. The part assigned to the reconfigurable compute unit is then automatically transformed into a structure executable there, using methods from hardware synthesis and chip design (mapping, placement, routing).

Since this endeavor naturally depends on the participation of interested students with the appropriate background, we are also developing introductory courses on these topics.

News

  • ESA has three papers accepted at SC25-colocated workshops

    We’re pleased to share that our research group will be strongly represented at Supercomputing 2025 (SC25) in St. Louis, USA, with three papers accepted across two co-located workshops. These contributions highlight our ongoing efforts in heterogeneous acceleration, compute-graph frameworks, and network-to-storage acceleration.

    Our first accepted paper, Architecting Tensor Core-Based Reductions for Irregular Molecular Docking Kernels (DOI: 10.1145/3731599.3767437), by Leonardo Solis-Vasquez, Andreas F. Tillack, Diogo Santos-Martins, Andreas Koch, and Stefano Forli, explores how NVIDIA Tensor Cores can be adapted to accelerate irregular reduction patterns found in molecular docking, while maintaining accuracy through targeted error-correction strategies.

    The second paper, A Compute Graph Simulation and Implementation Framework Targeting AMD Versal AI Engines (DOI: 10.1145/3731599.3767411), authored by Jonathan Strobl, Leonardo Solis-Vasquez, Yannick Lavan, and Andreas Koch, introduces a framework that enables embedding compute graphs directly into C++ applications and deploying them onto AMD Versal AI Engines (AIEs). This work connects high-level graph representations with spatial computing on modern heterogeneous devices.

    Our third accepted submission, SNAcc: An Open-Source Framework for Streaming-based Network-to-Storage Accelerators (DOI: 10.1145/3731599.3767412), by David Volz, Torben Kalkhof, and Andreas Koch, presents an open-source FPGA-based framework for designing streaming network-to-storage acceleration pipelines—an increasingly important topic as data-movement costs dominate modern HPC systems.

    Together, these three papers underscore the continued technical depth and diversity of our group’s research activities. We extend our thanks to all co-authors, collaborators, and supporting institutions for making this possible. We look forward to presenting at SC25, engaging with the community, and sharing insights on these topics.

    By Dr.-Ing. Leonardo Solis-Vasquez, 17.11.2025


  • ESA paper presented at IPDPS 2025

    We are proud to announce that our paper “Accelerating Sparse Linear Solvers on Intelligence Processing Units” by Tim Noack, Louis Krüger, and Andreas Koch has been presented at IPDPS 2025 in Milan, Italy.

    In this work, we present Graphene, an open-source framework for efficiently solving large sparse linear systems on Intelligence Processing Units (IPUs). Graphene uses a custom Domain-Specific Language (DSL) that enables expressing complex algebraic algorithms close to their mathematical notation. To the best of our knowledge, we are the first to combine mixed-precision iterative refinement (MPIR) with double-word arithmetic to achieve high-precision solutions on an architecture lacking native double-precision support. Graphene is available on GitHub.
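    The core idea of mixed-precision iterative refinement is simple: solve cheaply in low precision, then measure and correct the remaining error in high precision. The following NumPy sketch is purely illustrative and unrelated to the actual Graphene/IPU implementation; all names in it are our own.

    ```python
    import numpy as np

    def mpir_solve(A, b, iters=10):
        """Illustrative mixed-precision iterative refinement.

        The (expensive) solves run in float32; residuals and the
        solution update are accumulated in float64.
        """
        A32 = A.astype(np.float32)
        x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
        for _ in range(iters):
            r = b - A @ x                                   # residual in float64
            d = np.linalg.solve(A32, r.astype(np.float32))  # correction in float32
            x += d.astype(np.float64)                       # update in float64
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 100)) + 100.0 * np.eye(100)  # well-conditioned
    b = rng.standard_normal(100)
    x = mpir_solve(A, b)
    residual = np.linalg.norm(b - A @ x)  # far below what float32 alone achieves
    ```

    On an IPU, the low-precision solves would map to the hardware's native arithmetic, while double-word arithmetic emulates the float64 accumulation that NumPy provides natively here.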

    By Tim Noack, 10.06.2025


  • AMD HACC Tech Talk and TaPaSCo Example Repository

    In October, we were invited to present our TaPaSCo framework in the AMD HACC Tech Talk series. The talk, given by Torben Kalkhof, is now freely available on YouTube! In it, we explore how TaPaSCo leverages key features of the AMD Versal architecture, including AI Engines (AIE) for high-performance compute, Queue Direct Memory Access (QDMA) for efficient data streaming, and Multi-rate MAC (MRMAC) for high-speed networking.

    Also, check out our new TaPaSCo Examples GitHub repository, where the source code of both presented examples is available. We plan to add further examples over time to demonstrate the use of different TaPaSCo features.

    By Torben Kalkhof, 04.11.2024


  • ESA papers at workshops of SC 2024

    We are proud to announce that two ESA papers have been accepted for presentation at the workshops of SC 2024.

    The first paper, titled Speeding-Up LULESH on HPX: Useful Tricks and Lessons Learned using a Many-Task-Based Approach, by Torben Kalkhof and Andreas Koch, will be presented at PAW-ATM 2024. In this work, we port the OpenMP-based reference implementation of the LULESH proxy application to the HPX programming framework, achieving speedups from 1.33x to 2.25x. Furthermore, we present the optimization techniques we used when switching from a fork-join to a many-task-based programming paradigm.
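    The difference between the two paradigms can be illustrated without HPX: in fork-join style, every time step ends in a global barrier, while in many-task style each chunk of work waits only on the tasks it actually depends on. A minimal Python sketch of a 1D stencil (purely illustrative; the paper's port is written in C++ against HPX):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def relax(left, mid, right):
        # One Jacobi-style smoothing step on a chunk, with halo cells
        # taken from the neighbouring chunks (or replicated at the edges).
        ext = [left[-1] if left else mid[0]] + mid + [right[0] if right else mid[-1]]
        return [(ext[i - 1] + ext[i] + ext[i + 1]) / 3.0 for i in range(1, len(mid) + 1)]

    def fork_join(chunks, steps, pool):
        # Fork-join: pool.map acts as a global barrier after every time step.
        n = len(chunks)
        for _ in range(steps):
            cur = chunks
            chunks = list(pool.map(
                lambda i: relax(cur[i - 1] if i > 0 else None, cur[i],
                                cur[i + 1] if i < n - 1 else None),
                range(n)))
        return chunks

    def many_task(chunks, steps, pool):
        # Many-task: one task per (chunk, step); each waits only on the
        # up to three neighbouring tasks of the previous step.
        n = len(chunks)
        def task(prev, i):
            return relax(prev[i - 1].result() if i > 0 else None,
                         prev[i].result(),
                         prev[i + 1].result() if i < n - 1 else None)
        futs = [pool.submit(lambda c=c: c) for c in chunks]
        for _ in range(steps):
            prev = futs
            futs = [pool.submit(task, prev, i) for i in range(n)]
        return [f.result() for f in futs]

    data = [[float(i)] * 4 for i in range(4)]  # 4 chunks of 4 cells each
    with ThreadPoolExecutor(max_workers=4) as pool:
        a = fork_join(data, steps=5, pool=pool)
        b = many_task(data, steps=5, pool=pool)
    assert a == b  # both paradigms compute the same result
    ```

    The payoff of the many-task variant is that chunks of a later time step can start as soon as their three input chunks are ready, instead of idling at a global barrier.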

    The second paper, titled DeLiBA-K: Speeding-up Hardware-Accelerated Distributed Storage Access by Tighter Linux Kernel Integration and Use of a Modern API, by Babar Khan and Andreas Koch, presents DeLiBA-K, the latest version of our FPGA-based storage framework, which integrates more tightly with the Linux kernel (with io_uring playing a significant role). The work results from a collaboration with our industrial partner, in which we further developed our improved Ceph storage FPGA accelerator and also evaluated it on industrial benchmarks. The paper will be presented at the H2RC workshop, and the final open-source version of DeLiBA-K will be released ahead of SC 2024 in November 2024.

    Congratulations to everyone involved!

    By Torben Kalkhof, 14.10.2024


  • ESA paper wins Best Paper Award at RAW 2024 and new TaPaSCo release

    We are proud to announce that our paper entitled TaPaSCo-AIE: An Open-Source Framework for Streaming-based Heterogeneous Acceleration using AMD AI Engines, by Carsten Heinz, Torben Kalkhof, Yannick Lavan, and Andreas Koch, has won the Best Paper Award at RAW 2024, co-hosted with IPDPS in San Francisco, CA.

    In this work, we propose a framework for streaming-based computation in heterogeneous systems. TaPaSCo-AIE focuses on AMD Versal devices and incorporates AI Engines, DMA streaming, and 100G networking. In our real-world evaluation based on a neural network, we achieve significant speedups over memory-mapped solutions and exceed the performance of CPUs and even an A100 GPU.

    All proposed extensions are included in our newest TaPaSCo 2024.1 release, along with further improvements to our framework. Check out our GitHub repository and release notes for more details.

    Congratulations and keep up the good work everyone!

    By Torben Kalkhof, 29.05.2024


  • Two ESA papers get accepted at ARC 2024

    We are very happy to announce that two submitted papers by our group have been accepted at ARC 2024 and will be presented in Aveiro, Portugal.

    The first paper, entitled Graphtoy: Fast Software Simulation of Applications for AMD’s AI Engines – by Jonathan Strobl, Leonardo Solis-Vasquez, Yannick Lavan, and Andreas Koch – proposes a graph simulator that can be embedded into an existing application to prototype compute kernels for data-flow accelerator architectures such as the AMD AI Engines. By leveraging cooperative multitasking, Graphtoy outperforms the AMD AI Engine x86 simulator while providing better debugging possibilities.

    The second paper, entitled Enabling FPGA and AI Engine Tasks in the HPX Programming Framework for Heterogeneous High-Performance Computing – by Torben Kalkhof, Carsten Heinz, and Andreas Koch – proposes the transparent use of TaPaSCo FPGA and AI Engine tasks in HPX by adopting HPX’s lightweight threading model for TaPaSCo tasks. As a proof of concept, speedups are demonstrated for a 1D-stencil benchmark and a port of the LULESH proxy application, leveraging cooperative computing on CPU and FPGA or AI Engines, respectively.

    Congratulations and keep up the good work everyone!

    By Torben Kalkhof, 12.03.2024


You can find more news in our archive.