Embedded Systems and Applications Group
Although we are part of the Computer Science department, much of our work takes place at the hardware-software interface. Our current research focuses on how to provide computing performance efficiently in situations where the capabilities of a standard microprocessor do not suffice, or where its energy requirements would be excessive.
As an alternative, we propose adaptive computers: combining a smaller, low-power microprocessor with a highly optimized reconfigurable compute unit. The structure of the latter can then be optimally adapted to the precise needs of the current application, and thus provide the required compute power with reduced energy consumption.
To achieve this goal, we build hardware demonstrators for such computer architectures (including the required operating system ports) and evaluate them using practical applications from a variety of fields. After very promising results, we are now concentrating on making the potential of such computers available even to developers who lack the hardware design skills that are still required to program such systems. To this end, we have been working on a complete compile flow that partitions a program written in a high-level software programming language for separate execution on the two compute units. The part assigned to the reconfigurable compute unit is then processed further using techniques from hardware synthesis and physical chip design (mapping, placement, routing).
Many of our research efforts rely on support by motivated students with appropriate skills and experience. Thus, our group also develops lectures and labs on the wide range of topics listed above.
News
-
AMD HACC Tech Talk and TaPaSCo Example Repository
In October, we were invited to present our TaPaSCo framework in the AMD HACC Tech Talk series. The talk, given by Torben Kalkhof, is now freely available on YouTube! In this talk, we explore how TaPaSCo leverages key features of the AMD Versal architecture, including AI Engines (AIE) for high-performance compute, Queue Direct Memory Access (QDMA) for efficient data streaming, and Multi-rate MAC (MRMAC) for high-speed networking.
Also, check out our new TaPaSCo Examples GitHub repository, which contains the source code of both examples presented in the talk. We plan to keep adding examples that demonstrate the use of different TaPaSCo features.
By Torben Kalkhof, 4.11.2024
-
ESA papers at workshops of SC 2024
We are proud to announce that two ESA papers have been accepted for presentation at the workshops of SC 2024.
The first paper, titled Speeding-Up LULESH on HPX: Useful Tricks and Lessons Learned using a Many-Task-Based Approach by Torben Kalkhof and Andreas Koch, will be presented at PAW-ATM 2024. In this work, we port the OpenMP-based reference implementation of the LULESH proxy application to the HPX programming framework, achieving speedups from 1.33x to 2.25x. Furthermore, we present the optimization techniques we used when switching from a fork-join to a many-task-based programming paradigm.
The second paper, titled DeLiBA-K: Speeding-up Hardware-Accelerated Distributed Storage Access by Tighter Linux Kernel Integration and Use of a Modern API by Babar Khan and Andreas Koch, is based on our FPGA-based storage framework, DeLiBA-K, which operates at the Linux kernel level, with io_uring playing a significant part. This work is the result of a collaboration with our industrial partner, in which we further developed the improved Ceph storage FPGA accelerator in DeLiBA-K and also tested it on industrial benchmarks. This paper will be presented at the H2RC workshop, and the final open-source version of DeLiBA-K will be released ahead of SC 2024 in November 2024.
Congratulations to everyone involved!
By Torben Kalkhof, 14.10.2024
-
ESA paper wins Best Paper Award at RAW 2024 and new TaPaSCo release
We are proud to announce that our paper entitled TaPaSCo-AIE: An Open-Source Framework for Streaming-based Heterogeneous Acceleration using AMD AI Engines - by Carsten Heinz, Torben Kalkhof, Yannick Lavan, and Andreas Koch - has won the Best Paper Award at RAW 2024, co-hosted with IPDPS in San Francisco, CA.
In this work, we propose a framework for streaming-based computation in heterogeneous systems. TaPaSCo-AIE focuses on AMD Versal devices and incorporates AI Engines, DMA streaming, and 100G networking. In our real-world evaluation based on a neural network, we achieve a significant speedup over memory-mapped solutions, and exceed the performance of CPUs and even an A100 GPU.
All proposed extensions are included in our newest TaPaSCo 2024.1 release along with further improvements of our framework. Check out our GitHub repository and release notes for more details.
Congratulations, and keep up the good work, everyone!
By Torben Kalkhof, 29.05.2024
-
Two ESA papers accepted at ARC 2024
We are very happy to announce that two submitted papers by our group have been accepted at ARC 2024 and will be presented in Aveiro, Portugal.
The first paper entitled Graphtoy: Fast Software Simulation of Applications for AMD’s AI Engines – by Jonathan Strobl, Leonardo Solis-Vasquez, Yannick Lavan, and Andreas Koch – proposes a graph simulator which can be embedded into an existing application to prototype acceleration compute kernels for data flow accelerator architectures, such as the AMD AI Engines. By leveraging cooperative multi-tasking, Graphtoy outperforms the AMD AI Engine x86 simulator while providing better debugging possibilities.
The second paper entitled Enabling FPGA and AI Engine Tasks in the HPX Programming Framework for Heterogeneous High-Performance Computing – by Torben Kalkhof, Carsten Heinz, and Andreas Koch – proposes the transparent usage of TaPaSCo FPGA and AI Engine tasks in HPX by adopting the lightweight threading model of HPX for TaPaSCo tasks. As a proof of concept, speedups are shown in a 1D-stencil benchmark and a port of the LULESH proxy application by leveraging cooperative computing on CPU and FPGA or AI Engines, respectively.
Congratulations, and keep up the good work, everyone!
By Torben Kalkhof, 12.03.2024
-
ESA’s work on oneAPI and AutoDock-GPU is featured on Intel Community Blog
A recent post at https://community.intel.com features an article about our latest work leveraging oneAPI to achieve a SYCL-enabled version of the AutoDock-GPU molecular docking application.
The post describes our collaboration with Intel on migrating AutoDock-GPU from CUDA to SYCL, thus freeing this code base from lock-in to a specific GPU vendor. Moreover, it highlights that our work 1) provides a detailed process reference for CUDA-to-SYCL migration and 2) evaluates the SYCL code base of AutoDock-GPU on an Intel Data Center GPU Max 1550 (code-named Ponte Vecchio), a 4th Gen Intel Xeon Scalable processor, and an NVIDIA GPU.
Enjoy reading the full post AutoDock-GPU: SYCL Enabled Molecular Screening for Science and Medicine!
By Dr.-Ing. Leonardo Solis-Vasquez, 21.08.2023
-
ESA Team achieved 10th place at Meet And Move Ultra Marathon
After winning first place in the TU Darmstadt Meet And Move Ultra Marathon lottery back in 2022, the ESA team proudly reached 10th place in this year’s competitive run. Our team was supported by several runners from outside ESA. Torben Kalkhof was our fastest runner with 18.33 minutes.
By Christoph Spang, 23.05.2023