ETH Large-Scale AI Engineering 2026: Mamba, DeltaNet and Torch DDP

The goal of this project was to extend the second assignment of the Large-Scale AI Engineering course at ETH Zurich with implementations of DeltaNet [4, 6] and Mamba [3, 2]. Besides the implementation, we compared both architectures against a standard Transformer [5], which served as our baseline. We also added distributed data parallel (DDP) training and gradient checkpointing options to the assignment code to make full use of the available compute. Our code and instructions for reproducing the results on Clariden are available on GitHub: https://github.com/Timisorean/large-scale-ai-project.
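To illustrate how DDP and gradient checkpointing typically combine in PyTorch, here is a minimal sketch. It is not the repository's code: `Block`, `TinyModel`, and `train_step` are hypothetical names, the residual MLP block merely stands in for a Transformer, Mamba, or DeltaNet layer, and the random batch replaces a real data loader. The sketch assumes that a launcher such as `torchrun` sets `WORLD_SIZE` and `LOCAL_RANK`; without them it falls back to a single process.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical residual MLP block standing in for a sequence-model layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.norm(x))


class TinyModel(nn.Module):
    """Hypothetical stack of blocks with optional gradient checkpointing."""

    def __init__(self, dim: int = 32, depth: int = 4, use_checkpointing: bool = True):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
        self.use_checkpointing = use_checkpointing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            if self.use_checkpointing and self.training:
                # Do not keep this block's intermediate activations;
                # they are recomputed during the backward pass instead.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


def train_step(use_checkpointing: bool = True) -> float:
    """Run one optimizer step and return the loss (illustrative only)."""
    # Assumption: a launcher such as torchrun exports WORLD_SIZE and LOCAL_RANK.
    distributed = int(os.environ.get("WORLD_SIZE", "1")) > 1
    if distributed:
        backend = "nccl" if torch.cuda.is_available() else "gloo"
        dist.init_process_group(backend=backend)

    device = torch.device("cpu")
    if torch.cuda.is_available():
        device = torch.device("cuda", int(os.environ.get("LOCAL_RANK", "0")))
        torch.cuda.set_device(device)

    torch.manual_seed(0)
    model = TinyModel(use_checkpointing=use_checkpointing).to(device)
    if distributed:
        # DDP averages gradients across ranks during backward().
        ids = [device.index] if device.type == "cuda" else None
        model = DDP(model, device_ids=ids)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # Random batch as a placeholder; real training would shard a dataset per rank.
    x = torch.randn(8, 16, 32, device=device)

    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if distributed:
        dist.destroy_process_group()
    return loss.item()
```

Calling `train_step(True)` and `train_step(False)` in a single process should give the same loss, since checkpointing only trades extra forward computation for lower activation memory and does not change the computed values.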

The GitHub repository is at Large-Scale AI Engineering 2025 Project Repository, and the corresponding report is at Computational Intelligence Lab 2025 Project Report.