ETH Large-Scale AI Engineering 2026: Mamba, DeltaNet and Torch DDP
Type: Course Project | Year: 2026 | Topics: State Space Models, Distributed Training, Performance Engineering
This project extended an assignment from the ETH Large-Scale AI Engineering course with custom implementations of the DeltaNet and Mamba state-space models, alongside a Transformer baseline for comparison.
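To make the DeltaNet side concrete, below is a minimal sketch of the delta-rule recurrence that DeltaNet-style layers are built on: the value currently stored under a key is nudged toward the new value, then the state is queried. The function name, shapes, and the tiny demo are illustrative assumptions, not the project's actual code.

```python
import torch

def deltanet_step(S, k, v, q, beta):
    """One delta-rule recurrence step on state matrix S (d_v x d_k).

    The value currently stored under key k, namely S @ k, is moved
    toward the new value v at rate beta (a rank-1 correction), and
    the updated state is queried with q to produce the output.
    """
    S = S - beta * torch.outer(S @ k - v, k)  # rank-1 state update
    return S, S @ q

# Tiny demo: writing value v under a unit-norm key with beta = 1
# makes the state return exactly v when queried with that key.
S = torch.zeros(4, 4)
k = torch.tensor([1.0, 0.0, 0.0, 0.0])
v = torch.arange(4.0)
S, out = deltanet_step(S, k, v, q=k, beta=1.0)
```

Unlike softmax attention, this update keeps a fixed-size state across the sequence, which is what makes the throughput comparison against the Transformer baseline interesting.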
Beyond implementing the architectures themselves, we studied their throughput and training behavior under realistic systems constraints. To make better use of the available compute, we added distributed data-parallel training (PyTorch DDP) and gradient checkpointing, and documented how to reproduce the experiments on the Clariden cluster.
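How DDP and gradient checkpointing compose can be sketched in a few lines of PyTorch. This is an illustrative toy model, not the project's training script: it sets up a single-process `gloo` group so the snippet is self-contained (on a real cluster the rank and world size come from the launcher), wraps the model in `DistributedDataParallel` so gradients are averaged across ranks, and uses `torch.utils.checkpoint` so each block's activations are recomputed in the backward pass instead of stored.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

# Single-process process group for illustration only; with torchrun or
# srun these environment variables are set by the launcher.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class CheckpointedMLP(nn.Module):
    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU())
            for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            # Trade compute for memory: recompute this block's
            # activations during backward instead of caching them.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = DDP(CheckpointedMLP())  # gradients all-reduced across ranks
x = torch.randn(8, 32, requires_grad=True)
loss = model(x).sum()
loss.backward()  # DDP hooks average grads; checkpoint recomputes blocks
dist.destroy_process_group()
```

Note the non-reentrant checkpointing variant (`use_reentrant=False`), which interacts more predictably with DDP's gradient hooks than the reentrant one.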
The result was a compact systems-and-modeling project that combined architecture implementation, performance benchmarking, and practical distributed training engineering.
