AI Trace Generation Engineer

Permanent employee, Full Time · Heidelberg
Your mission
  • Design and implement a trace collection system for distributed LLM workloads, capturing compute operations, communication primitives, memory usage, and cluster topology across multi-GPU and multi-node setups
  • Validate that collected traces accurately reflect real workload behavior by verifying operation completeness, timing consistency, and data integrity across inference and training pipelines
  • Integrate with and instrument major LLM frameworks (vLLM, TensorRT-LLM, DeepSpeed, Megatron-LM and others) to extract meaningful execution data without disrupting performance
  • Use collected traces as input to discrete event simulations that model and replay distributed AI workload behavior at scale
  • Analyze trace data to surface bottlenecks and inefficiencies across the stack, from individual kernel execution to cluster-wide communication patterns
Your profile
  • 3+ years of experience in AI systems, ML infrastructure, or a closely related area
  • Hands-on experience with at least one major LLM serving or training framework  
  • Strong proficiency in Python and C++, with a solid understanding of GPU architecture, memory bandwidth, and the difference between compute-bound and memory-bound operations
  • Solid understanding of distributed communication
  • Familiarity with parallelism strategies and how they shape execution behavior across large clusters
  • Open source contributions or published research in relevant areas will definitely be appreciated!
  • Previous startup experience is a plus: we move fast and value people who are comfortable with that
Why us?
  • Build something big: Help build and scale a fast-growing AI infrastructure startup
  • Pay & perks: Competitive compensation with a performance-based incentive, subsidized Deutschlandticket, and access to a discount portal
  • Work your way: Flexible hours with hybrid and remote-friendly options
  • Fast lanes, no red tape: Flat hierarchies and rapid decision-making mean ideas ship quickly
  • Global team: Work with a diverse, international team across Germany and the USA
  • Modern headquarters: Well-stocked office near the Heidelberg Hauptbahnhof, available on a hybrid basis or as a place to connect during our quarterly team workshops
  • Top setup: Your choice of high-quality hardware and equipment
  • Relocation support: We’ll help make your move to join us as smooth as possible
About us
turbalance is an innovative, emerging startup that transforms AI laws. We are a team of passionate problem-solvers who believe in what we’re building. We constantly push boundaries and embrace our inner nerds as we find new ways to tackle complex challenges. You will find a dynamic work environment here, with flat or even non-existent hierarchies and the chance to take on responsibility from day one.