WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

Our goal is to make training faster, more efficient, and more scalable on AMD platforms. This role focuses on end-to-end AI training performance optimization, using both deep systems expertise and AI-assisted engineering workflows to accelerate analysis, tuning, and performance improvements across the stack.

THE PERSON:

The ideal candidate is a strong performance engineer with experience optimizing ML training systems in production or research environments. You are comfortable working across multiple layers of the stack, from model workloads and distributed training strategies to runtime, communication, and infrastructure bottlenecks. You know when deep low-level investigation is required, but you are equally effective at improving performance through better measurement, automation, systems design, and cross-functional execution. You are excited about applying AI-assisted engineering and agentic tooling to improve how performance work gets done: speeding up investigation, surfacing optimization opportunities, automating experimentation, and improving engineering productivity.

KEY RESPONSIBILITIES:
- Own end-to-end performance for large-model training workloads on AMD platforms across single-node and multi-node environments.
- Identify and resolve bottlenecks across model execution, runtime systems, communication, input pipelines, memory behavior, and cluster utilization.
- Define, measure, and improve key training performance metrics such as throughput, time-to-train, scaling efficiency, utilization, and cost per step.
- Build repeatable workflows for benchmarking, profiling, regression detection, and performance validation.
- Use AI and agentic tools to accelerate performance analysis, experiment generation, issue triage, optimization recommendations, and engineering productivity.
- Partner with framework, compiler, library, and infrastructure teams to drive durable performance improvements across the software stack.
- Improve distributed training efficiency by optimizing collective communication, overlap strategies, parallelism approaches, and cluster-level execution behavior.
- Guide performance tuning across frameworks such as PyTorch, JAX, XLA, Triton, and related training software stacks.
- Develop scalable tooling, dashboards, and automated analysis pipelines to detect regressions and prioritize the highest-impact work.
- Provide technical leadership, mentor engineers, and help establish best practices for performance engineering across training systems.
PREFERRED EXPERIENCE:
- Strong experience in ML systems, HPC, distributed systems, or software performance engineering, with demonstrated impact on training performance.
- Experience improving large-scale training workloads in production or research environments.
- Proficiency in C++ and Python, with the ability to build performance tooling, automation, and benchmarking infrastructure.
- Solid understanding of GPU-accelerated training systems, distributed training, communication libraries, and framework/runtime interactions.
- Experience with profiling and performance analysis tools, performance debugging, regression analysis, and experiment design.
- Familiarity with frameworks and ecosystems such as PyTorch, JAX, XLA, Triton, DeepSpeed, FSDP, ZeRO, or similar technologies.
- Experience optimizing performance across more than one layer of the stack, not only at the kernel level.
- Hands-on experience with AI-assisted development workflows, automated optimization tooling, or agentic systems that improve engineering efficiency is a plus.
- Kernel optimization experience is valuable, but this role is broader in scope and centered on overall training performance impact.
ACADEMIC CREDENTIALS:

Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.

LOCATION:

San Jose, CA preferred; may be open to other US locations near AMD offices.

#LI-MV1 #LI-HYBRID

Benefits offered are described in AMD benefits at a glance. AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process. AMD may use Artificial Intelligence to help screen, assess, or select applicants for this position. AMD's "Responsible AI Policy" is available here. This posting is for an existing vacancy.