
MLOps Engineer

Lenovo
United States, North Carolina, Morrisville
Feb 17, 2026


General Information
Req #
WD00095237
Career area:
Information Technology
Country/Region:
United States of America
State:
North Carolina
City:
Morrisville
Date:
Tuesday, February 17, 2026
Working time:
Full-time
Additional Locations:
* United States of America - North Carolina - Morrisville

Why Work at Lenovo
We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, visit www.lenovo.com and read about the latest news via our StoryHub.

Description and Requirements

Job Summary:

As an MLOps Engineer, you will design, build, and operate the MLOps Control Plane and supporting systems that enable automated, production-grade ML workflows tightly integrated with our GPU-centric infrastructure. You will own end-to-end pipelines for model registry, adapter standardization, automated distillation/retraining, data lineage, drift detection/triggers, multi-adapter model serving, and dynamic routing, while maintaining deep awareness of the underlying hardware (dynamic GPU partitioning, time-slicing, resource observability, and high-speed interconnects). This role demands strong software engineering, DevOps discipline, and a passion for reliably scaling AI systems at cluster scale.

Key Responsibilities:

  • Architect and implement the MLOps Control Plane, including model registry, versioning, promotion, and governance features.
  • Develop and maintain the Data Adapter SDK for standardized data ingestion across diverse sources, ensuring 100% adoption and versioning.
  • Build automated CI/CD pipelines for model distillation, retraining (triggered by drift/concept shift), and multi-adapter deployment.
  • Implement data lineage tracking, automated drift detection, and real-time triggers for retraining or routing changes.
  • Design multi-adapter serving infrastructure with dynamic model routing, supporting heterogeneous models and hardware-aware inference.
  • Integrate MLOps workflows with GPU infrastructure features: in-place container resizing, GPU memory observability, dynamic partitioning/time-slicing, failure analysis, and high-performance networking (RoCE/InfiniBand tuning awareness).
  • Own production observability, alerting, and automated root-cause analysis for ML pipelines and GPU workloads to meet <10 min failure RCA and >98% success rate targets.
  • Collaborate across pillars to ensure MLOps systems leverage hardware/software infra improvements for efficiency gains (e.g., 5%+ reduction in training step time).
  • Drive agility goals: enable <1 minute workspace/job resource resizing and fully automated pipelines.

Required Qualifications:

  • Bachelor's or Master's in Computer Science, Engineering, or related field (or equivalent experience).
  • 5+ years in production MLOps, ML infrastructure, or large-scale DevOps roles, with proven impact on distributed AI systems.
  • Deep experience operating ML at cluster scale in GPU-heavy environments.
  • Strong software engineering skills with a focus on reliability, automation, and observability.
  • Ability to collaborate across hardware, software infrastructure, and data science teams.
  • 4+ years building and operating production ML pipelines in cloud or on-prem GPU clusters.
  • Hands-on work with large-scale distributed training/inference (e.g., handling multi-node GPU workloads, interconnect-aware optimizations).
  • Experience implementing model registries, automated retraining triggers, and serving systems at scale.
  • Expert-level Python (or a similar language) skills for building infrastructure tooling.
  • MLOps platforms/tools: MLflow (or equivalent registry), Kubeflow/Prefect/Airflow, Evidently/Alibi for drift/lineage.
  • Model serving: KServe, Seldon, Triton, or custom multi-adapter frameworks with dynamic routing.
  • CI/CD for ML: GitHub Actions/GitLab CI, automated distillation/retraining pipelines.
  • Observability: Prometheus/Grafana, custom metrics for GPU/memory/network, automated failure analysis.
  • IaC: Terraform/Ansible, plus awareness of GPU-specific configs (drivers, partitioning).
  • Understanding of high-performance networking (RoCEv2, InfiniBand) and its impact on distributed ML.

Preferred Requirements:

  • Experience with GPU virtualization/sharing (MIG, time-slicing, dynamic partitioning).
  • Built or extended SDKs for data/model adapters in large orgs.
  • Implemented data lineage (e.g., OpenLineage) and automated drift triggers at production scale.
  • Familiarity with NVIDIA ecosystem (drivers, CUDA, NCCL, GPU operators).
  • Contributions to open-source MLOps/GPU infra projects.
  • Cloud certifications or experience with hybrid/on-prem GPU clusters.
  • Kubernetes (advanced): operators, custom resources, dynamic resource management, GPU scheduling (time-slicing/MIG awareness).


#LATC

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, status as a veteran, basis of disability, or any other federal, state, or local protected class.
