Principal Software Engineer, AI Data Platform (CoreAI)
United States, Washington, Redmond
Overview

Join Microsoft's CoreAI team to build the AI Data Platform, the foundation for secure, scalable, reusable datasets that power model development. We seek Software Engineers passionate about large-scale data infrastructure, automation tools, and intelligence services to transform how Microsoft collects, generates, manages, and shares AI training data. The AI Data Platform team's mission is to build a central AI data platform that breaks down Microsoft's data silos and manages the full lifecycle of first-party, third-party, synthetic, and human-labeled data, accelerating AI model development with secure, reusable, and compliant datasets. The team owns the data infrastructure, automation tools, and intelligence services behind that mission.
Responsibilities

Lead the design and development of large-scale data infrastructure and intelligent services that transform how Microsoft collects, manages, and shares data for AI. Shape the platform's technical direction, drive architectural decisions, and partner with leaders across Microsoft to maximize impact.

- Own the technical vision and architecture for scalable, secure, and reusable AI data infrastructure.
- Design and lead development of intelligent agent-driven services to automate the dataset lifecycle (ingestion, registration, validation, PII handling, discovery, governance, lineage, feedback).
- Build user-facing tools and APIs that make datasets easily discoverable and reusable across Microsoft's AI teams.
- Ensure platform security, compliance, and operational excellence, including entitlement management, capacity planning, and escalation support.
- Collaborate with applied scientists to integrate ML-driven methods (synthetic data, evaluation, human-in-the-loop workflows) into production pipelines.
- Mentor engineers, setting best practices in distributed systems, reliability, and large-scale data platform development.
- Influence cross-org strategy by aligning platform priorities with AI training and product teams to deliver company-wide impact.