Overview
Join the team that keeps Microsoft 365 running in sovereign cloud environments where reliability, scalability, and security are non-negotiable. You'll work on distributed systems at massive scale, automating operations, building disaster recovery capabilities, and engineering solutions that eliminate toil and improve service delivery. Bring your expertise in large-scale systems and help us set the standard for sovereign cloud reliability.

The M365 Sovereign Clouds organization is building the future of secure productivity for the world's most critical customers. As part of Azure Silver and Microsoft Sovereign Clouds, we deliver and operate the full Microsoft 365 suite, including Office 365, Exchange, Outlook, Teams, SharePoint, OneDrive, and Purview, within highly regulated sovereign cloud environments. We are a team of innovators and problem-solvers who thrive on transforming complex challenges into reliable, high-performance services that empower sovereign cloud customers. Our culture is rooted in growth mindset, innovation, collaboration, and inclusion, and we believe that diverse perspectives drive our best work.
On the Security & Compliance team, you'll work with other engineers on the systems that protect M365 sovereign cloud customers from phishing, malware, spam, and data governance challenges. These systems process and protect millions of messages and documents daily. Our sub-teams offer exciting opportunities to work on highly complex systems that enable information protection and data governance for our customers.
The right candidate for this job is:
* Passionate about distributed systems and working with highly scalable services.
* Enjoys new technological challenges and is motivated to solve them.
* Excited about making better software and continuously improving the development, integration, and deployment processes.
* A self-starter who thrives in a bottom-up, fast-paced, highly technical environment.
* An effective collaborator, experienced in creating technical partnerships across teams.
* Committed to ensuring exceptional customer satisfaction through technical excellence.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
* Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting issues, taking appropriate action to mitigate impact, and deploying appropriate fixes to resolve root cause(s). Notifies product teams and owners of major customer-impacting issues and escalates resolution of highly impactful issues affecting multiple components or features to other engineers or engineering teams as needed. Communicates details and resolutions through post-mortem reports and review meetings.
* Independently writes code or scripts that automate scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale.
* Designs, develops, and maintains telemetry pipelines and monitoring tools that report operations metrics (e.g., availability, reliability, performance, efficiency) for product components and features operating at scale. Independently performs analyses using existing tools and/or models to identify insights and shares them with product engineering teams to contribute directly to improvements in product development and/or operations. Monitors the impact of changes on operations metrics (e.g., Time-to-X).
* Independently uses existing tools and/or models, leveraging artificial intelligence (AI) and machine learning (ML) capabilities, to troubleshoot problems or flaws affecting the availability, security, reliability, performance, and/or efficiency of components and features. Proposes solutions that resolve and prevent recurring issues and brings them to the attention of the relevant Site Reliability Engineering (SRE) and/or product engineering teams.
* Independently creates, tests, and deploys changes through a safe deployment process (SDP) to enhance code quality and improve the observability, security, reliability, and operability of one or more platforms, systems, or products operating at scale.
* Shares insights and best practices via documented artifacts that can be applied to improve the development and operations of system, platform, or product components and features, by participating in code/design reviews, incident drills and debriefs, and regular meetings, as well as through interactions with more experienced SREs and members of product engineering teams.
* Engages with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles. Uses technical knowledge of systems/platforms and insights drawn from product engineering teams, security best practices, artificial intelligence (AI)/machine learning (ML), and telemetry analyses to suggest potential improvements to the code base and designs across components and features of one or more products.
Qualifications
Required Qualifications:
* Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  * OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  * OR equivalent experience.
* 2+ years technical experience working with large-scale cloud or distributed systems.

Other Requirements

Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Candidates must have an active TS/SCI and be willing and eligible to upgrade to TS/SCI (with polygraph). This role will require candidates to maintain the TS/SCI (with polygraph) clearance. The ability to meet Microsoft, customer, and/or government security screening requirements is required pre-offer and post-hire for this role. Failure to maintain or obtain the appropriate clearance and/or customer screening requirements may result in employment action up to and including termination.
- Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $100,600 - $199,000 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $131,400 - $215,400 per year. Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.