Senior Data Engineer - CC Data and Solutions Engineering

Genentech
United States, California, South San Francisco
Nov 28, 2024
The Position

We advance science so that we all have more time with the people we love.

At Genentech Research & Early Development (gRED) we have initiated an exciting journey to bring together and further strengthen our computational talent and capabilities by forming a new, central organization - gRED Computational Sciences (gCS). gCS is on a mission to partner across the organization to realize the potential of data, technology and computational approaches that will revolutionize how targets and therapeutics are discovered and developed, ultimately enabling novel treatments for patients across the world. We stand at the beginning of an exciting journey.

The Computational Catalysts group within gCS is a diverse, curious and action-driven team at the intersection of computation, engineering and science with ambition to advance our technical excellence. The focus of the team is on partnering with the informatics and scientific communities to create a computational and data ecosystem that powers scientific discovery and accelerates decision making. We aim to modernize our ability to acquire, store, link, share, find and analyze data across the organization through scalable and integrated solutions that truly make every data point count.

The Senior Data Engineer will play a key role in executing the strategy for the Data Ecosystem for this newly created group. The Data and Solutions Engineering group within Computational Catalysts is accountable for establishing a common Data Fabric which connects our Systems, specifically our Data Pipelines and Applications for data acquisition, collection, storage, transformation, linkage and sharing. This team strives to build delightful applications and systems for our stakeholders with a strategic mindset. The team is responsible for end-to-end product lifecycle management, and its work is leveraged downstream for building key scientific insights and enabling our ML/AI workflows and models.
The Senior Data Engineer will work closely with Catalysts colleagues such as Software Engineers, Product and Tech/ML Ops as well as directly with our key stakeholders including Computational Scientists, ML Scientists and Research Scientists. The Senior Data Engineer is an experienced and hands-on technical expert with the proven ability to design, implement, and deliver enterprise-scale data engineering solutions. You will be responsible for designing and implementing key components of the data fabric that underpins the flow of data within the Computational Catalysts function. You will contribute deeply to building common data solutions and frameworks which can be leveraged across multiple initiatives. You will be responsible for utilizing modern cloud-based solutions and components to deliver innovative data engineering solutions that help set the standards that will be adopted across Catalysts and the broader organization.

You will have deep expertise and hands-on experience in data and software engineering and be familiar with modern and cutting-edge approaches, with experience in managing data flows in high performance computing environments. You will have an understanding of how to build flexible, robust and extensible data pipelines that exemplify industry best practices, minimizing manual interventions, ensuring robustness and scalability, and avoiding technical debt. You will be responsible for designing and building solutions that ensure data align with FAIR Principles and can be collected effectively and flow seamlessly into a variety of different downstream applications, such as large scale machine learning models, including foundational models. You will help break down silos between efforts and foster collaboration to accelerate common solutions. You will have familiarity with a variety of database and data analytics systems. You are passionate about learning in general and about newer technologies in particular.
Onsite presence, on our South San Francisco campus, is expected for at least 3 days a week.

The Opportunity:

- Build multiple Data Engineering solutions, including API layers, that contribute to the Data Fabric strategy for the Catalysts organization
- Learn, deeply understand and ultimately improve our Data Ecosystem across structured and unstructured data which powers our systems
- Contribute to identification and adoption of key trends, technologies, and methodologies by taking an Open Source focused, Cloud first, API first and AI first approach
- Ensure that our technical choices and solutions are innovative, best-in-class and integrated by delivering data flows and pipelines within and across gCS, Research Biology, Drug Discovery, Translational Medicine, Development Sciences and beyond
- Make key technical decisions around data acquisition, collection, storage, transformation, linkage and sharing while working collaboratively with our key partners
- Contribute to a strong and collaborative community of Data Engineers with a strong focus on mentoring, standardization and best practices (CI/CD, coding standards, code reviews, testing and more)
- Lead by example to demonstrate the culture and working environment of this new organization aligned with our gCS values: impact, collaboration, diversity, scientific excellence and curiosity

Who You Are:

- Bachelor's or Master's degree in Computer Science or similar technical field and 4+ years (3+ with Master's) of experience in software engineering, architecting and developing scalable pipelines, frameworks and platforms to power data science efforts in distributed cloud environments in collaboration with data scientists, analysts, and other stakeholders
- Can provide or learn to provide technical expertise and support for data-related issues, including troubleshooting and resolving data pipeline failures
- Familiarity with technologies like Snowflake and Databricks and frameworks like Apache Spark and Apache Airflow, and strong proficiency in a high-level programming language such as Python or Java
- Hands-on experience creating end-to-end pipelines using AWS Glue, AWS Lambda or other similar services to extract, transform and load data and develop serverless data processing workflows, and experience building solutions leveraging industry-leading OLTP and/or OLAP data systems
- Familiarity and experience with concepts like: SQL, NoSQL, ETL, ELT, Data Lakes, Event Streaming, Data Fabric/Data Mesh, Elasticsearch, GraphQL, Dev/ML Ops
- Experience with designing and building Data Products which are highly reliable, scalable, performant, secure and robust, ideally on a public cloud platform
- Highly collaborative, with the ability to build trusted partnerships with internal and external stakeholders and the ability to think strategically and optimize for the long term while acting with a sense of urgency
- Experience adopting and implementing Software Standards and best practices which can be leveraged by the broader organization, and experience reducing Tech Debt and consolidating and deprecating legacy solutions

Not sure you meet all qualifications? Let us decide! Research shows that women and members of other under-represented groups tend to not apply to jobs when they think they may not meet every qualification, when, in fact, they often do! We are committed to creating a diverse and inclusive environment and strongly encourage you to apply.

Relocation benefits are available for this job posting.

The expected salary range for this position based on the primary location of California is $132,400 - $245,800. Actual pay will be determined based on experience, qualifications, geographic location, and other job-related factors permitted by law. A discretionary annual bonus may be available based on individual and Company performance. This position also qualifies for the benefits detailed at the link provided below.
Benefits

Genentech is an equal opportunity employer, and we embrace the increasingly diverse world around us. Genentech prohibits unlawful discrimination based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin or ancestry, age, disability, marital status and veteran status.