Genentech

Machine Learning Scientist

2w

New York City, US · Full-time · $141,100 – $147,600

About this role

Advances in AI, data, and computational sciences are transforming drug discovery at Roche’s gRED (Genentech Research and Early Development) and pRED (Pharma Research and Early Development). The Computational Sciences Center of Excellence harnesses data and AI to assist scientists in delivering innovative medicines. Join the AI for Drug Discovery group at Genentech to revolutionize drug discovery with machine learning techniques.

In this role on the Foundation Models team within Prescient Design, you will contribute to internal reasoning large language models (LLMs) for drug discovery tasks such as biomolecular design. You will work at the intersection of engineering and research, designing and scaling large machine learning systems, with a focus on executing defined projects in robust, performance-critical code.

Collaborate closely with senior scientists and domain experts to implement novel algorithms and translate research into prototypes. Maintain training infrastructure and data pipelines for reliable experiments on clusters. Translate biological and chemical knowledge into machine learning objectives and evaluation criteria.

Develop strategies to improve model performance on scientific tasks, including long-horizon completion and complex reasoning. Design benchmarks and curate high-quality data for biological research. Advance seamless data sharing across gRED and pRED to maximize AI opportunities in R&D.

Requirements

  • BS/MS in Computer Science, Statistics, Mathematics, Physics, or a related quantitative field with 2+ years of relevant work experience, or PhD with 0-2 years of relevant work experience
  • Experience developing and training large-scale machine learning models, including post-training techniques to enhance domain knowledge, reasoning capabilities, and model alignment
  • Strong history of research excellence at top-tier venues (e.g., NeurIPS, ICLR, ICML)
  • Experience with large language models (LLMs) for scientific reasoning tasks
  • Proficiency in writing robust, performance-critical code for machine learning systems
  • Knowledge of distributed computing and scalable ML infrastructure
  • Ability to work with biological and chemical domain data

Responsibilities

  • Design, implement, and improve large-scale distributed machine learning systems
  • Develop and execute strategies to improve performance on scientific tasks including long-horizon task completion and complex reasoning
  • Translate biological and chemical domain knowledge into machine learning objectives, training signals, and evaluation criteria
  • Design and implement evaluation methodologies to assess model capabilities relevant to biological research
  • Collaborate with researchers to translate ideas and prototypes into scalable, production-ready systems
  • Write clean, efficient code to test hypotheses regarding reasoning and alignment
  • Contribute to maintenance of training infrastructure and data pipelines
  • Implement novel algorithms from research papers into working prototypes