Large Language Models Intern

2 months ago

Genentech

New York City, US · Internship · $50/hr

About this role

Prescient Design, part of Genentech’s Research and Early Development organization, advances drug discovery through cutting-edge machine learning. The Foundation Models team builds internal large language models that enable next-generation scientific and biomedical applications across the drug-discovery pipeline. The team is looking for exceptional graduate student interns with strong ML research or engineering backgrounds who can drive independent exploration and solve complex technical problems collaboratively.

Interns contribute to research and development of internal LLMs for scientific discovery and therapeutic molecular design. They develop and evaluate advanced post-training techniques to enhance domain knowledge and strengthen reasoning capabilities for scientific and biomedical applications. Responsibilities include supporting large-scale model training on high-performance GPU clusters.

Interns collaborate with cross-functional teams to design and implement applied LLM use cases. The internship is on-site in New York City, where interns work alongside leading experts in biotechnology and AI. Interns take full ownership of high-visibility projects in interdisciplinary settings.

This 12-week full-time paid internship offers a location-based stipend and paid holiday time off. Participants pursue impactful work in AI for drug discovery. The program emphasizes hands-on contributions to innovative foundation models.

Requirements

Currently enrolled in a PhD program
Major in Computer Science, Data Science, Machine Learning, Statistics, or a related technical field
Strong Python skills and experience with ML frameworks such as PyTorch
Solid understanding of neural networks, representation learning, and modern supervised/unsupervised methods
Excellent written and verbal communication, and ability to work effectively with interdisciplinary teams
Hands-on experience with large language models, especially post-training workflows (e.g., supervised fine-tuning and reinforcement learning)
Experience with GPU clusters or distributed training systems for efficient large-scale model training

Responsibilities

Contribute to research and development of internal LLMs for scientific discovery and therapeutic molecular design
Develop and evaluate advanced post-training techniques to enhance domain knowledge and strengthen reasoning capabilities for scientific and biomedical applications
Support large-scale model training on high-performance GPU clusters
Collaborate with cross-functional teams to design and implement applied LLM use cases
Drive independent exploration of LLM applications in drug discovery
Solve complex technical problems in collaborative settings