I'm currently a research scientist at the Kempner Institute at Harvard working on machine learning and artificial intelligence. I also have a background in theoretical high-energy physics.
Before joining Kempner, I was a research scientist at Amazon (in the organization formerly known as Alexa AI), where I worked on production speech and language models. Before that, I was a Simons Collaboration postdoctoral fellow at McGill University, working on topics in theoretical high-energy physics.
I completed my PhD in 2018 at Johns Hopkins University under the supervision of Jared Kaplan. I did my undergrad at UC Berkeley, where I majored in physics and worked on dark matter detection.
Research Interests
My work focuses on building and scaling up foundation models. Some specific research questions I've been thinking about recently are:
- Quantization: what precision is needed to (pre/post-)train large models, and how does it interact with the architecture? (A minimal sketch of low-precision training appears after this list.)
- Post-training: how can we set up post-training for agentic tasks that lack clear, verifiable rewards or involve long horizons?
- Scaling systems: what systems-level challenges arise when scaling up transformers?
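To make the quantization question a bit more concrete, here is a minimal sketch of one low-precision training setup: running the forward and backward passes in bfloat16 while keeping master weights and optimizer state in fp32. It assumes PyTorch and a GPU with bf16 support; the model, data, and hyperparameters are placeholders, not code from any of my projects.

```python
# Sketch of low-precision (bf16) training with PyTorch autocast.
# Illustrative only: model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 512, device="cuda")
y = torch.randn(32, 512, device="cuda")

# Matmuls and activations run in bfloat16 inside the autocast region;
# master weights and optimizer state remain in fp32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()
```

Because bf16 shares fp32's exponent range, this setup typically avoids the gradient scaling that fp16 training requires; pushing below bf16 is where the interaction with architecture gets interesting.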
Preprints and Publications
A full list is available on my Google Scholar profile.
Blog Posts / Teaching
- Blog post on our work on low-precision training instabilities
- Blog post on our work on loss-to-loss prediction
- (Fall 2024) Co-taught the Kempner Institute's introductory workshop on building transformers: slides and Jupyter notebook exercises (with solutions).
- (Fall 2024) Guest lecture slides for CS 2281R covering hardware, compute primitives, DDP, checkpointing, and parallelization.
Fun
- When I need to step away from the screen, you'll usually find me at a squash court, taking a long walk, biking, or lifting weights.
- I also enjoy photography and usually bring my trusty Nikon when I'm traveling.