I'm currently a research scientist at the Kempner Institute at Harvard working on machine learning and artificial intelligence. I also have a background in theoretical high-energy physics.
Before joining Kempner, I was a research scientist at Amazon (in the organization formerly known as Alexa AI), where I worked on production speech and language models. Before that, I was a Simons Collaboration postdoctoral fellow at McGill University, working on topics in theoretical high-energy physics.
I completed my PhD in 2018 at Johns Hopkins University under the supervision of Jared Kaplan. I did my undergrad at UC Berkeley, where I majored in physics and worked on dark matter detection.
Research Interests
My work focuses on building and scaling up foundation models. Some specific research questions I've been thinking about recently are:
- Quantization: what precision is needed to (pre/post-)train large models, and how does it interact with the architecture? (A minimal sketch of low-precision training appears after this list.)
- Post-training: how can we set up post-training for agentic tasks that lack clear, verifiable rewards or involve long horizons?
- Scaling systems: what systems-level challenges arise when scaling up transformers?
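To make the quantization question a bit more concrete, here is a minimal sketch of one low-precision training setup: running the forward and backward passes in bfloat16 while keeping master weights and optimizer state in fp32. It assumes PyTorch and a GPU with bf16 support; the model, data, and hyperparameters are placeholders, not code from any of my projects.

```python
# Sketch of low-precision (bf16) training with PyTorch autocast.
# Illustrative only: model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 512, device="cuda")
y = torch.randn(32, 512, device="cuda")

# Matmuls and activations run in bfloat16 inside the autocast region;
# master weights and optimizer state remain in fp32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()
```

Because bf16 shares fp32's exponent range, this setup typically avoids the gradient scaling that fp16 training requires; pushing below bf16 is where the interaction with architecture gets interesting.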
Preprints and Publications
A full list is available on my Google Scholar profile.
Blog Posts / Teaching
- Blog post on our work on low-precision training instabilities
- Blog post on our work on loss-to-loss prediction
- (Fall 2024) Co-taught the Kempner Institute's introductory workshop on building transformers: slides and Jupyter notebook exercises (with solutions).
- (Fall 2024) Guest lecture slides for CS 2281R covering hardware, compute primitives, DDP, checkpointing, and parallelization.
Fun
- When I need to step away from the screen, you'll usually find me at a squash court, taking a long walk, biking, or lifting weights.
- I also enjoy photography and usually bring my trusty Nikon when I'm traveling.