I work on reward modeling, RLHF/DPO, and evaluation for agentic LLM systems. I developed Embedding-to-Prefix and contribute to the systems behind AI Playlist and AI DJ. My PhD at Harvard focused on dialogue systems.
I developed a hybrid approach combining reward models and Direct Preference Optimization (DPO) for tool orchestration in LLM-based agentic systems. The method achieved a 70% reduction in erroneous tool calls and a 4% lift in listening time.
[Spotify Research, 2025]
Featured in the Spotify Q2 2025 earnings call as core AI strategy (summary; audio at 10:50)
I developed a novel architecture that enables deep personalization of large language models using pre-computed user embeddings. This method bridges representation learning and generative AI, allowing foundation models to be steered by rich user context without costly fine-tuning. The approach achieves strong personalization while maintaining computational efficiency at scale.
[NeurIPS CCFM, 2025]
I built a multimodal dialogue system that generates contextually appropriate responses by jointly processing text and visual information. This work established foundational methods for incorporating visual sentiment and scene understanding into conversational AI, demonstrating how computational systems can respond to nuanced, multimodal human inputs.
[CHI, 2018]