Research Engineer · IBM Research

Divya
On data, training, and what makes models work.

I build training infrastructure and data pipelines for large language models at IBM Research. My work spans pretraining, post-training alignment, and agentic systems.

RL Training · SFT Pipelines · Agentic Environments · Hybrid Architectures
IBM Research · NY
Bamba · Granite 4 flagship
70B+ post-training pipeline

Writing

Notes on training large language models, from someone who builds the pipelines.

Projects

Research engineering work at IBM: pretraining, post-training, and agentic systems.

IBM Research · Current
Agentic Browser Pipeline
Agentic RL rollouts and trajectory generation at scale across 23 real-world environments (Gmail, Slack, Notion, etc.), with 120+ tasks each. Adapted WebArena-Infinity with a custom agent framework on open-source models.
Python · PyTorch · WebArena · FSDP
IBM Research · Open source
Bamba / Granite 4
Pretrained a novel hybrid Mamba2-Attention architecture from scratch on ~2.2T tokens using 192 A100 GPUs. Outperforms Llama 3.1 8B on L1/L2 benchmarks despite using 7× fewer training tokens. Co-authored the public Hugging Face technical blog post.
Python · PyTorch · Mamba2 · 192× A100
IBM Research
70B Post-Training Pipeline
End-to-end post-training for a 70B+ Llama model on 512–768 H100 GPUs. Four rounds of iterative SFT with think/no-think training, followed by GRPO-based RL across three progressive context phases (8K→32K→64K tokens).
Python · GRPO · FSDP · 768× H100

About

I'm a Research Engineer at IBM Research, where I build and train large reasoning models. My work spans pretraining, post-training alignment, and agentic systems.

Currently, I'm building an agentic browser pipeline that generates RL rollouts across multiple real-world environments (Gmail, Slack, Notion, etc.), enabling trajectory generation at scale for RL training.

Previously, I contributed to Bamba, a hybrid Mamba2-Attention architecture pretrained on ~2.2T tokens and adopted as IBM's Granite 4 flagship. I also led the post-training pipeline for a 70B+ reasoning model, covering multi-round iterative SFT and GRPO-based RL across increasing context phases.

Experience

2024–present
Research Engineer
IBM Research · NY
2023
Graduate Research Assistant
University of Pennsylvania
2023
ML Research Assistant
Boston University
2020–2022
Software Engineer
Cisco Systems · Bangalore

Education

2022–2024
M.S.E. Computer and Information Science
University of Pennsylvania
2016–2020
B.S.E. Computer Engineering
Thapar Institute of Engineering and Technology

Contact

Reachable via LinkedIn or email. Open to conversations about LLM training infrastructure, agentic systems, and research roles.