Research Engineer · IBM Research

Divya
On data, training, and what makes models work.

I build training infrastructure and data pipelines for large language models at IBM Research. My work spans pretraining, post-training alignment, and agentic systems.

RL Training SFT Pipelines Agentic Environments LLM Architecture

GitHub LinkedIn CV ↗

Writing

Notes on training large language models, from someone who builds the pipelines.

Projects

Research engineering work at IBM — pretraining, post-training, and agentic systems.

IBM Research · Current

Agentic Browser Pipeline

Agentic RL rollouts and trajectory generation at scale across 23 real-world environments (Gmail, Slack, Notion, etc.), 120+ tasks each. Adapted WebArena-Infinity with a custom agent framework on open-source models.

Python PyTorch WebArena FSDP

IBM Research · Open source

Bamba / Granite 4

Pretrained a novel hybrid Mamba2-Attention architecture from scratch on ~2.2T tokens using 192 A100 GPUs. Outperforms Llama 3.1 8B on L1/L2 benchmarks despite 7x fewer training tokens. Co-authored the public Hugging Face technical blog post.

HuggingFace Blog ↗ GitHub ↗

Python PyTorch Mamba2 192× A100

IBM Research

70B Post-Training Pipeline

End-to-end post-training for a 70B+ Llama model on 512–768 H100 GPUs. 4-round iterative SFT with think/no-think training, followed by GRPO-based RL across three progressive context phases (8K→32K→64K tokens).

Python GRPO FSDP 768× H100

About

I'm a Research Engineer at IBM Research, where I build training infrastructure for large language models. My work spans pretraining, post-training alignment, and agentic systems.

Currently I'm building an agentic browser pipeline that generates RL rollouts and trajectories at scale across 23 real-world environments (Gmail, Slack, Notion, etc.).

Previously I contributed to Bamba, a hybrid Mamba2-Attention architecture pretrained on ~2.2T tokens and adopted as IBM's Granite 4 flagship. I also led the post-training pipeline for a 70B+ reasoning model, covering 4-round iterative SFT and GRPO-based RL across 8K→32K→64K context phases.

Experience

2024–present

Research Engineer

IBM Research · NY

2023

Graduate Research Assistant

University of Pennsylvania

2023

ML Research Assistant

Boston University

2020–2022

Software Engineer

Cisco Systems · Bangalore

Education

2022–2024

M.S.E. Computer and Information Science

University of Pennsylvania

2016–2020

B.S.E. Computer Engineering

Thapar Institute of Engineering and Technology

Contact

Reachable via LinkedIn or email. Open to conversations about LLM training infrastructure, agentic systems, and research roles.
