Stress-Testing LLMs With Reasoning Gym: Building & Training a Multi-step Reasoning Task
Published:
I’ve been exploring how far reinforcement learning can push large language models when the reward is verifiable reasoning correctness. That led me to (i) extend Reasoning Gym with a procedurally generated, multi-hop puzzle set that forces deduction ↔ induction ↔ abduction ↔ transduction hand-offs, (ii) wire it into the TRL training loop, and (iii) look at the first accuracy curves. Below is the why, the how, and the initial results, starting with a sketch of the basic wiring.
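To make the setup concrete before diving in, here is a minimal sketch of how a Reasoning Gym task can serve as a verifiable reward inside TRL's GRPO loop. The task name `multi_hop_puzzles` is hypothetical (standing in for the puzzle set this post builds), and the model name and hyperparameters are placeholders; `reasoning_gym.create_dataset` / `score_answer` and TRL's `GRPOTrainer` are used as documented, but treat this as an illustration of the pattern, not the exact training script.

```python
import reasoning_gym
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Procedurally generate puzzles; each entry carries a question, a reference
# answer, and metadata the generator can verify against.
# NOTE: "multi_hop_puzzles" is a hypothetical task name for illustration.
rg_data = reasoning_gym.create_dataset("multi_hop_puzzles", size=1_000, seed=42)

# TRL expects a "prompt" column; keep an index column so the reward function
# can look the original Reasoning Gym entry back up for scoring.
train_dataset = Dataset.from_list(
    [{"prompt": entry["question"], "entry_idx": i} for i, entry in enumerate(rg_data)]
)

def verifiable_reward(completions, entry_idx, **kwargs):
    # Score each completion against the generator's own verifier:
    # score_answer returns 1.0 for a correct answer, less otherwise.
    return [
        rg_data.score_answer(answer=completion, entry=rg_data[i])
        for completion, i in zip(completions, entry_idx)
    ]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder policy model
    reward_funcs=verifiable_reward,
    args=GRPOConfig(output_dir="rg-grpo", num_generations=8),
    train_dataset=train_dataset,
)
trainer.train()
```

The key design point is that the reward is computed, not learned: the same procedural generator that emits each puzzle also scores the model's answer, so there is no reward model to hack.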