The Physical World is the Hidden Compute

Fairly speculative and written mostly as a way of organizing intuitions which have been bouncing around my head for a while, although I think the core point is probably right in the boring sense that every scientist already knows it implicitly.

I have recently been thinking a lot about the future of AI and scientific discovery through the lens of substrate, by which I mean the literal physical medium in which cognition and discovery take place. This sounds somewhat grandiose, and perhaps it is, but I think it points at a real confusion in how we talk about AI scientists, because a lot of the discussion implicitly treats scientific discovery as if it were primarily an exercise in manipulating propositions, while the actual history of science looks much more like a sequence of increasingly clever ways to couple human cognition to the physical world.

The obvious way to say this is that science is empirical, although that formulation feels too flat to carry the point. The deeper issue is that the world itself is doing an enormous amount of computation for us all the time. A cell folds proteins, a forest integrates climate and soil over decades, a reaction vessel explores a tiny region of chemical dynamics, a telescope converts ancient photons into a measurement, and a wet-lab assay runs a little physically instantiated program whose output we then compress into numbers.

In this sense, a human scientist is not simply a brain which reasons about the world from the outside, but a sensorimotor system embedded inside the world and equipped with instruments, social institutions, tacit craft knowledge, and a long chain of other humans who have already paid much of the cost of coupling thought to reality. The strange thing about modern AI is that it has become extremely strong at manipulating the compressed residue of all this coupling, especially text, code, equations, figures, and data tables, while remaining comparatively weak at building the loops which generated that residue in the first place.

This is why the phrase “AI scientist” can be slightly misleading, because there is a version of the AI scientist which searches literature, proposes hypotheses, runs code, and iterates inside a computational sandbox, and there is another version which actually earns new information from the world. The first version is already becoming real in domains where evaluation is cheap and formal, while the second version requires a much deeper integration of memory, instruments, robotics, simulators, and active learning.

There is an enormous difference between a system that can compute a Turing-computable function in principle and a system that can discover something important in the finite physical world with finite time, finite energy, and finite access to clean experimental feedback. The Turing framing gives us a very general language for what kinds of functions can be computed, and it is natural to ask whether future AI systems with persistent memory and good orchestration will eventually move from something closer to finite-state behavior into the regime of general computation. However, scientific discovery usually lives inside a harsher regime where the question is not merely whether the desired mapping is computable in principle, but whether the system can acquire enough bits from the relevant part of the world in a finite budget.
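The "finite budget of bits" framing can be made concrete with a back-of-envelope calculation. This sketch is an illustration under loud assumptions, not a model of any real experiment: each experiment is treated as an ideal yes/no question whose answer is flipped by noise with probability `flip_prob`, and the bound ignores the allowed error rate. The function names are hypothetical. Under those assumptions, singling out one of `n_hypotheses` equally likely hypotheses needs about `log2(n_hypotheses)` bits, and each noisy readout carries at most the binary symmetric channel capacity `1 - H(flip_prob)`.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a yes/no outcome that comes out 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def min_experiments(n_hypotheses: int, flip_prob: float) -> int:
    """Rough lower bound on how many yes/no experiments are needed to single out
    one of n_hypotheses equally likely hypotheses, when each readout is flipped
    by noise with probability flip_prob (a binary symmetric channel)."""
    bits_needed = math.log2(n_hypotheses)
    # Capacity of a binary symmetric channel: the most one readout can carry.
    bits_per_experiment = 1.0 - binary_entropy(flip_prob)
    if bits_per_experiment <= 0.0:
        raise ValueError("a readout this noisy carries no information")
    return math.ceil(bits_needed / bits_per_experiment)


for flip in (0.0, 0.1, 0.25):
    print(flip, min_experiments(1024, flip))  # prints 0.0 10 / 0.1 19 / 0.25 53
```

Real experiments rarely let you phrase an arbitrary yes/no question, so the true cost is usually worse than this bound, which is part of the point: noise in the measurement channel multiplies the number of experiments before any reasoning has happened.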

This is where memory architectures like Titans become interesting. One should be careful about over-updating from any single paper or architecture, but they point toward the fact that today’s context-window-based systems have a strangely shallow relationship to time. A person doing science accumulates a research taste over years, remembers the vague smell of failed approaches, carries around half-formed intuitions from old conversations, and gradually develops a compressed internal model of what the field is likely to reward or punish. A language model can appear to have this kind of continuity for a few minutes, but without an ability to update from consequences, it is closer to a brilliant visitor who keeps waking up in a new hotel room with a dossier slid under the door.

Of course, this can change, and probably will change quickly, because long-context attention, retrieval, neural memory, agent workspaces, tool logs, code repositories, and external knowledge stores are all attempts to give models a better temporal substrate. The question I keep returning to is whether this kind of memory merely makes AI systems better research assistants, or whether it eventually crosses a threshold where the system can maintain a scientific life of its own, in the sense of having accumulated taste, self-correction from experimental feedback, and a stable enough identity over time to notice that a line of work has become promising.

There is a related theoretical temptation to ask whether Transformers are Turing complete, but the strongest theoretical results often depend on idealized assumptions about precision, scaling families, or context management. A real deployed AI system is always a fixed physical object embedded in a particular inference loop, with a context manager, memory policy, tool interface, and budget. The abstract computational power of the architecture matters, but much of the effective computational power of the whole system hides in the boring engineering details around the model.

The same point applies even more strongly once we leave text and code and ask about the physical world. If an AI system needs to understand a turbulent atmosphere or a catalytic reaction, it can either simulate some approximation of the system, learn a surrogate from existing data, or interact with the real process through instruments and experiments. Each route has a different price, and the price is paid in resolution, time, energy, sample complexity, compute, money, and the opportunity cost of exploring one region of the world while leaving another region untouched.

This is what I call implicit computation. A physical system evolves under its own dynamics whether or not we understand it, and an experiment is often a way of asking the universe to perform a computation on our behalf. When a human chemist runs a reaction, they are not simulating every electronic-structure and solvent effect; rather, they are arranging a physical situation whose evolution contains the answer in a form they can partially observe. When a biologist grows cells under a perturbation, the cells perform the computation, and the scientist builds a measurement channel around that computation so that a small, digestible piece of it enters the human knowledge system.

An AI system which lacks high-bandwidth access to this implicit computation has to pay for substitutes, and those substitutes can be brutally expensive. To simulate a physical system at useful fidelity, the model must choose a discretization, track a state over space and time, obey stability constraints, update an enormous number of interacting degrees of freedom, and then somehow compress the result into a usable hypothesis. Even when the simulation is cheaper than reality, it inherits the assumptions of the simulator, and those assumptions can become the boundary of the discovery process.
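The cost of "choose a discretization, obey stability constraints" shows up even in the simplest possible simulator. The sketch below is a toy: a 1D heat equation solved with the explicit FTCS (forward-time, centred-space) scheme, chosen only because its stability rule is well known (the mesh ratio `r = D*dt/dx**2` must stay at or below 1/2). The function names and parameters are hypothetical and stand in for no particular scientific code.

```python
import math


def ftcs_cost(length: float, dx: float, t_final: float, diffusivity: float):
    """Cells, time steps, and total cell updates for an explicit (FTCS)
    1D diffusion solve run at its stability limit D*dt/dx**2 = 1/2."""
    n_cells = round(length / dx)
    dt = 0.5 * dx * dx / diffusivity  # largest stable time step
    n_steps = math.ceil(t_final / dt)
    return n_cells, n_steps, n_cells * n_steps


def ftcs_run(r: float, n_points: int = 21, n_steps: int = 200) -> float:
    """Evolve a unit spike with FTCS at mesh ratio r = D*dt/dx**2, holding
    both boundaries at zero, and return the largest |u| at the end."""
    u = [0.0] * n_points
    u[n_points // 2] = 1.0
    for _ in range(n_steps):
        interior = [
            u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
            for i in range(1, n_points - 1)
        ]
        u = [0.0] + interior + [0.0]
    return max(abs(x) for x in u)


coarse = ftcs_cost(1.0, 0.01, 1.0, 1.0)[2]
fine = ftcs_cost(1.0, 0.005, 1.0, 1.0)[2]
print(fine / coarse)  # close to 8: twice the cells, four times the steps
print(ftcs_run(0.4))  # stable: stays at or below 1.0
print(ftcs_run(0.6))  # past the stability limit: grows without bound
```

Halving the grid spacing costs roughly 8 times the work in 1D; in 3D the same refinement multiplies the cell count by 8 and the step count by 4. Ignoring the stability limit to save steps does not save anything, because the solution simply blows up.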

This is why I think the next era of AI for science will split into two different tracks. The first track will be computationally native science, where the world of interest is already formalized enough that ideas can be generated, tested, and selected inside an automated loop. Machine learning research, algorithm discovery, theorem proving, coding, some parts of mathematics, some parts of chip design, and some parts of computational infrastructure sit closer to this regime, which is why systems like The AI Scientist and AlphaEvolve feel like early signals of a phase transition.

The second track will be physically grounded science, where the bottleneck is earning the right data from the world. Biology, materials, climate, agriculture, medicine, robotics, and much of chemistry live closer to this regime, because the real object of study contains causal structure which has only been thinly sampled by human datasets. In these domains, the future AI scientist probably looks more like an orchestration layer wrapped around simulators, robots, instruments, databases, active learning loops, and human experts.

This is also why self-driving laboratories seem much more important than mainstream AI discourse usually gives them credit for. A self-driving lab is, in effect, a sensory and motor system for an AI scientist. It gives the model a way to choose actions, observe consequences, update beliefs, and continue the cycle across many rounds, which means it starts to close the loop between explicit cognition and implicit physical computation. The primitive versions will be narrow and brittle, but the direction seems clear once one accepts that discovery requires contact with reality.
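The choose, observe, update, repeat cycle of a self-driving lab can be caricatured in a few lines. Everything here is hypothetical: the "conditions" are abstract options with hidden success rates standing in for the lab, and the choice rule is Thompson sampling with Beta-Bernoulli beliefs, which is one of many possible acquisition strategies and not necessarily what any real self-driving lab uses.

```python
import random


def run_closed_loop(true_success, n_rounds, seed=0):
    """Toy self-driving-lab loop: choose a condition, run the (hidden)
    experiment, update beliefs, repeat. Beliefs are Beta posteriors over
    each condition's success rate; choices use Thompson sampling."""
    rng = random.Random(seed)
    alpha = [1.0] * len(true_success)  # 1 + observed successes per condition
    beta = [1.0] * len(true_success)   # 1 + observed failures per condition
    pulls = [0] * len(true_success)
    for _ in range(n_rounds):
        # Choose: draw a plausible success rate for each condition, take the best.
        draws = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(len(draws)), key=draws.__getitem__)
        # Observe: the stand-in "world" returns a noisy success or failure.
        success = rng.random() < true_success[arm]
        # Update: conjugate Beta-Bernoulli posterior for the chosen condition.
        if success:
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
        pulls[arm] += 1
    return pulls, alpha, beta


pulls, alpha, beta = run_closed_loop([0.2, 0.5, 0.8], n_rounds=500)
print(pulls)
```

Most of the 500 rounds should end up on the third condition, because the loop spends experiments where its uncertainty and its estimated payoff are both high. The hard part in a real lab is everything this toy hides: the action space, the measurement, and the cost of each round.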

The same idea also changes how one should interpret recent AI co-scientist systems. A multi-agent system which generates and refines hypotheses is useful, especially in literature-heavy domains where humans are drowning in papers and cannot keep all relevant mechanisms in working memory. Yet hypothesis generation by itself is the airy part of science. The harder part is often deciding which hypothesis deserves a scarce experiment, what measurement would actually distinguish it from its rivals, and how to update when the result comes back ambiguous or ugly.
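"Which measurement would actually distinguish a hypothesis from its rivals" has a standard formal answer when each hypothesis can state the outcome probabilities it predicts: pick the experiment with the highest expected information gain, i.e. the mutual information between hypothesis and outcome. The sketch below assumes exactly that, which is the hard part in practice; the three hypotheses, the experiment names, and their likelihood tables are invented for illustration.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def expected_information_gain(prior, likelihoods):
    """Mutual information I(H; O) between hypothesis and outcome, in bits.

    prior[h]          -- current belief in hypothesis h
    likelihoods[h][o] -- probability hypothesis h assigns to outcome o
    """
    n_hyp = len(prior)
    gain = entropy(prior)
    for o in range(len(likelihoods[0])):
        p_o = sum(prior[h] * likelihoods[h][o] for h in range(n_hyp))
        if p_o == 0.0:
            continue
        # Bayes' rule: belief over hypotheses if outcome o is observed.
        posterior = [prior[h] * likelihoods[h][o] / p_o for h in range(n_hyp)]
        gain -= p_o * entropy(posterior)
    return gain


prior = [1 / 3, 1 / 3, 1 / 3]
experiments = {
    # Every hypothesis predicts the same result, so it cannot discriminate.
    "uninformative": [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],
    # Hypotheses lean different ways, with overlap.
    "graded": [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],
    # Cleanly separates hypothesis 0 from the other two.
    "decisive": [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
}
scores = {name: expected_information_gain(prior, lik) for name, lik in experiments.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # decisive 0.918
```

The uninformative experiment scores zero however plausible its predictions are, which is a compact way of saying that a hypothesis everyone agrees on does not deserve a scarce experiment.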

One reason computational domains have moved so quickly is that they have excellent verifiers. Code either runs or fails in relatively legible ways, and a formal proof can be checked mechanically. This does not make these domains easy, because the search spaces are still vast and deceptive, but it means that the loop from proposal to feedback can be made fast enough for evolutionary search, reinforcement learning, and agentic tree search to be useful. The moment the verifier becomes slow, expensive, noisy, or socially mediated, the whole dream of cheap autonomous discovery becomes more subtle.

This suggests that the real unit of progress in AI science may be the feedback loop rather than the model. A frontier model with poor experimental access is a powerful imagination machine trapped behind glass. The history of science is full of this pattern, where new instruments and new measurement channels suddenly make old questions tractable, because the bottleneck had been the ability to create a situation where reality answers in a compressed and repeatable form.
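To make the verifier point concrete, here is a toy sketch that is not drawn from The AI Scientist, AlphaEvolve, or any other real system: a (1+1) evolutionary search on a made-up "count the ones" task, where the only thing that changes between runs is whether the verifier is exact or noisy. The function name and the Gaussian noise model are assumptions for illustration.

```python
import random


def one_plus_one_search(n_bits, n_iters, noise_sd=0.0, seed=0):
    """(1+1) evolutionary search on a toy 'count the ones' task.
    The verifier returns the true score plus Gaussian noise; with
    noise_sd=0 it behaves like an exact, cheap check such as a test suite."""
    rng = random.Random(seed)

    def verify(x):
        return sum(x) + rng.gauss(0.0, noise_sd)

    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(n_iters):
        # Propose: flip each bit independently with probability 1/n.
        child = [1 - b if rng.random() < 1.0 / n_bits else b for b in parent]
        # Select: keep the child if the verifier scores it at least as high.
        if verify(child) >= verify(parent):
            parent = child
    return sum(parent)  # true score of the final candidate


print(one_plus_one_search(30, 5000))                 # exact verifier
print(one_plus_one_search(30, 5000, noise_sd=10.0))  # noisy verifier
```

With an exact verifier the search reaches the optimum of 30 well within the budget. With `noise_sd=10.0` the selection step keeps accepting regressions and rejecting improvements, and the final candidate typically ends well short of 30. Averaging repeated verifier calls would help, but that is exactly the extra cost that makes slow or noisy verifiers painful.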

There is a version of the future where AI systems become extremely good at science by building increasingly elaborate substitutes for human embedding. They will maintain persistent research memories, construct world models across modalities, design experiments using active learning, run simulations to prune the search space, send only the most informative candidates to robotic labs, interpret the results with mechanistic priors, and then iterate until the system has produced knowledge which no human could have generated unaided.

There is another version of the future where we overestimate text-native reasoning and flood the scientific world with plausible papers and shallow hypotheses whose main achievement is to increase the entropy of the literature. This is already a real danger, because the marginal cost of producing something that looks like research is collapsing faster than the marginal cost of producing something that changes what we know. The scientific community already has a hard enough time distinguishing signal from noise, and AI-generated research slop could make the epistemic commons much worse unless evaluation, replication, and experimental grounding improve alongside generation.

The substrate question also reframes robotics and embodiment. It is tempting to look at Boston Dynamics robots, Unitree humanoids, and autonomous vehicles, and say that machines are already learning to navigate the physical world, which is true in the same rough sense that LLMs already navigate the textual world. Yet physical intelligence is constrained by channel capacity, latency, actuator limits, sensor noise, and adversarial edge cases. A robot cannot see everything, cannot sample every possible action, and cannot escape the cost of making mistakes in the world. This does not imply a mystical human advantage, since humans are also bandwidth-limited, metabolically constrained, and mostly terrible at high-dimensional inference. The human advantage, where it exists, comes from being trained by continuous contact with physics from birth, having bodies whose priors were shaped by evolution, and living inside cultures that have already amortized enormous amounts of world-learning across generations. AI systems may eventually surpass this by scaling experience through simulation, teleoperation, and robotic data factories, although the route to doing so runs through the substrate.

Scientific discovery has the same structure. A human scientist does not know the world by containing a perfect model of it, but by living inside a civilization that has built countless partial interfaces to it. The AI scientist will need its own interfaces, and the quality of those interfaces will determine what kinds of knowledge it can create. Existing knowledge is not uniformly distributed across idea space, because it is clustered around the places where humans had instruments, incentives, mathematical formalisms, and tractable experimental loops. The empty regions between these clusters correspond to regions where the world has been too expensive, too slow, or too complex for us to interrogate. An AI trained on the textual residue of science will naturally become powerful inside the dense regions of this manifold, while its ability to jump into sparse regions depends on whether it can buy new information from the world.

Anyone trying to build AI scientists should care more about building the full discovery stack. The stack needs persistent memory, because science is cumulative across time. It needs uncertainty-aware world models, because overconfident models will spend experiments badly. It needs active learning, because experiments are expensive. It needs simulators, because reality is slow. It needs reality, because simulators lie. It needs human experts, because tacit knowledge remains a massive compression of previous failures. It needs institutions, because discovery becomes knowledge only when other agents can inspect, contest, and reuse it.

My current guess is that the most important AI science systems will be hybrids for a long time. They will contain language models, search, retrieval, code agents, Bayesian optimization, symbolic tools, simulation engines, robotic labs, human review, and domain-specific data infrastructure. Over time the AI parts will eat more of the loop, first by automating literature synthesis and experimental planning, then by running computational experiments, then by controlling robotic platforms, then by proposing whole research programs whose logic humans can track at a higher level of abstraction.

The really interesting possibility is that AI may eventually discover knowledge by exploiting substrate combinations that humans rarely use. It may run millions of cheap simulations to shape a prior, execute thousands of robotic experiments to correct it, mine decades of literature to avoid old traps, evolve symbolic hypotheses to preserve extrapolation, and use human experts mainly as sparse high-value evaluators. This would be a new kind of scientific instrument, perhaps closer to the telescope or microscope than to the lone genius, except that the instrument would also choose where to point itself.
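"Simulations to shape a prior, experiments to correct it" has a textbook form in the simplest case. This sketch assumes everything is Gaussian: the simulations summarize a quantity as a prior mean and variance, the lab measurements have known noise variance, and the combination is the normal-normal conjugate update. The function name and the numbers are hypothetical.

```python
def combine_sim_and_lab(sim_mean, sim_var, lab_values, lab_noise_var):
    """Normal-normal conjugate update: a prior on some quantity taken from
    cheap simulations, corrected by a handful of noisy lab measurements.
    Returns the posterior mean and variance."""
    n = len(lab_values)
    if n == 0:
        return sim_mean, sim_var
    lab_mean = sum(lab_values) / n
    # Precisions (inverse variances) add; means are precision-weighted.
    precision = 1.0 / sim_var + n / lab_noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (sim_mean / sim_var + n * lab_mean / lab_noise_var)
    return post_mean, post_var


m, v = combine_sim_and_lab(10.0, 4.0, [13.0, 12.5, 13.5, 13.0], 1.0)
print(round(m, 2), round(v, 3))  # 12.82 0.235
```

The design choice that matters is `sim_var`. Running millions more simulations shrinks their sampling error but not the simulator's systematic bias, so the prior variance should reflect how wrong the simulator might be, not how many times it was run. Setting it too small is the quantitative version of trusting a simulator that lies.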

I do not think the question is whether machines can “understand” science in the way humans do, because that question tends to dissolve into arguments about words. The better question is whether machine systems can create reliable new constraints on our beliefs about the world. If they can propose a molecule that works or design an algorithm that improves a real system, then some important kind of scientific understanding has entered the world through them. The caveat is that all of this depends on the substrate. A model trapped in text can rearrange the fossil record of past human contact with reality, and this is already enormously useful because the fossil record is huge. A model connected to tools can act on formal worlds. A model connected to laboratories can ask reality new questions. A model connected to institutions can turn answers into shared knowledge.

The future of AI and scientific discovery will therefore be decided by how well we engineer these couplings, because the world is the substrate which computes itself, and science is the art of building channels through which that computation becomes knowledge.