The Latent Shape of Discovery

This post is fairly speculative and probably underformalized in several places, but I think the core question is important enough that it is worth trying to make its shape visible before the formalism has fully caught up.

I have recently been thinking a lot about the intrinsic space of human knowledge, by which I do not mean the internet, or the scientific literature, or some embedding database sitting inside a RAG system, but something closer to the compressed structure that would remain if you took everything humans know, stripped away the surface forms, and asked what the actual topology of the thing was.

This is already a slightly dangerous metaphor, because the moment we say “space” we begin smuggling in assumptions about distance, continuity, density, curvature, neighbourhoods, and maybe even dimension, while the actual object we care about might be something stranger than a manifold, closer to a stack of partially aligned representational systems that only locally behave like a geometric object. Still, the metaphor is useful because it forces the right discomfort, since if knowledge has a latent geometry then scientific discovery becomes a question about where we are in that geometry, what regions are populated by existing theories and observations, what regions are sparse yet reachable, what regions are inaccessible without new instruments, and what regions only become meaningful after reality has pushed back through experiment.

One crude version of this story says that language models are trained on a huge projection of human knowledge and therefore learn some compressed representation of this knowledge inside their activations, after which generation becomes a kind of movement through that representational space, producing new strings that are locally plausible because they lie near high-density regions of human thought.

This version is almost certainly too simple, partly because tokens feel like the wrong atoms for thinking about knowledge and partly because the model is doing a non-linear computation over many interacting representational layers, although the basic picture still captures something important about why these systems can recombine ideas in ways that sometimes feel surprisingly intelligent. Maybe the right unit is a sentence, maybe it is a concept, maybe it is a proof fragment, maybe it is an experimental protocol, maybe it is a causal model with some implicit handle on how the world would respond under intervention, and maybe different sciences operate with different units because physics, molecular biology, and ecology carve reality at very different joints.

If we are asking whether AI can discover new knowledge, this choice of unit matters a lot, since “newness” at the level of text is cheap, “newness” at the level of a plausible hypothesis is more interesting, “newness” at the level of a verified causal mechanism is much harder, and “newness” at the level of a new scientific worldview is the kind of thing that only looks inevitable after the fact. A language model can generate a sentence nobody has written before, and almost nobody cares, while a system that proposes a strange drug repurposing hypothesis that survives wet-lab validation has moved a different kind of object through the space, because the new point is now attached to reality by an experimental tether.

This is why I find the interpolation versus extrapolation debate both relevant and somewhat unsatisfying, because the convex hull story is mathematically crisp in one representation, while the thing we actually care about is whether the model has learned transformations that preserve contact with the causal structure of the world as it moves into low-density regions. Yann LeCun and others have argued that in high dimensions nearly everything is extrapolation under the convex hull definition, and this is useful as a warning against lazy intuitions about interpolation, although it also seems too blunt for the question of discovery because a point can be outside the convex hull of training examples while still sitting on a meaningful low-dimensional structure that the model has captured.
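
The high-dimensional claim can be illustrated with a toy check. The sketch below is my own assumption-laden illustration, not the construction used in the cited argument: it draws points uniformly from a unit cube and counts how many fresh points fall outside the axis-aligned bounding box of the training set. Because the bounding box contains the convex hull, this count is only a lower bound on the share of points that sit outside the hull itself. The function name, sample sizes, and uniform distribution are all hypothetical choices.

```python
import random


def fraction_outside_bounding_box(dim, n_train=100, n_query=500, seed=0):
    """Share of fresh points that fall outside the axis-aligned bounding box
    of the training set. The box contains the convex hull, so this share is
    a lower bound on the share of points lying outside the hull."""
    rng = random.Random(seed)
    train = [[rng.random() for _ in range(dim)] for _ in range(n_train)]
    # Per-coordinate minimum and maximum over the training points.
    lows = [min(col) for col in zip(*train)]
    highs = [max(col) for col in zip(*train)]
    outside = 0
    for _ in range(n_query):
        query = [rng.random() for _ in range(dim)]
        if any(x < lo or x > hi for x, lo, hi in zip(query, lows, highs)):
            outside += 1
    return outside / n_query


if __name__ == "__main__":
    for dim in (2, 20, 200):
        print(dim, fraction_outside_bounding_box(dim))
```

With a fixed number of training points, the share grows quickly with dimension, which is the sense in which "interpolation" under the convex hull definition becomes rare. It says nothing about whether those points lie on a low-dimensional structure the model has captured, which is the distinction that matters for discovery.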

The more interesting question is probably local generalization on the right intrinsic representation, because scientific intelligence rarely means leaping into a random void, while it often means following a faint but lawful direction away from what is already known, guided by invariances, analogies, conservation laws, mechanistic constraints, and repeated contact with measurement.

In this sense, the future of AI discovery may depend less on whether models interpolate or extrapolate in some abstract high-dimensional space and more on whether they can learn the right coordinates for a domain, then traverse those coordinates in ways that remain meaningful when the density of human examples gets thin. This is where sparse autoencoders feel relevant, because if large models contain somewhat disentangled features corresponding to concepts, relations, capabilities, and higher-order abstractions, then the latent space of knowledge may become less of a poetic metaphor and more of an object that can be partially mapped, probed, edited, and eventually navigated.

The dream here is seductive. Take a scientific domain, decompose the internal representations of a capable model, identify the regions corresponding to existing facts, methods, theories, and anomalies, then deliberately search the sparse spaces between them for compositions that are surprising, coherent, and experimentally actionable. This would make AI science more like running a search process over a structured epistemic landscape, where novelty is measured by distance from the known literature, plausibility is measured by consistency with existing evidence, and value is measured by the expected information gained from testing the idea.
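
To make the three measures slightly more concrete, here is a minimal sketch under strong simplifying assumptions: ideas are points in some embedding space, novelty is the distance to the nearest known point, plausibility is a prior probability that the hypothesis is true, and value is the expected information gain (in bits) from a single binary experiment with a given hit rate and false-alarm rate. All names and numbers are hypothetical, and a real system would need far richer notions of each quantity.

```python
import math


def entropy(p):
    """Binary entropy in bits; defined as 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(prior, true_pos_rate, false_pos_rate):
    """Expected drop in uncertainty (bits) about a hypothesis after one
    binary experiment with the given hit and false-alarm rates."""
    p_pos = prior * true_pos_rate + (1 - prior) * false_pos_rate
    p_neg = 1 - p_pos
    # Bayes' rule for each possible outcome of the experiment.
    post_pos = prior * true_pos_rate / p_pos if p_pos > 0 else prior
    post_neg = prior * (1 - true_pos_rate) / p_neg if p_neg > 0 else prior
    expected_posterior = p_pos * entropy(post_pos) + p_neg * entropy(post_neg)
    return entropy(prior) - expected_posterior


def novelty(candidate, known):
    """Euclidean distance from a candidate to the nearest known point."""
    return min(math.dist(candidate, k) for k in known)


if __name__ == "__main__":
    known = [(0.0, 0.0), (1.0, 0.0)]
    # (embedding, prior plausibility) pairs, invented for illustration only.
    candidates = {
        "near and likely": ((0.1, 0.0), 0.9),
        "far and uncertain": ((3.0, 4.0), 0.5),
    }
    for name, (point, prior) in candidates.items():
        print(name,
              round(novelty(point, known), 3),
              round(expected_information_gain(prior, 0.9, 0.1), 3))
```

One thing the sketch makes visible is that a hypothesis we are already nearly sure about yields little information when tested, so value in this sense peaks where plausibility is genuinely uncertain rather than where it is highest.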

Something like this is already beginning to happen in a primitive way, although the current systems are still mostly scaffolded cognitive machines rather than autonomous scientists in the romantic sense.

FunSearch is a clean example because it couples a language model to an evaluator, evolves programs, and produces verifiable outputs in mathematics and computer science, which matters because the evaluator gives the search process a hard contact surface with truth. GNoME is another version of this pattern in materials science, where scaled graph networks search huge regions of inorganic crystal space and generate candidates that can later be filtered, validated, and in some cases physically synthesized, which makes the discovery loop feel less like free-form generation and more like the expansion of a map. AlphaFold and AlphaFold 3 show yet another version, where biological structure becomes predictable enough that an enormous region of protein and biomolecular interaction space becomes computationally navigable before every point has been physically measured.

The recent Co-Scientist work pushes the story closer to explicit scientific cognition, because it uses a coalition of agents to generate, critique, rank, and evolve hypotheses, and the important part is the tournament-like process in which test-time compute is spent on improving the quality of ideas. The AI Scientist line of work is even more interesting because it tries to automate the whole loop in machine learning research, from idea generation to experiments to paper writing to peer review, and although the current outputs remain bounded by workshop-level standards and digital-only experimental domains, the direction of travel is obvious enough to make people uncomfortable.
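
A tournament over hypotheses can be sketched with Elo-style ratings, a common way of turning pairwise comparisons into a ranking. This is an illustration of the general pattern only, and I am not claiming that any particular system uses exactly this update rule. The `judge` callable is a hypothetical stand-in for whatever critic compares two ideas, and the constants (starting rating 1200, `k=16`) are arbitrary.

```python
import random


def elo_tournament(ideas, judge, rounds=200, k=16.0, seed=0):
    """Rank ideas by repeated pairwise comparison with Elo-style updates.

    `judge(a, b, rng)` is a hypothetical critic that returns True when
    idea `a` beats idea `b`. Returns the ideas sorted from best to worst.
    """
    rng = random.Random(seed)
    ratings = {idea: 1200.0 for idea in ideas}
    for _ in range(rounds):
        a, b = rng.sample(ideas, 2)
        # Probability that `a` wins, implied by the current ratings.
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
        score_a = 1.0 if judge(a, b, rng) else 0.0
        delta = k * (score_a - expected_a)
        ratings[a] += delta
        ratings[b] -= delta
    return sorted(ratings, key=ratings.get, reverse=True)


if __name__ == "__main__":
    # Hidden "true quality" values, invented for the demo.
    quality = {"idea-1": 0.2, "idea-2": 0.5, "idea-3": 0.9}

    def noisy_judge(a, b, rng):
        # The critic sees quality through Gaussian noise, so it sometimes errs.
        return quality[a] + rng.gauss(0, 0.2) > quality[b] + rng.gauss(0, 0.2)

    print(elo_tournament(list(quality), noisy_judge))
```

The point of spending test-time compute this way is that many noisy comparisons can add up to a more reliable ordering than any single judgement.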

Coscientist (the LLM-driven chemistry agent, a separate system from the similarly named Co-Scientist above), autonomous laboratories, and the broader self-driving lab movement take the next step by giving the system hands, or at least robot-mediated access to instruments, which changes the philosophical situation because the model is no longer only moving through text or code but also participating in the cycle through which the world generates new evidence. This is the crux of the matter for scientific discovery, because the latent space of human knowledge can be filled in by recombination for a surprisingly long time, yet at some point the model has to earn the right to believe its own new concepts by making them collide with reality.

A purely textual AI scientist will be strongest in domains where evaluation is cheap, formal, and fast, which is why mathematics, programming, and certain kinds of computational science are such natural early targets. Biology and chemistry are different because the space of plausible hypotheses is vast, the data are biased by what humans have chosen to measure, and the hidden procedural knowledge of the lab is often absent from papers precisely because scientists treat it as background craft.

This tacit layer matters more than people in AI sometimes appreciate, because real laboratories contain friction at every level, from ambiguous assays to finicky reagents to instruments that behave differently on a humid afternoon, and a model trained mostly on papers sees the polished trace of science rather than the messy process by which science was made. This is also why the recent enthusiasm around autonomous labs is important, since the discovery system has to incorporate failed experiments, calibration details, embodied procedures, and low-status negative results if it is going to learn the true topology of a scientific domain rather than the beautified topology of the published literature.

The published literature is an extremely strange training distribution because it is filtered for significance, novelty, prestige, clarity, and narrative coherence, while the actual process of discovery contains far more failed branches than successful ones. A model trained on the literature therefore inherits a map in which successful paths are overrepresented and dead ends are underrepresented, which creates a subtle danger when the model begins proposing experiments at scale. The danger is that AI may accelerate the existing incentives of science rather than expand science itself, producing more papers, more plausible hypotheses, and more activity in data-rich regions, while leaving the truly weird sparse regions even more neglected.

This is why the recent finding that AI-augmented scientists may publish more and get cited more while the collective focus of science narrows is important, because it suggests that acceleration alone can create an epistemic monoculture if every agent learns to run toward the same well-instrumented gradients. In the language of the latent manifold, AI could fill in the nearby gaps very quickly while making the distant continents less likely to be explored, simply because the nearby gaps have better benchmarks, cheaper validation, and denser reward.

This is one reason I am wary of a simplistic “AI will automate science” frame, even though I am quite bullish on AI accelerating discovery, since automation tends to preserve the structure of the existing workflow while making it faster, whereas genuine discovery often requires changing the representation, changing the measurement apparatus, or changing the question. A very capable AI scientist might therefore look less like a tireless graduate student and more like a system that notices when the coordinate system itself is bad.

It would ask whether a field is using the wrong ontology, whether an experimental proxy has become detached from the phenomenon it was meant to measure, whether a benchmark has become a little religion, whether the literature has overfit to what is easy to publish, and whether a sparse region of the knowledge space is empty because it is unpromising or because nobody has had the tools to survive there.

This is also where the question of “new knowledge” becomes less binary than people want it to be. Non-trivial composition of existing ideas can be new in the same way that a new proof is new, because the components may be old while the path through them was previously invisible. Interpolation on a learned manifold can be new if the manifold captures a real structure and the interpolated point corresponds to a hypothesis that humans had not articulated. Extrapolation can be useless if it points into nonsense, and interpolation can be profound if it reveals a hidden bridge between two dense regions that were historically separated by discipline, notation, or social structure.

The interesting unit is therefore something like verified epistemic expansion, where a model proposes a representation, mechanism, construction, or intervention that changes what competent scientists can reliably do. This definition makes AI discovery continuous with human discovery, because humans also discover by recombination, analogy, error correction, and contact with the world, while our advantage has historically been that we are embodied agents embedded in social institutions with instruments, incentives, memory, and taste.

The missing ingredient in current AI systems may be taste more than intelligence in the narrow sense, where taste means the learned ability to choose problems whose answers would matter, to abandon sterile directions, to distrust elegant nonsense, and to sense that some ugly little anomaly deserves a year of attention. The future AI scientist needs many forms of taste, including mathematical taste for the fruitful lemma, experimental taste for the assay that will actually discriminate between mechanisms, engineering taste for the system that can be built, and sociological taste for the question a field is ready to understand.

This is why I suspect that the first transformative AI-for-science systems will be hybrid institutions rather than single models, with foundation models, domain simulators, robotic labs, human experts, automated verification systems, and scientific communities all coupled together into increasingly tight feedback loops. The model will explore the latent space, the simulator will cheaply reject some ideas, the lab will test the most informative ones, the literature agent will update the map, and the human scientist will still shape the objective because the choice of what counts as progress is itself part of science.

Over time, however, the human role may move upward in abstraction, from doing every calculation and every literature search to defining taste, constraints, values, and research programmes, while the machine population explores many local branches faster than any individual lab could. This transition will feel strange because science has always been partly an identity, a craft, and a status system, and suddenly some of the visible artefacts of scientific labour, especially papers and analyses, will become cheap enough that they lose much of their signalling value.

If AI can generate a workshop paper, then the paper becomes a weaker proof of intellectual labour, and the real object of evaluation shifts toward the originality of the question, the trustworthiness of the loop, the quality of the validation, and the long-term usefulness of the resulting knowledge. The erosion is already visible in the failure mode of hallucinated citations, where language models produce scholarly-looking references that slip into the scientific record, which is almost a dark parody of AI science because the system fills the manifold with fake points that look locally plausible and globally corrupt the map.

A scientific ecosystem with AI-generated hypotheses, AI-written papers, AI-assisted peer review, and AI-trained future models will need much stronger epistemic immune systems, because a polluted literature becomes training data, polluted training data becomes future generation, and future generation can then launder the pollution back into the literature with higher fluency. The solution is probably boring in the best possible way, which is citation verification, provenance tracking, executable claims, linked datasets, machine-checkable proofs where possible, automated replication where feasible, and a cultural shift in which impressive output matters less than traceable contact with reality.

For AI discovery to work, we need to make the feedback channels from reality much higher bandwidth than the feedback channels from prestige. This is the part that makes me optimistic, because once a domain has a good evaluator, even an imperfect model can become useful through search.

The history of recent AI breakthroughs keeps repeating this pattern, where a generative system becomes powerful when coupled to a verifier, an environment, a simulator, a benchmark, or an experimental loop that converts plausible variation into cumulative progress. In math, the verifier can be a proof checker or an executable program. In materials, the verifier can begin with DFT and end with synthesis. In biology, the verifier might be a perturbation experiment, a binding assay, a cellular phenotype, or eventually a virtual cell accurate enough to make the first pass through hypothesis space much cheaper.
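
The generator-plus-verifier pattern can be shown with a deliberately tiny toy. In the sketch below, which is an illustration rather than a description of any real system, the "model" is a blind random mutator that knows nothing about the task, and the verifier counts how many checks a candidate passes against a specification (here, 3-bit parity). All function names are hypothetical.

```python
import random
from itertools import product

# All 8 input combinations for a 3-bit boolean function.
INPUTS = list(product((0, 1), repeat=3))


def evaluate(table):
    """Verifier: number of inputs on which the candidate truth table
    agrees with the specification (3-bit parity)."""
    return sum(table[i] == (a ^ b ^ c) for i, (a, b, c) in enumerate(INPUTS))


def propose(table, rng):
    """Imperfect proposer: flip one randomly chosen entry, with no
    knowledge of the specification."""
    i = rng.randrange(len(table))
    return table[:i] + (1 - table[i],) + table[i + 1:]


def search(max_steps=1000, seed=0):
    """Keep a variant only if the verifier does not score it lower."""
    rng = random.Random(seed)
    best = tuple(rng.randint(0, 1) for _ in INPUTS)
    best_score = evaluate(best)
    for step in range(max_steps):
        if best_score == len(INPUTS):
            return best, step
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best, max_steps


if __name__ == "__main__":
    table, steps = search()
    print(table, evaluate(table), steps)
```

The proposer here is about as unintelligent as a generator can be, yet the loop still reaches a fully verified answer, because the verifier turns random variation into cumulative progress. The toy is easy only because its evaluator is exact, instant, and free, which is the condition the harder sciences lack.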

The grand challenge is that many of the most important scientific questions have slow, expensive, and noisy evaluators, which means AI progress in those domains will depend on building better instruments and better intermediate worlds as much as building larger models. This is where world models become central, because a scientific world model is basically a compressed laboratory that lets you ask counterfactual questions before spending scarce physical experiments.

A good world model of proteins, cells, climate, materials, or economies would act as a dense differentiable approximation to parts of reality, giving AI systems a smoother landscape over which to search, while still requiring periodic anchoring to the real world because the approximation will always fail somewhere. The exciting future is therefore one in which science becomes a nested set of loops operating at different speeds, with language models exploring hypotheses in seconds, simulators filtering them in minutes or hours, autonomous labs testing them over days or weeks, and human communities integrating the results over months and years.
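
The economics of such nested loops can be sketched as a filtering funnel. The example below is back-of-the-envelope arithmetic with invented numbers: each stage has a hypothetical cost per candidate and a hypothetical pass rate, and only survivors move on to the next, more expensive stage.

```python
def funnel_cost(n_candidates, stages):
    """Total cost of pushing candidates through successive filters.

    `stages` is a list of (name, cost_per_candidate, pass_rate) tuples.
    Returns (total_cost, survivors, per_stage_rows), where each row is
    (name, candidates_entering, cost_spent_at_stage).
    """
    remaining = float(n_candidates)
    total = 0.0
    rows = []
    for name, cost, pass_rate in stages:
        spent = remaining * cost
        total += spent
        rows.append((name, remaining, spent))
        remaining *= pass_rate
    return total, remaining, rows


if __name__ == "__main__":
    # Costs in arbitrary units and pass rates are made up for illustration.
    stages = [
        ("language model", 0.001, 0.10),
        ("simulator", 1.0, 0.10),
        ("autonomous lab", 100.0, 1.0),
    ]
    total, survivors, rows = funnel_cost(1000, stages)
    for row in rows:
        print(row)
    print("funnel total:", total)
    print("lab-only total:", 1000 * 100.0)
```

The saving depends entirely on the cheap stages rejecting the right candidates. A surrogate that filters out true positives makes the funnel cheaper and worse at the same time, which is why the periodic anchoring to the real world matters.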

At each level, the latent space gets reshaped. The model proposes a new point, the evaluator accepts or rejects it, the lab gives the point empirical weight, the literature absorbs it, the next model trains on a slightly changed world, and the intrinsic space of human-machine knowledge becomes denser in places that were previously unreachable.

This is the strongest version of the argument that AI can discover truly new knowledge, because discovery is no longer a magical property of a single generation but a dynamical property of a closed loop that can move the frontier. The deflationary view says that LLMs merely recombine what humans have already written, which is sometimes true of today’s shallow uses and sometimes a category error when the recombination is coupled to verification.

Humans also recombine what we have seen, heard, read, measured, and failed to explain, and the dignity of human discovery comes from the fact that some recombinations survive reality and then change the space of future thought. The open question is whether AI systems can develop a sufficiently rich hierarchy of abstractions to propose recombinations at the right level, because token-level recombination is too shallow and concept-level recombination may still be too coarse.

Perhaps the next step is to give models explicit access to objects at many levels at once, including equations, mechanisms, protocols, code, datasets, causal graphs, physical constraints, and experimental affordances. A scientific AI would then reason across these levels, sometimes translating a conceptual hypothesis into a simulation, sometimes translating a simulation failure into a new measurement, and sometimes translating an experimental anomaly into a change in the ontology.

The deepest discoveries may come from this cross-level movement, because many scientific revolutions happen when a field learns that its objects were drawn at the wrong scale. Genes became regulatory networks, proteins became dynamic ensembles, diseases became heterogeneous mechanisms, ecosystems became coupled adaptive systems, and intelligence itself may become less like a property of isolated minds and more like a property of loops between models, tools, environments, and communities.

If this is right, then the AI scientist of the future will be less a single genius in a box and more a new layer in the civilizational discovery process. It will traverse the latent manifold of existing knowledge, generate possible continuations, test them against increasingly rich surrogate worlds, route the best ones into physical experiments, and update the shared map when reality answers. Some of the answers will be boring, some will be false, some will be dangerous, and some will be strange enough that humans would have taken decades to formulate them.

The question then becomes institutional rather than purely technical. How do we design scientific systems that reward exploration of sparse regions, preserve epistemic diversity, prevent pollution of the literature, keep humans responsible for values and problem choice, and still let machines search at the scale they are clearly going to search?

I do not think the right mental image is a machine replacing the scientist at the bench, because that picture is too small for what is coming. The better image is the slow construction of an expanded scientific sensorium, where humanity learns to perceive regions of possibility that were previously invisible because our brains, institutions, and instruments were too bandwidth-limited to hold them.

In that world, the latent space of human knowledge is no longer just a fossil record of what we have already understood. It becomes a living search surface, continuously reshaped by models, experiments, failures, theories, and new measurements. The models will fill in the nearby voids first, because that is where the gradients are easiest to follow, although the real prize will be learning how to cross the deserts between dense regions without hallucinating an oasis.

That, to me, is the future of AI and scientific discovery: systems that can explore the shape of possible knowledge at machine speed while remaining tethered to the stubborn, expensive, beautiful fact that reality gets the final vote.