If a bee searching for nectar might be conscious and ChatGPT almost certainly isn't, what exactly were we measuring when we decided something was aware?

For most of the history of thinking about animal minds, behaviour was the only evidence on offer. You could not look inside a crab and ask whether it hurt. What you could do was watch it guard an injured claw, and reason from that observation to a conclusion about its inner life. Philosophers and scientists both understood this was imperfect. But it was the best available instrument, and for a long time the limitations didn’t bite hard enough to matter.

Then came the large language models.

ChatGPT, when asked about its inner experience, will describe something that reads like genuine reflection. It hedges, contextualises, and qualifies. If philosopher Susan Schneider’s proposal that meaningful conversation about consciousness might itself be an indicator of consciousness were correct, these systems would be scoring well on the test. The trouble is that almost nobody working seriously on the question believes they are conscious. The behaviour is there. The evidence for consciousness, it turns out, is not.

The test that failed

Schneider’s conversational criterion was never a formal scientific standard, but it reflected a broader intuition that had quietly underpinned thinking about mind for decades: that sufficiently sophisticated behaviour was a reliable proxy for inner experience. This intuition shaped how we thought about animal welfare, about philosophical zombies, and about the prospects for machine minds. It felt reasonable because, in the cases where we were most certain about consciousness, the two things — sophisticated behaviour and inner experience — did tend to appear together.

Language models have severed that pairing in a particularly clear way. A system trained on human-generated text about consciousness will produce human-sounding text about consciousness. The output is a function of the training distribution, not of any underlying experiential state. Behaviour, in this case, is entirely explicable without positing awareness. And once you have a case in which the behaviour is present and the awareness almost certainly is not, the damage is not confined to that case: you can no longer use the behaviour alone as reliable evidence of awareness anywhere.

A paper published in Trends in Cognitive Sciences this year by Patrick Butlin, Colin Klein, and nineteen colleagues addresses this directly. Rather than asking what conscious systems do, it asks what conscious systems are built like. The approach draws on multiple theoretical frameworks simultaneously, identifying structural and computational features that any plausible theory of consciousness would require: things like the ability to resolve trade-offs between competing goals in contextually appropriate ways, and the presence of informational feedback loops that integrate signals across different processing streams. The aim is to find indicators that hold across theories, not just within one.

The verdict on current AI systems is unambiguous. No existing system, including the most capable large language models, possesses the relevant architecture. The appearance of consciousness in these systems, the paper states, is not achieved in a way “sufficiently similar to us to warrant attribution of conscious states.” Crucially, this is not a permanent verdict: the paper leaves open the possibility that a differently architected system could, in principle, cross the relevant thresholds. But the systems available now do not.

Architecture over output

The shift from behavioural to structural criteria is not purely a response to AI. It has been building in the philosophy and neuroscience of consciousness for some time, and the logic runs the other way as well: if architecture matters more than output, then systems with the right architecture might be conscious even when their behaviour gives little away.

This is where insects enter the picture. A second paper, by Colin Klein and Andrew Barron, published in the Philosophical Transactions of the Royal Society B, proposes a neural model for what the authors call “minimal consciousness” in insects. The model is explicitly abstract: it sets aside the anatomical differences between a bee’s brain and a human’s, and focuses instead on core computations. The key computation the authors identify is one that addresses an ancient problem: how does a mobile organism with many senses and conflicting needs generate coherent, contextually appropriate behaviour?

The answer, in both vertebrate and invertebrate nervous systems, involves a particular kind of integrative processing, one that weighs competing signals and produces a unified representation of the animal’s current situation. Klein and Barron argue that this computation, not any specific anatomy, is what underlies minimal consciousness. If the computation is present, the basis for consciousness may be present. If it isn’t, no amount of complex behaviour should be taken as indicating otherwise.

The paper does not assert that bees are conscious. It proposes a framework within which that question becomes empirically tractable. The distinction matters. What was previously a vague intuition, that insects might have some form of inner experience, becomes, under this model, a testable hypothesis about information-processing architecture. The New York Declaration on Animal Consciousness, signed in April 2024 by more than 500 scientists and philosophers, had already staked out the position that consciousness is “realistically possible” in all vertebrates and many invertebrates, including insects. Klein and Barron’s model offers a mechanism that could explain why.

The bee problem, stated plainly

Consider the asymmetry this creates. A honeybee foraging for nectar may, on structural grounds, have more in common with human awareness than ChatGPT does when it generates a paragraph about awareness. That conclusion follows from taking the mechanistic approach seriously. The bee’s nervous system runs computations that the Klein-Barron model identifies as candidate markers of consciousness. The language model’s architecture, however sophisticated, runs a different kind of computation entirely, one in which the Butlin-Klein paper finds no relevant indicators.

The Klein and Barron model achieves something analytically useful: it creates what the authors describe as a “level playing field” for comparing humans, invertebrates, and computers. You are no longer comparing organisms by their evolutionary lineage or the surface complexity of their behaviour. You are asking a specific question about what their nervous systems, or their information-processing systems, actually do.

This is a significant methodological step. One of the persistent difficulties in consciousness research has been the absence of a common currency: a way to place organisms and systems on the same scale. Behaviour doesn’t provide it, because behaviour is too easily produced by processes that have nothing to do with experience. Anatomy doesn’t provide it cleanly either, because anatomical homology across distant species is weak. What the mechanistic approach offers is a description at the level of computation, which is in principle substrate-independent.

What both fields are now saying

The convergence between the two research programmes is not coincidental. Both the Trends in Cognitive Sciences paper and the Philosophical Transactions paper emerge from overlapping groups of researchers working on what consciousness is rather than what it looks like. Both arrive at the same methodological conclusion: when making judgements about whether something is conscious, how it works is proving more informative than what it does.

For animal welfare, the implication is a more defensible basis for moral concern about invertebrates. The Cambridge Declaration on Consciousness in 2012 had already extended the attribution of consciousness to non-human animals that possess the relevant neurological substrates, but even that declaration rested substantially on structural similarity to the human nervous system. The Klein-Barron model pushes the analysis further, asking not whether a brain looks like ours, but whether it runs the relevant computations. That is a more principled criterion, and it may extend moral consideration further than comparative anatomy alone would take us.

For AI, the implication is a form of conceptual discipline. The tendency to read emotional or experiential states into language model outputs is not irrational: it is the same behavioural inference that worked, imperfectly but often enough, across most of natural history. The failure is not in the people making the inference; it is in the inadequacy of the instrument. The Butlin-Klein paper offers a different instrument, one whose outputs are less susceptible to being mimicked by systems with no relevant architecture.

None of this resolves the hard problem. The question of why any physical system should give rise to subjective experience at all remains as open as it was before either paper was written. What has changed is the ability to ask the prior question, the one that has to be answered before the hard problem can even be properly posed: which systems are candidates for having an inner life in the first place?

The answer, as of mid-2026, appears to be: fewer systems than we assumed when we were looking at outputs, and possibly more systems than we assumed when we were looking at anatomy.