What Do We Mean When We Ask “What Is Intelligence?”
A compact map of a slippery concept with high moral stakes
[NOTE: In most AI conversations, “intelligence” is doing too much work while meaning too many different things. And this is deeply problematic precisely because claims about intelligence have a long history of being used to naturalize unjust and injurious hierarchies.
Independent of the social uses of the concept, there are also very sound reasons in cognitive science to be skeptical of how easily people reach for “general intelligence,” as if there were a single scalar quantity that minds (or machines) simply have more or less of. To the extent that “more intelligent than” is even a useful comparative, it is not a strict linear order. It is domain-specific, tradeoff-laden, and shot through with questions about robustness, efficiency, and context.
And the issue of whether a comparative is a strict linear order cannot be depoliticized. “More intelligent” talk has been part and parcel of racialized domination and exclusion. More recently it has also fueled a more mundane and increasingly destructive error: what you might call engineering brain, where being excellent at one narrow kind of problem becomes the warrant for confident intervention everywhere else. The myth of intelligence as a single ladder is part of the ideological backing for both: the exclusion and repression of people lower on the economic ladder, and a kind of elite epistemic overreach at the top.
Anyhow, what follows is a compact map of the question: what we might mean by intelligence, why single benchmarks mislead, and why debates often slide between levels of explanation without noticing. I’ve also shared a version of this with LSU Online as part of our effort to build our first free, public-facing introduction to AI, so I’m leaving it in a slightly “course content” register rather than polishing it into pure Substack form.
I think the analytic clarity here is necessary not only for understanding the good, the bad, and the ugly of emerging technologies, but also for understanding and critiquing the contemporary social and political world.]
What Do We Mean When We Ask “What Is Intelligence?”
Asking what something is seems like a straightforward question. But think about a simple example: What is a bicycle? We might be asking what it is for (transportation), how it works (gears, pedals, balance), or what it is made of (metal, rubber, carbon fiber). Each of these questions clarifies our initial query in importantly different ways.
Questions about intelligence work in much the same way. When people disagree about what intelligence is, they are often talking past one another because they are answering different questions without realizing it.
While studying vision, cognitive scientist David Marr developed a framework that has since become a foundational tool across cognitive science and artificial intelligence. He proposed that complex information-processing systems can be understood at three levels of analysis. Disagreements often arise when conclusions drawn at one level are mistakenly taken to apply to the others.
At the computational level, we ask what problem a system is solving and what counts as success. This level characterizes the goal of the system. At the algorithmic level, we ask how the system solves that problem, including the representations it uses and the procedures it applies. At the implementational level, we ask how those procedures are physically realized, whether in biological brains or artificial hardware. Importantly, success at a task does not require that the physical implementation resemble the human brain or nervous system.
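To make the separation concrete, here is a small illustration of my own (not Marr's): the same computational-level problem, sorting a list of numbers, handled by two different algorithmic-level strategies, either of which could be physically realized on whatever hardware you like.

```python
# Illustration: one computational-level problem (sort a list of numbers),
# two algorithmic-level strategies. Either can run on any implementational
# substrate that supports the basic operations.

def insertion_sort(xs):
    """Algorithm 1: build the sorted list one element at a time (O(n^2))."""
    result = []
    for x in xs:
        i = 0
        while i < len(result) and result[i] < x:
            i += 1
        result.insert(i, x)
    return result

def merge_sort(xs):
    """Algorithm 2: recursively split, sort the halves, then merge (O(n log n))."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

data = [5, 2, 9, 1, 7]
# Same computational-level success condition, different algorithmic stories.
assert insertion_sort(data) == merge_sort(data) == sorted(data)
```

The computational-level description (produce an ordered list) says nothing about which of these procedures is used, and neither procedure says anything about whether it runs on silicon, vacuum tubes, or neurons.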
The Turing Test and the Limits of Behavioral Benchmarks
One of the most famous attempts to answer the question “What is intelligence?” is the Turing Test, proposed by Alan Turing. Rather than offering a definition of intelligence, Turing suggested a practical benchmark: if a machine can carry on a conversation that is indistinguishable from that of a human, then we should treat it as intelligent. This proposal is compelling in part because in everyday life we routinely assess intelligence by talking with people, by seeing whether they understand questions, respond appropriately, correct themselves, draw distinctions, and adapt to new topics or unexpected turns. We do not typically appeal to hidden inner processes or biological details when making these judgments. If a system can participate in conversation in the same flexible, context-sensitive way, it seems natural to extend the same attribution of intelligence, regardless of the system’s underlying construction.
Seen through Marr’s framework, the Turing Test operates almost entirely at the computational level. It asks whether a system can successfully perform a particular task (sustaining human-like conversation) without stipulating how the system does this or what it is made of. This was a deliberate and methodologically cautious choice. By focusing on observable behavior, Turing avoided speculative debates about consciousness or inner mental states.
For many decades, this benchmark proved extremely demanding. Building systems that could convincingly sustain open-ended conversation turned out to be far more difficult than anticipated. Recent advances in large language models have changed that situation. These systems can often produce fluent, context-sensitive responses that closely resemble human conversation.
However, while LLMs can perform very well on conversational tasks, they also produce confident errors, contradict themselves across similar prompts, and fail to maintain coherence over longer interactions. These limitations reflect the fact that, by itself, the ability to sustain human-like conversation does not capture the full range of problems we associate with intelligence. The computational task identified by the Turing Test turns out to be too narrow to serve as a general indicator for intelligence as such.
The lesson here is not that the Turing Test was misguided. In many ways, it worked exactly as intended. What recent AI systems have shown is that behavioral success on a single task, even an impressively complex one, is not enough to settle broader questions about intelligence.
Intelligence, Goals, and How Systems Fail
Recent debates have raised serious questions about whether “general intelligence” names a single, well-defined capacity that can be straightforwardly measured or tested. Critics such as Timnit Gebru and Émile P. Torres argue that claims about artificial general intelligence are often untestable in an engineering context because meaningful evaluation requires clearly specified goals and success conditions. Without such specifications, it becomes unclear what would count as evidence for or against general intelligence. One very common proposal is that artificial general intelligence would be achieved when a system surpasses humans at everything, but this standard is itself incoherent as a measure of intelligence. Human abilities are diverse and often trade off against one another: excellence in some domains comes at the cost of excellence in others. For this reason, defining general intelligence as universal superiority to humans does not provide a clear target for design, measurement, or testing; it instead sets a moving goalpost where success is always defined after the fact.
One way forward is to rethink what we are trying to capture when we talk about intelligence in the first place, especially in an engineering or design context. Drawing on the work of philosopher Mark Okrent, intelligence can be understood not as a single ability, but as a pattern involving three interrelated features: goals, efficiency, and robustness.
On this view, systems can succeed or fail at being intelligent in different ways, and these failures need not all occur at the same level of analysis. A system may fail at the computational level if its goals are poorly ordered, internally conflicted, or otherwise incoherent. A familiar fictional example is HAL 9000 in 2001: A Space Odyssey. HAL’s behavior reflects not random malfunction, but a failure of goal coherence: there is no consistent account of what success requires given the objectives it has been assigned.
Other failures arise at the algorithmic level. Early chess programs, for example, had an appropriate and well-defined goal (playing winning chess) but relied on procedures that treated success as requiring exhaustive search through possible future positions. While correct in principle, this strategy quickly became computationally infeasible in practice, because the space of possible moves grows explosively. Importantly, this concern is not confined to early AI. As Neil C. Thompson, Shuning Ge, and Gabriel F. Manso show, across domains such as weather prediction, protein folding, geological resource discovery, and games like chess and Go, exponential increases in computing power have typically produced only linear improvements in performance. Their analysis suggests that simply supplying more computational resources to existing algorithmic architectures often yields diminishing returns when the underlying algorithmic framing of the problem remains unchanged.
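A back-of-the-envelope sketch makes the point vivid. The branching factor of roughly 35 legal moves per chess position is a standard textbook estimate (my assumption here, not a figure from the Thompson, Ge, and Manso study), but it is enough to show why throwing more hardware at full-width search buys so little:

```python
import math

# Back-of-the-envelope: why exhaustive game-tree search explodes.
# With branching factor b, looking d plies ahead means examining roughly b**d
# positions, so each extra ply multiplies the work by b.

BRANCHING_FACTOR = 35  # rough textbook estimate for mid-game chess (an assumption)

def positions_to_search(depth_plies: int, b: int = BRANCHING_FACTOR) -> int:
    """Approximate number of leaf positions in a full-width search."""
    return b ** depth_plies

for depth in range(1, 9):
    print(f"depth {depth}: ~{positions_to_search(depth):,} positions")

# Doubling the available compute buys only log_b(2) extra plies of lookahead:
# exponentially more computing power, roughly linear gains in search depth.
print(f"extra plies per doubling of compute: {math.log(2, BRANCHING_FACTOR):.2f}")
```

By depth 8 the count is already over two trillion positions, which is why practical chess programs had to abandon the purely exhaustive framing rather than simply wait for faster machines.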
The third kind of failure involves robustness, the ability of a system not just to pursue a goal but to adapt flexibly when circumstances change in unexpected ways. Philosophers and AI researchers have long seen this as a central challenge for intelligence, though Mark Okrent was the first to incorporate it into an account of intelligence as such.
One famous illustration of this difficulty, called the frame problem, comes from Daniel Dennett’s “cognitive wheels” thought experiments. Dennett describes simple robots designed to fetch a battery from a room. The first, R1, succeeds in retrieving the battery by pulling it in a wagon, but fails to notice that the wagon brings a bomb along with it. Its successor, R2, designed to consider such possible consequences, becomes so overwhelmed with calculating irrelevant side effects (whether the wall color will change, how many wheel revolutions there will be, and so on) that it never gets to the battery in time. These robots are non-robust because they cannot decide which aspects of the situation are relevant to their goals without engaging in endless, irrelevant computation.
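A toy sketch (my own illustration, not Dennett’s formulation) shows why R2’s strategy cannot work: if relevance has to be settled by checking every combination of facts about the situation, the number of checks doubles with each new fact.

```python
from itertools import combinations

# Toy illustration of R2's predicament: a planner that must examine every
# combination of world facts for possible side effects faces 2**n checks
# for n facts, so deliberation cannot finish in time.

world_facts = [
    "wagon holds battery", "wagon holds bomb", "wall is beige",
    "wheels will turn 47 times", "door is ajar", "lights are on",
]

def candidate_considerations(facts):
    """Every subset of facts the robot might examine before acting."""
    for r in range(len(facts) + 1):
        yield from combinations(facts, r)

print(sum(1 for _ in candidate_considerations(world_facts)))  # 2**6 = 64
# With 30 facts the count exceeds a billion; relevance cannot be settled by
# brute enumeration, which is the heart of the frame problem.
```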
The frame problem, then, is a central difficulty for any account of intelligence that aims to explain flexible, goal-directed behavior across changing environments. An intelligent agent must do more than apply a fixed strategy; it must constantly and efficiently decide what matters in a situation and what can be ignored. Current AI systems often lack this capacity. They may pursue goals efficiently within narrowly defined contexts, but when those contexts shift in subtle ways they can fail spectacularly. They lack the ability to recognize which differences require them to adapt their behavior and which do not.
Seen this way, the limits of current AI systems are not best described as a simple failure to achieve general intelligence. Instead, they reflect specific ways in which systems fall short along dimensions of goal coherence, efficiency, and robustness. This way of understanding intelligence fits naturally with Marr’s levels of analysis and helps explain why intelligence cannot be captured by any single test or benchmark. It also clarifies why discussions of artificial intelligence must attend not only to what systems can do, but to how flexibly and appropriately they can continue to achieve goals when conditions change.
Coda: “The Logical Song” — Supertramp (for obvious reasons).
Further Reading:
Becker, Adam. 2025. More Everything Forever: AI Overlords, Space Empires, and Silicon Valley’s Crusade to Control the Fate of Humanity. Basic Books.
Boden, Margaret A., ed. 1990. The Philosophy of Artificial Intelligence. Oxford University Press.
Dennett, Daniel C. 1984. “Cognitive Wheels: The Frame Problem of AI.” In Minds, Machines and Evolution, ed. by Christopher Hookway, 129–150. Cambridge: Cambridge University Press.
Gebru, Timnit, and Émile P. Torres. 2024. “The TESCREAL Bundle: Eugenics and the Promise of Utopia through Artificial General Intelligence.” First Monday 29 (4).
Marr, David. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco: W. H. Freeman.
Newell, Allen, and Herbert A. Simon. 1976. “Computer Science as Empirical Inquiry: Symbols and Search.” Communications of the ACM 19 (3): 113–126.
Okrent, Mark. 2007. Rational Animals: The Teleological Roots of Intentionality. Athens: Ohio University Press.
Thompson, Neil C., Shuning Ge, and Gabriel F. Manso. 2022. “The Importance of (Exponentially More) Computing Power.”
Turing, Alan M. 1950. “Computing Machinery and Intelligence.” Mind 59 (236): 433–460.


