Stop me if you’ve heard this one: “The great thing about computers is that they do exactly what you tell them to do. The terrible thing about computers is that they do exactly what you tell them to do.” But what, exactly, are we telling our computers to do?
In a recent paper (Kalai et al. 2025), researchers at OpenAI and Georgia Tech offer a new explanation of why large language models (LLMs, such as ChatGPT) hallucinate. The term ‘hallucinate’ is unfortunate, since it suggests that LLMs have quasi-cognitive or perceptual processes that occasionally malfunction, or “make stuff up.” But this is entirely the wrong way to think about how LLMs work. It’s not that LLMs are searching for or reasoning about information and sometimes “make stuff up.” It’s much more accurate to say that LLMs always make stuff up, but most of the time what they make up happens to be true. Incidentally, the fact that LLMs are right as often as they are, given how they work, is remarkable, and should be celebrated as the significant technical and scientific accomplishment that it is.
Let’s dive into this a little more, beginning with the term ‘hallucination.’ This term refers to models’ propensity to produce results that are plausible and confidently delivered but nevertheless incorrect. In light of this, it would be more helpful to replace the concept of hallucination with a different philosophical concept: bullshit (Hicks et al. 2024). This term received a celebrated philosophical analysis from Harry Frankfurt in his 1986 essay, “On Bullshit.” On Frankfurt’s analysis, one bullshits when one is “engaged in an activity to which the distinction between what is true and what is false is crucial, and yet [one takes] no interest in whether what [one] says is true or false.” LLMs do not take an interest in anything; hence, they do not take an interest in whether their output is true or false. Thus, all of their output is bullshit, in Frankfurt’s sense, even if most of it is in fact true.
At this point, I suppose the following objection will have occurred to you: I grant that LLMs don’t have an interest in truth or falsity, but surely the people who make them do. Models that don’t reliably return true outputs wouldn’t be useful, and therefore would not constitute the multi-trillion-dollar industry that they do. This is a good point, but it contains a subtle error. To uncover the error, we must dive just a little into the details of how LLMs work.
Here’s the short, minimally technical explanation. Training for LLMs (and many other kinds of generative AI models, for that matter) is typically divided into two phases. In the first phase, called “pre-training,” the model is given a large corpus of training data (text, in the case of LLMs), performs statistical analysis on that data to find regularities, and then generates new output data (e.g., text) that fits the patterns detected in the training data. It may be helpful to note that the “GPT” in “ChatGPT” stands for “Generative Pre-trained Transformer.” “Pre-trained” means that it is given a large body of training data; a “Transformer” is a software architecture that finds patterns in the relationships between words (and parts of words) in the training data; and “Generative” means that it outputs novel instances of the patterns it detected.
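If it helps to see the idea in code, here is a deliberately toy sketch in Python of the “find patterns, then generate” loop. It is nothing like a real transformer, and the little “training corpus” is invented for illustration, but it captures the basic move: tally which words tend to follow which, then produce novel text by continuing those patterns.

```python
# Toy illustration (not a real transformer): learn word-pair statistics from a
# tiny made-up corpus, then generate new text by sampling from those patterns.
import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Pre-training," crudely: record which word tends to follow which.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

# "Generation": emit novel text that fits the detected patterns.
word, output = "the", ["the"]
for _ in range(7):
    word = random.choice(follows[word])
    output.append(word)

print(" ".join(output))  # e.g. "the dog sat on the mat . the"
```

Notice that nothing in this little program ever checks whether its output is true; it only checks whether the output fits the patterns in its training data.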
The second phase of training for LLMs is called “post-training.” In this phase, the model is refined by grading its outputs. You have probably taken part in the post-training phase for LLMs that you use. Occasionally, when you use an LLM, you may be given two outputs for a given prompt and asked to choose which one you prefer. In making that choice, you are providing information that contributes to refining the model’s output; over time, it will tend to produce outputs that are more similar to the output you preferred.
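Purely as an illustration (the “features” and numbers below are invented, and real post-training pipelines are considerably more involved), here is roughly what “your grades inform future behavior” can look like: two candidate answers are compared, the rater picks one, and a simple update nudges the model’s scoring weights toward whatever the preferred answer had more of.

```python
# Toy sketch of learning from pairwise preference feedback. The feature values
# and the learning rule (a simple logistic, Bradley-Terry style update) are
# illustrative only, not any particular company's training pipeline.
import math

answer_a = {"hedges": 3, "assertions": 1}   # a cautious candidate answer
answer_b = {"hedges": 0, "assertions": 4}   # a confident candidate answer

weights = {"hedges": 0.0, "assertions": 0.0}  # what the model learns to favor

def score(ans):
    return sum(weights[k] * v for k, v in ans.items())

# Suppose raters keep preferring the confident answer. Each round, nudge the
# weights toward the features the preferred answer had more of.
for _ in range(100):
    p_b_preferred = 1 / (1 + math.exp(score(answer_a) - score(answer_b)))
    for k in weights:
        weights[k] += 0.1 * (1 - p_b_preferred) * (answer_b[k] - answer_a[k])

print(weights)  # "assertions" ends up weighted positively, "hedges" negatively
```

The point of the toy model is only this: whatever raters systematically prefer, the model learns to produce more of.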
What Kalai et al. discovered is that LLMs’ propensity to bullshit comes largely from the post-training phase. Let’s think about why this is. LLMs have a propensity to produce output that sounds plausible and confident but is nevertheless completely made up (remember, “G” is for “Generative”). Why do they do this? The answer is as sad as it is simple: Because we tell them to. Remember how post-training works. Evaluators grade model outputs, and these grades inform future model behavior. When we are given a choice of outputs, we tend to prefer clear, simple, confident ones. Those are the ones that strike us as valuable or useful (not to mention that we also tend to prefer outputs that flatter us and are consistent with what we already think).
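A back-of-the-envelope calculation makes the incentive concrete. Suppose (purely for illustration) that an evaluator awards one point for a correct answer and nothing for either a wrong answer or an honest “I don’t know”:

```python
# Toy expected-score comparison under a grader that gives no credit for
# admitting uncertainty. The probability below is made up for illustration.
p_correct = 0.3  # assumed chance that a confident guess happens to be right

expected_if_guessing = p_correct * 1 + (1 - p_correct) * 0
expected_if_abstaining = 0.0  # "I don't know" earns nothing

print(expected_if_guessing, expected_if_abstaining)  # 0.3 0.0
```

Under a rubric like that, confident guessing always scores at least as well as honest uncertainty, so confident guessing is what gets reinforced.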
Why do we do this? Why do we prioritize confidence and clarity over truth? To be honest, I don’t think anyone has a fully satisfying answer to this question, but I do have some thoughts. First of all, I think it’s important to note that this problem is not unique to our interaction with LLMs. In our interactions with other people, we tend to lionize confidence; we rely on the fact that it sounds like a person knows what they’re talking about as a proxy for determining that they actually do know. If your reaction to this point is “well, maybe most people do that, but I don’t!” you are, like the 93% of sampled U.S. drivers who rated their driving skill as “above average” (Svenson 1981), probably experiencing the effects of the cognitive bias known as “illusory superiority.”
The next time you are trying to learn about something that requires you to learn from someone else, ask yourself: What am I relying on to judge that this person is a reliable informant? Be honest with yourself. Is it that what they are saying “makes sense,” seems plausible, or “passes the smell test”? If so, that ultimately just means that it coheres with what you already think. Is it that they have advanced degrees and peer-reviewed publications? That’s not a bad proxy, but it has significant limitations: People with degrees are known to make pronouncements that reach beyond their areas of expertise, and, conversely, many people without degrees are genuine experts (consider the mechanic who works on your car). Or, perhaps, are you just relying on the fact that they are articulate and sound confident; that they sound like they know what they’re talking about? That’s a human habit that’s probably as old as our habit of linguistic communication, but LLMs have thrown it into sharp relief. We use articulate text and a confident tone as a proxy for clear and accurate thought. But LLMs have shown us what we should have known all along: you can generate the text without any of the thought.
It is a beautiful and tragic fact that we humans are social creatures through and through. We have no choice but to rely on each other to learn about our shared world. This fact is beautiful because it lays bare our epistemic dependence on and indebtedness to one another. But it is tragic because it makes reliable informants into a scarce and valuable resource, and this produces significant incentives to pretend to be one. Put another way, reliability is highly valuable but difficult to detect directly. Consequently, we seek out and reward its more visible proxy: confidence.
What should we do about this? First, we must recognize that LLMs are not specimens of some new alien intelligence. Rather, they are funhouse mirrors in which we can see our own distorted reflections. LLMs bullshit with unwarranted confidence because they were trained on us. They reflect our own tragic habit of rewarding people who sound certain over people who actually know what they’re talking about. So how can we change the reflection in the mirror? My modest proposal is epistemic humility. We should stop prioritizing the loudest, most confident voices in the room and in our own heads. LLMs will always be able to generate confident-sounding text faster than we can. But what they can never do is pause, carefully observe the nuances of a complex situation, and say, “I don’t know, but let’s figure it out.” That kind of careful, humble judgment takes practice. Fortunately, it’s exactly the sort of practice that Wabash provides ample opportunities for you to engage in. I hope you take advantage of them.
