The finding is one of the most carefully documented and least understood in contemporary cognitive science. For most of the twentieth century, average IQ scores in industrialised countries rose at a consistent pace. The increase, documented across multiple test instruments and across most of the developed world, was approximately three IQ points per decade between the 1930s and the 1990s. At that rate, a person of average intelligence in 1980 would have scored roughly 15 points higher than a person of average intelligence in 1930 on the same test. The trend was so consistent that test designers had to periodically rewrite the questions to make them harder, in order to keep the average score at the standardised baseline of 100.

And then, in the mid-1990s, the trend reversed.

The reversal was first documented in Norway and Denmark, in the data from mandatory military conscription IQ testing that both countries had been administering to every young adult male for most of the twentieth century. The decline has since been confirmed in Finland, France, the Netherlands, Britain, Estonia, Australia, and, in slightly different form, the United States.

What James Flynn discovered

The rise was named after the New Zealand philosopher and political scientist James Flynn, who first documented it in a 1984 paper and developed the analysis across the subsequent decades. Flynn, who worked at the University of Otago, was not himself an intelligence researcher when he started. He came to the topic from political philosophy, looking for evidence about whether the human capacity for abstract reasoning was distributed evenly across populations. What he found, by gathering historical IQ test data from countries that had standardised their scoring over multiple decades, was that scores had risen steadily for decades, and that test designers had been re-standardising the tests to absorb the gains without anyone noticing how large the cumulative rise had become.

If a 1932 IQ test were given to a sample of people today and scored against the 1932 norms, the modern sample would score, on average, approximately 30 IQ points higher than a 1932 sample of the same age. A person of average intelligence today would, on the 1932 scale, score around 130, which on that scale would have placed them in roughly the top 2 per cent of the population.
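The arithmetic behind these figures can be checked in a few lines. What follows is a minimal sketch, not a reconstruction of Flynn's own calculation: it takes the three-points-per-decade rate quoted above and assumes the conventional IQ scaling of mean 100 and standard deviation 15, which the article does not state. The function names are hypothetical.

```python
from statistics import NormalDist

POINTS_PER_DECADE = 3.0   # rate quoted in the article
MEAN, SD = 100.0, 15.0    # assumed conventional IQ scaling (not stated in the article)


def cumulative_gain(start_year, end_year, rate=POINTS_PER_DECADE):
    """Total rise in average score between two years at a constant rate."""
    return rate * (end_year - start_year) / 10.0


def share_scoring_above(score, mean=MEAN, sd=SD):
    """Fraction of a normally distributed population scoring above `score`."""
    return 1.0 - NormalDist(mean, sd).cdf(score)


gain_1930_1980 = cumulative_gain(1930, 1980)     # five decades at 3 points each
decades_for_30_points = 30.0 / POINTS_PER_DECADE  # time a 30-point rise needs at this rate
top_share_130 = share_scoring_above(130)          # upper tail beyond two standard deviations

print(f"Gain 1930-1980: {gain_1930_1980:.0f} points")
print(f"Decades needed for a 30-point rise: {decades_for_30_points:.0f}")
print(f"Share scoring above 130: {top_share_130:.1%}")
```

Under these assumptions, a score of 130 sits two standard deviations above the mean, so only a little over 2 per cent of the reference population would exceed it. The sketch also shows that a 30-point gap corresponds to about ten decades of gains at the quoted rate, so the figure should be read as an extrapolation of the trend rather than a measured difference.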

The rise was real. It was also faster than any plausible genetic explanation could account for.

Flynn himself, in his 2012 book Are We Getting Smarter?, set out the standard interpretation. The IQ gains were not, he argued, evidence that human brains had become biologically more capable in three generations. They were evidence that humans had become more practised at the specific kinds of abstract reasoning that IQ tests measure. Modern life, with its symbol-heavy environments, its formal schooling, its expectations of categorical thinking and hypothetical reasoning, was training people from childhood in exactly the mental habits the tests reward. The tests had not measured raw intelligence. They had measured cultural exposure to a particular mode of thinking.

The same interpretation, Flynn noted before his death in 2020, could explain why the trend had now reversed.

When the trend reversed

The first reversal in the published literature was documented by Thomas Teasdale and David Owen in 2005, using Danish military conscription IQ data. The Danish scores, which had risen continuously between the 1950s and the mid-1990s, had plateaued in the late 1990s and then begun to fall. Subsequent analyses confirmed the same pattern in Norway, where the decline has continued for the past thirty years at a steady pace.

In 2018, the Norwegian economists Bernt Bratsberg and Ole Rogeberg published a study in the Proceedings of the National Academy of Sciences that did something most previous research on the Flynn Effect had not. They examined IQ data within families. The Norwegian conscription data is unusual in that it includes IQ scores for fathers and sons across multiple generations, all measured under the same conditions on the same test, with the same scoring system. If the decline were genetic, the team reasoned, the within-family pattern would show parents and children scoring at similar levels regardless of when they were tested. If the decline were environmental, the within-family pattern would show younger generations scoring lower than their fathers did at the same age, even within the same family. The data showed the latter, unambiguously.
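The logic of the within-family comparison can be sketched in code. This is an illustration of the reasoning described above, not the authors' statistical method, and the scores are invented for the example rather than taken from the Norwegian conscription data. The function name is hypothetical.

```python
from statistics import mean

# Hypothetical (father_score, son_score) pairs, each measured at conscription age
# on the same test. Invented for illustration; NOT real Norwegian data.
pairs = [(104, 101), (98, 97), (110, 106), (95, 96), (101, 98)]


def mean_within_family_change(family_pairs):
    """Average of (son - father) taken inside each family.

    Comparing within families holds family background roughly fixed, so a
    consistent negative value points away from changes in which families
    are having children and toward something in the shared environment.
    """
    return mean(son - father for father, son in family_pairs)


change = mean_within_family_change(pairs)
print(f"Mean within-family change: {change} points")
```

In this toy sample the sons score two points lower on average than their own fathers did, which is the direction of result the article describes. A real analysis would need far more than a simple mean, including controls for birth order and test conditions.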

The paper’s conclusion was direct. The Flynn Effect and its reversal are both environmentally caused. The decline cannot be explained by genetic drift, by dysgenic fertility, by selective immigration, or by any of the other prominent hypotheses that had been proposed in the years leading up to the study. Whatever is making younger Norwegians score lower on IQ tests than their fathers did at the same age, the cause is something in the shared environment of contemporary Norwegian life.

The American reversal

In March 2023, a research team at Northwestern University and the University of Oregon published the first comprehensive American study of the reversal. Elizabeth Dworak, William Revelle, and David Condon analysed 394,378 online IQ-test responses collected between 2006 and 2018 from American adults through the Synthetic Aperture Personality Assessment project. The study examined four broad domains of cognitive ability tested by the same instrument over the thirteen-year period.

Three of the four domains showed declines. Verbal reasoning scores fell by approximately 4.35 points. Matrix reasoning scores fell by approximately 2.85 points. Letter and number series scores fell by approximately 1.65 points.

The fourth domain, three-dimensional spatial rotation, rose.

The Dworak team did not interpret the rises and falls as direct evidence of changes in underlying intelligence. The lead author noted that test-takers seeking out an online personality survey may have been less motivated to engage carefully with the cognitive sections, and that the specific decline could partly reflect changes in test-taking behaviour rather than changes in cognitive ability. The team’s published interpretation is that something is changing in how Americans engage with the kinds of reasoning these tests measure, but exactly what that something is remains an open question.

The 3D rotation rise, which the team also reported but which has received less popular attention, may be the most analytically interesting finding. Three-dimensional spatial reasoning is the kind of cognitive task that contemporary digital environments, including video games, navigation apps, computer interfaces, and 3D rendering tools, exercise continuously. It is also the only one of the four tested domains on which contemporary Americans appear to be improving.

Why nobody knows what is causing it

The peer-reviewed literature has not converged on a single explanation for the reverse Flynn Effect. Several hypotheses are currently under active investigation, each supported by some evidence and contested by other researchers.

The educational hypothesis argues that contemporary schooling has moved away from the rote memorisation, drilling, and explicit instruction in formal reasoning that built up the abstract cognitive habits the original Flynn Effect rewarded. The argument is plausible and is supported by some PISA-test data showing declines in reading comprehension and mathematical reasoning across many of the same countries showing the Flynn reversal. The argument has been contested by researchers who note that educational reform has not been uniform across the affected countries and that the timing of the reforms does not always match the timing of the decline.

The screen-time hypothesis argues that the shift from sustained deep reading to fragmented digital consumption has reduced the cognitive practice that IQ tests reward. The argument is plausible and is supported by some evidence linking heavy screen exposure to reduced attention span and reduced reading comprehension. The argument has been contested by researchers who note that the decline began before smartphones were widely available, and that countries with similar digital media penetration show different patterns of decline.

The environmental hypothesis, in the broader sense, argues that some combination of nutrition, pollution, sleep deprivation, declining social trust, and reduced civic engagement is producing a cumulative cognitive cost that the tests are picking up. The argument is plausible and is supported by some evidence on each individual factor. It has been contested as too unspecific to be testable.

The test-itself hypothesis argues that IQ tests, whose basic design dates to the early twentieth century, are increasingly poorly calibrated to the cognitive abilities contemporary humans actually develop and exercise. The argument is plausible and is supported by the observation that the one domain showing improvement (3D rotation) is also the one most closely matched to contemporary digital skills. It has been contested as a partial explanation that does not account for the absolute magnitude of the decline in the other domains.

None of these hypotheses is conclusively supported. None has been ruled out. The Bratsberg and Rogeberg finding that the cause is environmental rather than genetic narrows the search but does not specify which environmental factors matter most.

What it does and does not mean

Several things are worth saying clearly about what the reverse Flynn Effect is and is not.

It is not evidence that contemporary humans are biologically less intelligent than their parents. The genetic basis of intelligence has not changed in three decades. Evolution does not operate at that speed.

It is also not evidence that contemporary humans are, in some broader sense, less capable than their parents. IQ tests measure specific cognitive abilities under specific testing conditions. They do not measure creativity, social intelligence, emotional regulation, practical problem-solving, motivation, or many of the other cognitive capacities that humans use in daily life. Improvements or declines in IQ scores tell us something about the specific abilities the tests are calibrated to detect, not about the full range of human cognitive functioning.

It is, however, evidence that something in the shared environment of contemporary developed-world life is changing how humans practise and develop the kinds of abstract reasoning that built up across the twentieth century. The 3D rotation finding suggests the change is not a uniform cognitive decline but a redistribution of cognitive practice toward different abilities than those the standard IQ instruments were designed to detect. Whether the redistribution represents a net gain, a net loss, or a sideways shift in cognitive functioning is the question the next decade of research will need to answer.

What is at stake

The Flynn Effect was, for most of the twentieth century, one of the strongest pieces of evidence that human cognitive capacity was rising in line with material conditions. Better nutrition, longer schooling, more abstract work, and more complex environments produced people who could think in more sophisticated ways than their grandparents could. The reversal, if it continues, undermines that confidence.

It also raises a question that the peer-reviewed literature has not yet been able to answer. If something in contemporary life is making people, on average, less skilled at the specific kinds of reasoning the standard tests measure, what is the long-term cost of that change to the societies in which it is happening? The skills the tests measure are the same skills that underpin most contemporary intellectual work, including the law, the sciences, journalism, education, medicine, and policy-making. A society in which the average citizen scores five to seven points lower on those skills than the previous generation did is, if the scores reflect real ability rather than test-taking behaviour, a society with less of that specific reasoning capacity to draw on across all of those domains.

The trajectory has not reversed. The decline has, by the most recent published data, continued in every country in which it has been measured.

What it would take to reverse the reversal is currently unknown.