If we take a sufficiently long text (for example, a novel, a collection of articles, or a large linguistic corpus) and count how many times the various words appear, we discover a remarkably consistent regularity. The most frequent word appears a great many times. The second appears somewhat less often, the third less often still, and so on. But the interesting thing is that this decrease is not random. If we order the words by frequency, assigning rank 1 to the most frequent word, rank 2 to the second, rank 3 to the third, and so on, then the frequency of a word is approximately inversely proportional to its rank.
In other words: the word with rank 2 appears about half as often as the word with rank 1, the word with rank 3 about one-third as often as the first, the word with rank 4 about one-fourth as often, and so on.
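To make the rank-and-count procedure concrete, here is a minimal Python sketch. It assumes a crude tokenizer (lowercase ASCII letters and apostrophes only) and uses hypothetical helper names, `rank_frequency` and `zipf_prediction`. It ranks the words of a text and places each observed count beside the ideal Zipf value, which is the rank-1 count divided by the rank.

```python
import re
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first (rank 1)."""
    # Crude tokenizer (an assumption): lowercase, keep runs of a-z and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending count; ties keep first-appearance order.
    return [(rank, word, n) for rank, (word, n) in enumerate(counts.most_common(), start=1)]


def zipf_prediction(top_count: int, rank: int) -> float:
    """Ideal Zipf's law: the count at rank r is the rank-1 count divided by r."""
    return top_count / rank


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the bird"
    table = rank_frequency(sample)
    top = table[0][2]
    for rank, word, n in table:
        print(f"{rank:>3} {word:<6} observed={n} zipf={zipf_prediction(top, rank):.2f}")
```

The sample sentence only shows the mechanics. It is far too short for the law to emerge, which requires a long text such as a novel or a corpus.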
This is Zipf’s law.
It bears Zipf’s name because George Kingsley Zipf studied it and made it famous, especially in the context of natural language. But the regularity had already been observed before Zipf by Jean-Baptiste Estoup, a French writer and stenographer, and it was later rediscovered by others as well. Zipf, however, made it one of the central points of his reflections on language, associating it with what he called the “principle of least effort”: the idea that language tends to organize itself so as to balance the effort of the speaker against the effort of the listener.
Zipf’s law is not an isolated case. It is part of a much broader family of statistical regularities, the so-called power laws. We find them in many complex systems: in the sizes of cities, in the distribution of wealth, in networks, in the popularity of websites, in scientific citations, and, of course, in language.
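In the rank-frequency setting, a power law has the form count(r) ≈ C · r^(−s), and Zipf’s law is the special case s ≈ 1. Taking logarithms gives log count = log C − s · log r, which is a straight line of slope −s on a log-log plot. The sketch below uses a hypothetical function name and estimates s by an ordinary least-squares fit on those logarithms. It is meant as a simple illustration. Least-squares fits on log-log data are known to be biased, and maximum-likelihood methods are generally preferred for careful work.

```python
import math


def fit_power_law_exponent(counts: list[float]) -> float:
    """Estimate s in count ~ C * rank**(-s), with ranks 1..len(counts).

    Fits a least-squares line to (log rank, log count) and returns minus
    its slope. Needs at least two strictly positive counts.
    """
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


if __name__ == "__main__":
    ideal_zipf = [1000 / r for r in range(1, 51)]
    print(fit_power_law_exponent(ideal_zipf))  # approximately 1.0 for ideal Zipf counts
```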
Now, in the case of language, this is particularly interesting because it tells us something profound. When we write or speak, we have the impression that we are free to choose our words. And in a certain sense, we are. But this freedom is not complete. If we want to produce a text that is understandable, coherent, and meaningful, we cannot use words in a completely arbitrary way. We are constrained by the relations of meaning among words.
If I begin to talk about quantum physics, certain words suddenly become much more probable: “state,” “measurement,” “system,” “energy,” “observable,” “probability.” If, instead, I am talking about cooking, the field of probable words changes completely. Not only that: every word I introduce into a text modifies the context, and the context in turn modifies the probabilities of the words that follow. It is as though the text, as it grows, were building a semantic landscape. Some regions of this landscape become more active, more attractive, and tend to call forth other words connected to those meanings.
And this is where the so-called Yule-Simon process comes into play.
It is a very simple model, much simpler than the way natural language is actually generated, but it captures a fundamental idea: what has already been used tends to be used again, in proportion to how much it has been used up to that point. That is, the more a word is present in a text, the more likely it becomes that it will be reused as the writing of the text continues.
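The rich-get-richer rule can be simulated in a few lines. The sketch below follows the usual statement of Simon’s model and uses assumed names (`yule_simon_text`, `alpha`). With probability `alpha` a new word enters the text. Otherwise, an earlier token is copied at random, which reuses each word in proportion to its current count. Words are plain integer ids, and the seed and sizes in the usage part are arbitrary choices for illustration.

```python
import random
from collections import Counter


def yule_simon_text(n_tokens: int, alpha: float, seed: int = 0) -> list[int]:
    """Generate a 'text' of integer word ids with Simon's rich-get-richer rule.

    The first token always introduces word 0. At each later step:
      - with probability alpha a brand-new word id is introduced;
      - otherwise a previously written token is picked uniformly at random
        and its word is repeated, so a word is reused with probability
        proportional to how many times it has already appeared.
    """
    rng = random.Random(seed)
    tokens = [0]
    next_word = 1
    for _ in range(n_tokens - 1):
        if rng.random() < alpha:
            tokens.append(next_word)
            next_word += 1
        else:
            tokens.append(rng.choice(tokens))
    return tokens


if __name__ == "__main__":
    text = yule_simon_text(200_000, alpha=0.1, seed=42)
    ranked = Counter(text).most_common()
    for rank in (1, 2, 3, 10, 100):
        _, count = ranked[rank - 1]
        print(rank, count, rank * count)  # rank * count is constant for ideal Zipf
```

In the standard analysis of the model, long texts produced this way show a rank-frequency exponent of about 1 − alpha, so the curve approaches Zipf’s 1/rank when new words are rare. Printing rank × count for a few ranks gives a quick, informal check: for an ideal Zipf distribution that product would be constant, and here it should vary only slowly.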
This is sometimes called the “Matthew effect,” because in the Gospel of Matthew there is a verse that says: “to everyone who has, more will be given, and he will have an abundance; but from the one who has not, even what he has will be taken away.” In short, the rich get richer and the poor get poorer. Capitalism as we know it unfortunately works in precisely this way, as the French economist Thomas Piketty, famous for his studies on economic inequality, has shown.
But to return to the model: it takes part of its name from Herbert Alexander Simon, himself an economist, who in 1955 used this type of model to explain the power-law distributions that appear in natural-language texts, connecting his work to earlier studies by the British statistician Udny Yule.
Of course, in human language we do not choose a word only because it has already appeared before. But something similar happens at the semantic level: the context already constructed makes certain words more natural, more available, and more coherent with what we are saying. In this sense, the Yule-Simon process can be seen as a mathematical caricature—very simplified, but very apt—of a more articulated and complex phenomenon: the contextual updating of a text. The text grows, the context updates itself, and the following words are not chosen in a vacuum, but within an already active field of meanings.
There is also a fascinating connection between Zipf’s law and Bose-Einstein quantum statistics. This connection was first highlighted in 1974 by the statistician Bruce Marvin Hill. However, simply identifying this formal correspondence did not amount to an “explanation” of Zipf’s law; if anything, it shifted the mystery. It was only with the independent rediscovery of this correspondence by Diederik Aerts and Lester Beltran in 2020, within the framework of quantum cognition and the conceptuality interpretation, that the possible deeper meaning of Zipf’s law emerged: as the trace of an underlying quantum structure, brought forth through a process—that of writing—regulated at the level of meaning.
In other words, the idea is to look at a text as a system in which words behave like quantum entities occupying different states, much as particles in physics occupy different energy levels. The rank of a word then plays a role similar to that of an energy level, and quantum entanglement corresponds to a connection of meaning.
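To see why a Bose-Einstein form can resemble Zipf’s law, consider the standard Bose-Einstein mean occupation number n(E) = 1 / (A · e^(E/B) − 1). The sketch below identifies E with a word’s rank, following the analogy just described, and treats A and B as free parameters with arbitrary illustrative values. It is not Aerts and Beltran’s fitting procedure, and the function name is hypothetical. With A = 1 and ranks much smaller than B, e^(E/B) − 1 ≈ E/B, so n(E) ≈ B/E. The occupation therefore falls off as 1/rank, which is Zipf’s law.

```python
import math


def bose_einstein_occupation(energy: float, a: float, b: float) -> float:
    """Mean occupation n(E) = 1 / (a * exp(E / b) - 1).

    In physics a = exp(-mu / kT) and b = kT. Here both are treated as free
    parameters, and E is identified with a word's rank (an illustrative
    assumption mirroring the analogy, not a model fitted to any text).
    """
    denom = a * math.exp(energy / b) - 1.0
    if denom <= 0:
        raise ValueError("need a * exp(E / b) > 1 for a finite occupation")
    return 1.0 / denom


if __name__ == "__main__":
    b = 1000.0  # hypothetical scale, much larger than the ranks shown
    for rank in (1, 2, 3, 10, 100):
        n = bose_einstein_occupation(rank, a=1.0, b=b)
        print(rank, round(n, 1), round(n * rank, 1))  # n * rank stays near b for small ranks
```

The product n × rank stays close to B for the smallest ranks and drifts downward as the rank grows toward B, where the approximation breaks down.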
P.S.: This article is based on a video of mine published on YouTube: https://youtu.be/CAKX4rnOSpQ
