Linguists find proof of sweeping language pattern once deemed a 'hoax'

2025-05-19 6:23 132138 www.scientificamerican.com

Inuit languages really do have many words for snow, linguists found—and other languages have conceptual specialties, too, potentially revealing what a culture values

In 1884 the anthropologist Franz Boas returned from Baffin Island with a discovery that would kick off decades of linguistic wrangling: by his count, the local Inuit language had four words for snow, suggesting a link between language and physical environment. A great game of telephone inflated the number until, in 1984, the New York Times published an editorial claiming the Inuit have “100 synonyms” for the frozen white stuff we lump under a single term.

Boas’s observation had swelled to mythic proportions. In a 1991 essay, British linguist Geoff Pullum called these claims a “hoax,” citing the work of linguist Laura Martin, who tracked the misinformation’s evolution. He likened it to the xenomorph from Alien, a creature that “seemed to spring up everywhere once it got loose on the spaceship, and was very difficult to kill.” His acerbic critique rendered the subject taboo for a generation, says Victor Mair, an expert on Chinese language at the University of Pennsylvania. But now, he says, “it’s coming back in a legitimate way.”

In a sweeping new computational analysis of world languages, researchers not only confirmed the emphasis on snow in the Inuit language Inuktitut but also uncovered many similar patterns: what snow is to the Inuit, lava is to Samoans and oatmeal to Scots. The results were published in the Proceedings of the National Academy of Sciences USA in April. Charles Kemp, a computational psychologist at the University of Melbourne in Australia and senior author of the study, says the results offer a window onto language speakers’ culture. “It’s a way to get a sense of the ‘chief interests of a people’—what’s important to a society, what they prioritize and value,” he says, quoting Boas.

The researchers analyzed bilingual dictionaries between English and more than 600 languages, looking for what they call “lexical elaboration,” in which a language has many words related to a core concept. It’s the same phenomenon that fueled the Inuit debate. But this study brings a twist: rather than the number of words, it measured their proportion, the slice of dictionary real estate taken up by a concept. This produced elaboration scores for hundreds of concepts, from “abandonment” to “zoo,” based on how many times the English words for those concepts appeared in the definitions of foreign words. You can explore the results in this online module that shows which languages have the most words for each concept and which concepts have the most words in each language.
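
The difference between counting words and measuring their proportion can be illustrated with a minimal Python sketch. This is a simplified illustration, not the study's actual pipeline: the function name `elaboration_scores`, the exact-token matching rule, and the two toy dictionaries are all assumptions made for the example. Only the two Marshallese entries come from the article; every other entry is an invented placeholder.

```python
import re
from collections import Counter


def elaboration_scores(dictionary, concepts):
    """Return, for each concept, the share of dictionary entries whose
    English definition mentions that concept at least once.

    `dictionary` maps a headword to its English definition (hypothetical
    format). Matching is by exact lowercase token, a simplification.
    """
    counts = Counter()
    for definition in dictionary.values():
        tokens = set(re.findall(r"[a-z]+", definition.lower()))
        for concept in concepts:
            if concept in tokens:
                counts[concept] += 1
    total = len(dictionary)
    return {c: (counts[c] / total if total else 0.0) for c in concepts}


# Small toy dictionary: the first two entries are quoted in the article,
# the rest are invented placeholders.
toy_small = {
    "meļļā": "smell of blood",
    "jatbo": "smell of damp clothing",
    "placeholder_1": "house",
    "placeholder_2": "to walk",
}

# Larger toy dictionary, entirely invented: more smell entries by raw
# count (3 vs. 2), but a smaller share of the whole dictionary.
toy_large_definitions = [
    "smell of smoke", "bad smell", "to smell", "house", "river",
    "to walk", "tree", "stone", "to eat", "bird",
]
toy_large = {f"entry_{i}": d for i, d in enumerate(toy_large_definitions)}

print(elaboration_scores(toy_small, ["smell"]))  # {'smell': 0.5}
print(elaboration_scores(toy_large, ["smell"]))  # {'smell': 0.3}
```

In this toy setup the larger dictionary has more smell-related entries by raw count, yet the smaller one scores higher because smell takes up a bigger slice of its entries. Exact-token matching is also naive: it cannot tell a noun from a verb, and it would count a dictionary abbreviation that happens to look like an English word.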

Three maps show the languages with the top scores for the concepts of smell, dance and snow, respectively, based on the proportion of dictionary entries that mention the concept. They also show the top terms that share a similar distribution for each concept.

Often the elaboration is clearly a product of environment—small wonder that Arabic, Farsi and Indigenous Australian languages abound with words to describe the desert, and Sanskrit, Tamil and Thai with words for elephants. Other cases aren’t so straightforward. Many Oceanic languages, for example, have highly specific words for smell. In Marshallese, meļļā means “smell of blood” and jatbo means “smell of damp clothing.” This may be explained by the humidity of the rainforest, which amplifies scents. But why is the concept of rapture so prominent in Portuguese and agony in Hindi? What historical and cultural circumstances lead a language down such obscure paths? “I’m not sure if anybody knows,” Kemp says.

Mair says this research, which he highlighted on the popular linguistics blog Language Log, helps resurrect the much-maligned idea of linguistic relativity, sometimes known as the Sapir-Whorf hypothesis. At its boldest, linguistic relativity asserts that language determines how we perceive things, causing speakers of different languages to experience the world in radically different ways (think of the movie Arrival, in which a character becomes clairvoyant after learning an alien language). But in Mair’s opinion, this study supports a softer claim: our brains all share the same basic machinery for perceiving the world, which language can subtly affect but not restrict. “It doesn’t determine,” he says. “It influences.”

Similarly, Lynne Murphy, a linguist at the University of Sussex in England, who was not involved in this study, notes that “any language should be able to talk about anything.” We may not have the Marshallese word jatbo, but four words of English do the trick—“smell of damp clothing.” It’s not that having many precise words for smell reveals mind-blowing cognitive abilities for processing smell; it’s simply that single words are more efficient than phrases, so they tend to represent common subjects of discussion, highlighting areas of cultural significance. If we routinely needed to talk about the smell of damp clothing, we’d whittle that unwieldy phrase down to something like jatbo.

Still, “lexical elaboration alone cannot tell us about the culture of its speakers,” at least not with certainty, says study lead author Temuulen Khishigsuren, a Ph.D. candidate at the University of Melbourne. And because this analysis was based on dictionaries, it comes with the biases and limitations of the lexicographers that wrote them. As Murphy puts it, they “offer only snapshots of a language at a particular time, from a particular angle.” Some of the dictionaries used are decades or centuries old, and they may reflect the archaic concerns of colonizers—to translate the Bible or establish a trade route—as much as those of modern-day speakers. Dictionaries of vast written languages like German or Sanskrit are much larger than those for languages that are exclusively spoken and are loaded with esoteric terminology.

Because dictionaries don’t represent how people use language in the real world, the next step would be to measure how often people actually talk or write about the concepts being studied, such as snow and smells and elephants. This is difficult for languages without large bodies of written text but could be possible for many languages, especially those used heavily on social media.

It bears remembering that these lexical elaborations come from comparison between languages—French only has “many” words for futility because other languages have fewer. And because all the bilingual dictionaries in this study map back to English—it’s the language into which everything else gets translated—the analysis is influenced by the words used in English itself. If we find the patterns of elaboration in other languages unusual, it’s safe to assume their speakers will return the favor. “English is as ‘different’ as any other language,” Murphy says, which raises the question: “If we had started from, say, Spanish or Chinese or Malayalam, which concepts would have stood out for English?”


Comments

  • By earthicus 2025-05-20 14:20 1 reply

    "The researchers analyzed bilingual dictionaries between English and more than 600 languages, looking for what they call “lexical elaboration,” in which a language has many words related to a core concept. It’s the same phenomenon that fueled the Inuit debate. But this study brings a twist: rather than the number of words, it measured their proportion, the slice of dictionary real estate taken up by a concept."

    This seems inadequate to make the kinds of claims the researchers are quoted as asserting in the article.

    • By mzs 2025-05-20 15:15 1 reply

      Indeed, I looked at some highly scored words for Polish in Google Translate, and they are cases where the foreign word, its transliterations into Polish, and the Polish word are all used. And when you pare it down to, say, five really distinct meanings, you often find similar, less commonly used synonyms in English. Also, as I was looking through, it seemed that it possibly was not distinguishing verb vs. noun in English, because the counts seemed oddly way off for some words where that could have happened. If you are familiar with English and another language, I would like to know what you see.

      • By yorwba 2025-05-21 7:08

        Yeah, lots of fun data issues can be found in their exploration tool https://charleskemp.com/code/lexicalelaboration.html

        Icelandic has a bunch of dictionary abbreviations: medic(al), temp(us), germ(anic), veg(etation). Tarifit is dominated by linguistic terminology. German has a few German words that look like English words meaning something completely different (mantel, tier, boot, stall), one loanword (angst) and what might be dictionary abbreviations again: humor(ous), miner(alogy), spa(nish)...

  • By KingOfCoders 2025-05-21 7:31 1 reply

    "for the frozen white stuff we lump under a single term."

    From my perspective this is the hoax. I come from the Alps and we have dozens of terms for snow. Only people without snow might have one word, because they have no need to describe different kinds of snow. I remember Sulz, Firn, Neu, Kunst, Matsch, Harsch, Papp, Pulver, ... (left 35 years ago).

    • By freilanzer 2025-05-21 9:29 3 replies

      This is still Pappschnee, Neuschnee, Pulverschnee, etc.

      • By bjourne 2025-05-21 9:39 1 reply

        Blötsnö, nysnö, pulversnö. I can make up new ones too: lastbilssnö (truck snow). With agglutinative languages, counting words doesn't work.

        • By decimalenough 2025-05-21 10:05 1 reply

          This is in fact thought to be a large reason for the original "Eskimo words for snow" claim: Inuit languages are extremely agglutinative.

          • By ranadomo 2025-05-21 15:09 1 reply

            I believe this is the case and the wiki summary seems to agree.

            > Geoffrey K. Pullum's explanation in Language Log: The list of snow-referring roots to stick [suffixes] on isn't that long [in the Eskimoan language group]: qani- for a snowflake, apu- for snow considered as stuff lying on the ground and covering things up, a root meaning "slush", a root meaning "blizzard", a root meaning "drift", and a few others -- very roughly the same number of roots as in English. Nonetheless, the number of distinct words you can derive from them is not 50, or 150, or 1500, or a million, but simply unbounded. Only stamina sets a limit.

            https://en.wikipedia.org/wiki/Eskimo_words_for_snow#cite_not...

            The Lexical Elaboration Explorer app does not let you see the actual words for snow in any language, so the tool is mostly a geographic and word-density plotter. Neither the article nor the website adds much nuance to this debate. The hypothesis is fairly obvious: languages have words for common things. It's not really falsifiable, and I find this type of analysis typical of modern research: sloppy, surface-level, coding-tutorial demonstrations of mostly useless data display.

            • By bunderbunder 2025-05-21 21:25

              It kind of goes in the other direction, too. Can you say that Chinese doesn't have a word for "because" because 因为 is actually a compound of 因 "in accordance with" and 为 "the purpose of"? Does English not have a word for "ratel" because instead they use two words: "honey badger"? Does that imply they're more important to French culture than to English culture? Is Haiti a transgender paradise because Kreyòl lacks gendered pronouns, so clearly gender isn't an important concept in Haitian culture?

              I'm not going to say that language doesn't say anything about culture in general. But I do think that most specific analyses chasing after this idea are doomed to say more about the analyst than they do about the analyzed.

      • By KingOfCoders 2025-05-21 9:51 1 reply

        Yes and no.

        No, because Firn and Harsch are words on their own.

        Yes, because of the way the German language works. It tends to create new words by combining old words, not by creating new short words. (Dialects like Bavarian work differently, though; they often tend to create new words.)

        Then after centuries people forget that and think it's one word. Take "Enttäuschung" (disappointment): people no longer recognize its two parts, or that "Enttäuschung" really means that you had been deceived ("Täuschung") and now no longer are, which is the deeper meaning of "Enttäuschung" in German. The same goes for "Werkzeug" (tool): the words get their own identity.

        What I found most interesting was Rücksicht, Vorsicht, Nachsicht, Einsicht, Weitsicht (and more) where probably no German would think they are the same word, "Sicht" combined with another one. All of those words have their own, distinctive identity.

        • By piombisallow 2025-05-21 15:00 2 replies

          Why would they not think it's from the same word? Foresight, hindsight obviously come from 'sight'.

          • By noworriesnate 2025-05-21 16:58 1 reply

            Beware = be + ware, but most people don’t use be- as a prefix anymore (“I’m going to bequiz my students this morning”) and they don’t use ware to mean pay attention anymore. The word “sight” is still in common usage though.

            • By bradknowles 2025-05-25 18:53

              No, but there is still common use of the related form “wary”.

          • By KingOfCoders 2025-05-21 15:09 1 reply

            It's like you don't think of a window when you hear Microsoft Windows, or an apple when you hear Apple.
