Reversing the English-only trend in science

“Heart of Gold,” a colorful God’s Eye by Jay Mohler. Jay sells his sculptures on Etsy.

Often we think of science as uncovering a God’s-eye view of the universe (dare I use the word objective?). Sure, this may be the ultimate goal of some branches of science, but even in these cases, the road to God’s Eye is anything but monochromatic.

Our language colors the way we think. Words and phrases may reveal connections that would be invisible to speakers of other languages. (For a riveting exploration of human analogy-making, check out Doug Hofstadter and Emmanuel Sander’s 2013 book Surfaces and Essences.)

In science, then, scholars who speak and write in different languages may take vastly different approaches to solving problems. To begin with, they may identify different problems; but even when they explore the same problems as scholars working in other languages, they may proceed differently. This is another reason why I am a proponent of linguistic diversity: these different approaches enrich the human scientific enterprise.

A recent BBC article by Matt Pickles draws attention to the trend toward English dominance in science, and in academia in general. Higher education is becoming ever more Anglophone, as is scientific communication. We write in language, of course, and the way we write also interfaces with the way we think. From this series of perhaps-obvious observations, we can appreciate that language, writing and thought are intertwined. Because science advances through writing, the linguistic white-washing of scientific communication also white-washes science itself. For instance, because international journals are unlikely to accept non-English quotations, authors who want to publish in these journals (where publication is often used as a measure of their success as researchers) may be coerced into subscribing to Anglophone theories and methods, as “nonstandard” approaches may not be deemed publishable.

The move toward all-English has an interesting historical parallel, drawn out in the BBC article. Centuries ago, science was written in Latin. A German campaign for linguistic diversity in science reminds us that Galileo, Newton and Lagrange abandoned Latin in order to write in their vernaculars. (We see the same in the literary world: Dante, for instance.) Professor Ralph Mocikat, a German molecular immunologist who chairs this campaign, says that the vernacular “is science’s prime resource, and the reintroduction of a linguistic monoculture will throw global science back to the dark ages.”

What can be done to foster linguistic diversity in science? Because so much institutional machinery is involved, it will surely be a slow process. But it has to start somewhere. Here are a few ideas that come to mind:

  • For academic institutions:
    • Require second-language proficiency in all PhD students.
    • Find ways to facilitate searching the literature in other languages.
  • For journals:
    • Allow space for translations of papers, perhaps one article per issue, or perhaps in an annual special issue of translations.
    • Publish abstracts in multiple languages, even if the content itself is only in one language.
    • Provide translation services to facilitate access to academic work in other languages.
    • Broaden your base of peer reviewers to include researchers with other native languages.
  • For researchers:
    • Participate in international conferences, particularly smaller ones. Talk to researchers in your field whose native language is not English.
    • If you don’t speak another language, start learning one. It’s easier than you think. If you do, search the literature in that language the next time you write a paper.

What else?

Update: This post spawned an interesting conversation on Facebook with a few of my friends. When assessing this trend, we should also consider the needs and values of specific fields. Though I stand by the above discussion for the kind of research I do (humanities and “soft” sciences), a linguistic monoculture could indeed be valuable for certain work in the natural sciences. Clarity means safety, as a friend who works with dangerous chemicals said. Moreover, using one standardized term for a phenomenon rather than a panoply of regionalisms has benefits, such as making a literature search easier.

Thanks to Dr. Deborah Turner for bringing the BBC article to my attention.

Word-compounding strategies as visual metaphor

I’ve written before about visual metaphor in writing, such as the convention that ALL CAPS is interpreted as shouting (big letters imply a big voice). Here is another example.

When we create compound words in English, we have three ways to write them. Compound words can be written with a space between the two words, as in “real estate”; with a hyphen, as in “well-being”; or together, as in “doorbell.” These are referred to as open, hyphenated and closed compounds, respectively.

Jelly + Fish = Jellyfish

How do we decide which strategy to use when compounding a word? At first, it seems random. After all, we can find the same word compounded in multiple ways, depending on the writer. Take “well-being.” Google returns over 13 million results for “wellbeing,” over 2 billion results for “well-being” and over 22 million results for “well being.” Granted, the hyphenated version is most common, but the other versions are still quite pervasive.

In some cases, the choice is more clear-cut. For example, we need to distinguish “black bird” from “blackbird” and “black board” from “blackboard.” Admittedly, these pairs have different stress patterns and are arguably a different case altogether: “black board” is two non-compound words (an adjective plus a noun), whereas “blackboard” is a single compound word.

It seems to me that visual metaphor may be behind our choice of compounding strategy. I propose that compounds (at least nouns) written with a space are perceived as the most disconnected (meaning that the two elements seem to have little to do with each other) or the most novel, whereas compounds written together are perceived as the most cohesive, and hyphenated compounds fall somewhere in between. Words may be written as two words when they name new concepts, then move to hyphenation and finally to a single word as they become more widespread.

I say this because I don’t believe there are any single compound words whose meaning would change if you compounded them using a different strategy. (Note, as I mentioned previously, that some compound words actually become two separate words when written with a space; these cases don’t figure into this scenario.)

Consider the evolution from “electronic mail,” used in the technology’s infancy, to “e-mail,” used until very recently, and finally to “email,” which seems to be today’s preference. (The truncation of “electronic” to a single letter further demonstrates the technology’s growing pervasiveness.) And what about “net-work,” which was hyphenated in the 1800s, a spelling that would be unthinkable today?

This is only a first musing on the concept, and there is, of course, much more analysis and consideration to be done to form a fuller picture.

Using only orthographic features to identify a language

As I talked about in an earlier post, the Latin alphabet is now used to represent more languages than any other script system. But the Latin alphabet doesn’t have a character for every sound in the world; it was created to represent Latin, and it’s been adapted somewhat since then, but there are still plenty of natural-language sounds out there that have no corresponding Latin character. In order to represent these sounds in the Latin alphabet without creating a new letter (an expensive proposition), we’ve relied on diacritical marks, as seen in ù, ú, û, ũ, ū, ŭ, ü, ủ, ů, ű, ǔ, ȕ, ȗ, ư, ụ, ṳ, ų, ṷ, ṵ, ṹ, ṻ, ǖ, ǜ, ǘ, ǚ, ừ, ứ, ữ, ử, ự, and ʉ, as well as on letter combinations (multigraphs) such as ch.

But sometimes diacritics and multigraphs aren’t used to represent new sounds. Sometimes they’re used for ideological reasons, to make a language look more distinct than it may actually be. For example, when planning the Basque orthography, the language moguls decided to use tx to represent the same sound that ch represents in Spanish. Why create a new digraph when an existing one would serve perfectly well? One reason, surely, is that the invented digraph emphasizes that Basque is not Spanish, and that the Basques are not Spanish!

There’s an interesting result of all this: because of the unique characters and character combinations of many languages, it’s fairly easy to tell what language something is written in, even if you don’t know that language. If you see an ü and a sch in some text, it’s probably German, and if there were a ß, that would give it away for sure.
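That intuition can be sketched as a toy scorer in Python: count a few telltale characters and multigraphs, weight them, and pick the language with the highest total. The marker table, its weights and the function name guess_language are my own illustrative assumptions, not a real language-identification method; ü, for instance, also turns up in Turkish and Hungarian.

```python
from typing import Optional

# Hypothetical marker table: each cue maps to (language, weight).
# Hand-picked for illustration only; the weights are arbitrary assumptions.
MARKERS = {
    "ß": ("German", 10),    # practically a giveaway on its own
    "sch": ("German", 2),
    "ü": ("German", 1),     # also used in Turkish, Hungarian, etc.
    "ö": ("German", 1),
    "tx": ("Basque", 3),    # Basque spelling of the sound Spanish writes "ch"
    "tz": ("Basque", 2),
    "ñ": ("Spanish", 3),
    "¿": ("Spanish", 10),
    "ç": ("French", 2),
    "eau": ("French", 2),
}


def guess_language(text: str) -> Optional[str]:
    """Return the language with the highest weighted marker score, or None."""
    lowered = text.lower()
    scores: dict[str, int] = {}
    for marker, (language, weight) in MARKERS.items():
        hits = lowered.count(marker)  # non-overlapping occurrences of the cue
        if hits:
            scores[language] = scores.get(language, 0) + hits * weight
    if not scores:
        return None  # no orthographic cues found at all
    return max(scores, key=scores.get)
```

For example, "Schöne Grüße aus München" scores 15 for German (the ß alone contributes 10), while "etxea" comes out Basque on the strength of tx. Practical language identifiers typically rely on character n-gram statistics learned from large amounts of text rather than a hand-picked list, but the underlying idea is similar.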

I hypothesized that this would even work with made-up words.

This idea was the basis for an art project of mine, in which I considered the most defining characteristics of several languages: average number of letters per word, most common beginning and ending letters, most frequent letters, and unique or stereotypical multigraphs, characters or punctuation marks. As a result, I’ve created, among others, the most English fake English word.
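The feature-gathering step might look something like the following Python sketch, which tallies average word length, the most common first and last letters, the most frequent letters and the most frequent two-letter sequences in a sample word list. The function name orthographic_profile, the top parameter and the input list are hypothetical; the post doesn't say which corpus or method was used, and picking out truly unique multigraphs would additionally require comparing profiles across languages.

```python
from collections import Counter


def orthographic_profile(words, top=3):
    """Summarize the features used to build a 'typical' fake word (sketch)."""
    # Keep only purely alphabetic tokens; this also drops hyphenated words.
    words = [w.lower() for w in words if w.isalpha()]
    if not words:
        raise ValueError("need at least one alphabetic word")

    letters = Counter(ch for w in words for ch in w)
    # Every adjacent letter pair, as a crude stand-in for "multigraphs".
    digraphs = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    firsts = Counter(w[0] for w in words)
    lasts = Counter(w[-1] for w in words)

    return {
        "avg_length": sum(len(w) for w in words) / len(words),
        "first_letters": [c for c, _ in firsts.most_common(top)],
        "last_letters": [c for c, _ in lasts.most_common(top)],
        "top_letters": [c for c, _ in letters.most_common(top)],
        "top_digraphs": [d for d, _ in digraphs.most_common(top)],
    }
```

Run over a long list of English words, a profile like this could suggest a fake word of roughly average length that starts, ends and is built from the top-ranked items. Note that ties in most_common are broken by first appearance, so short samples can give arbitrary-looking rankings.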


The resulting words are interesting enough, but instead of simply printing them here like any other text, I traced their typeset forms and filled them in with watercolor, arousing meditations on mechanical reproduction, handwriting and creation.

See if you can tell which languages the rest of these made-up words are written in.