Category Archives: Spelling

Using only orthographic features to identify a language

As I discussed in an earlier post, the Latin alphabet is now used to write more languages than any other script. But the Latin alphabet doesn’t have a character for every sound in the world; it was created to represent Latin, and although it has been adapted somewhat since then, there are still plenty of natural language sounds that have no corresponding Latin character. To represent these sounds without creating a new letter (an expensive proposition), we’ve relied on diacritical marks, as seen in ù, ú, û, ũ, ū, ŭ, ü, ủ, ů, ű, ǔ, ȕ, ȗ, ư, ụ, ṳ, ų, ṷ, ṵ, ṹ, ṻ, ǖ, ǜ, ǘ, ǚ, ừ, ứ, ữ, ử, ự, and ʉ, as well as on combinations of letters, such as ch.

But sometimes diacritics and multigraphs aren’t used to represent new sounds. Sometimes they’re used for ideological reasons, to make a language look more distinctive than it may actually be. For example, when planning the Basque orthography, the language moguls decided to use tx to represent the same sound that ch represents in Spanish. Why create a new digraph when an existing one would serve perfectly well? One reason, surely, is that the invented digraph emphasizes that Basque is not Spanish, and that the Basques are not Spanish!

There’s an interesting result of all this: Because many languages have unique characters and character combinations, it’s fairly easy to tell what language something is written in, even if you don’t know that language. If you see an ü and a sch in some text, it’s probably German, and if there’s a ß, that gives it away for sure.
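
That intuition can be sketched in a few lines of Python: score a text by counting a handful of telltale characters and letter combinations. This is only a rough illustration. The marker table and its weights are assumptions made up for the example (ü, sch, ß, tx and ch come from the discussion above; ñ is added for contrast), and `guess_language` is a hypothetical helper, not an established method.

```python
import unicodedata

# Illustrative marker table (an assumption, not a complete inventory).
# A higher weight means the marker is treated as a stronger giveaway.
MARKERS = {
    "German": {"ß": 3, "sch": 1, "ü": 1},
    "Spanish": {"ñ": 3, "ch": 1},
    "Basque": {"tx": 3},
}

def guess_language(text):
    """Return the best-scoring language, or None if no marker appears."""
    # Normalize so that accented letters are single code points, then lowercase.
    lowered = unicodedata.normalize("NFC", text).lower()
    scores = {
        language: sum(weight * lowered.count(marker)
                      for marker, weight in markers.items())
        for language, markers in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_language("Die Straße ist schön"))  # German
```

Note that "sch" also contains "ch", so the German sentence picks up a point for Spanish too; the ß is what settles it, just as described above.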

I hypothesized that this would even work with made-up words.

This idea was the basis for an art project of mine, in which I considered the most defining characteristics of several languages: average number of letters per word, most common beginning and ending letters, most frequent letters, and unique or stereotypical multigraphs, characters or punctuation marks. As a result, I’ve created, among others, the most English fake English word.

Theanster
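
The characteristics listed above are easy to tally from a word list. The following Python sketch shows one possible way to do it; it is not necessarily how the figures behind these words were produced, and `orthographic_profile` and the sample words are hypothetical.

```python
from collections import Counter

def orthographic_profile(words, top=3):
    """Summarize a word list: average length and most common letters."""
    words = [w.lower() for w in words if w]
    if not words:
        raise ValueError("need at least one non-empty word")
    initials = Counter(w[0] for w in words)
    finals = Counter(w[-1] for w in words)
    letters = Counter(ch for w in words for ch in w if ch.isalpha())
    return {
        "avg_length": sum(len(w) for w in words) / len(words),
        "top_initials": [c for c, _ in initials.most_common(top)],
        "top_finals": [c for c, _ in finals.most_common(top)],
        "top_letters": [c for c, _ in letters.most_common(top)],
    }

# Tiny made-up sample; a real profile would need a large word list.
profile = orthographic_profile(["the", "then", "there", "toast"])
print(profile["avg_length"])  # 4.25
```

A fake word can then be assembled to match the profile: pick a length near the average, start and end with the most common letters, and work in a stereotypical multigraph.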

The resulting words are interesting enough, but instead of simply printing them here like any other text, I traced their typeset forms and filled them in with watercolor, inviting meditations on mechanical reproduction, handwriting and creation.

See if you can tell which languages the rest of these made-up words are written in.

Sochadeño

Durßischtoung

Txugizeaken

S'leignôtré

Lletosà

Drziłyświczą

Shoot, what’s my last name again?

We’ve talked before about the relationship between writing and speech, and today I want to discuss one contemporary example in particular: my last name.

Is my last name—or any last name—its spelling, or its pronunciation?

The spelling Gorichanaz has stayed stable over the past few generations, although it was originally spelled Goričanec. The pronunciation, on the other hand, has changed plenty. I believe when I was little, I pronounced it [goɹ-ʃan’-əs]. My parents must have said it this way, and the friends I made early in my life still pronounce it that way. But somewhere along the line I changed my pronunciation, relaxing that final s into a voiced z, and usually flattening the initial vowel o into another schwa, as in the first syllable of Gertrude. The pronunciations used today by my other family members run the gamut: [goɹ-ʃan’-əs], [goɹ-ʃan’-əz], [goɹ-ʧan’-əz]… Not to mention, the original pronunciation of my last name, back in its Goričanec days, was [go-ri-ʧa’-neʦ]. Predictably, I also find myself modulating my pronunciation when I know someone is trying to match up what I say with its written counterpart; in those cases, I pronounce it as phonetically as possible: [gor-ɪ-ʧan’-əz].

To repeat the question, is my last name one of these sound combinations, or is it indeed the string of letters Gorichanaz? In the case of other words, we typically assume the “real” word is its pronunciation. With a word like comfortable, for example, where the pronunciation doesn’t quite match up with the spelling, we’d probably say the “real” word is [kʌmf’-tər-bəl], rather than the pedantic (but admittedly still recognizable) [kʌm’-for-tə-bəl].

What, then, of the written version? Say you come across a word like nychthemeron, which, though English, has no corresponding pronunciation in your lexicon. Is it not a word? Perhaps, given the give-and-take relationship between speech and writing, we should conclude that true words are more primordial than both speech and writing. After all, it is the idea behind the word that motivates its manifestation in the first place.

How related are writing and speech?

It can be difficult for us anglophones to imagine that speech and writing could have little or nothing to do with each other. After all, our writing system more or less corresponds with our pronunciation. But to illustrate the possibilities, let’s consider three cases: Latin around AD 800, modern Arabic, and Chinese.

Latin
Because of the grand expanse of the Roman Empire (and the lack of universal education), it was inevitable that regional versions of Latin would emerge. These dialects didn’t have their own written versions; instead, scribes conserved the Classical way of writing. This created a split in what Latin was: a high Latin, which was primarily written (especially championed by the Church), and a low Latin, which was primarily spoken. In fact, the low variety had no written form. Everyday people spoke the low variety, and educated people could read and write in the high variety. Though these budding Romance languages were linguistically distinct from Classical Latin, they were all considered the same catholic Latin. The people were little bothered (at first anyway) that written Latin had little to do with spoken Latin; that’s just the way it was. As we know, the multiple versions of Latin eventually gave rise to what we know today as the Romance languages. Written Spanish, Portuguese, French, Italian and Romanian are nothing more than validated dialects of Latin.

Arabic
There are more varieties of Arabic than there are Arab nations. Moreover, not all speakers of Arabic dialects can understand each other. Nevertheless, there’s essentially only one way to write in Arabic: Modern Standard Arabic, which descends from the Classical Arabic of the Koran.

Chinese
Every Chinese character represents a word or a meaningful part of a word. What many people don’t realize is that there are many Chinese dialects, and some of them are not even mutually intelligible. This means that the character 耳, which means “ear,” is pronounced differently in different dialects. To take this further, consider that a sentence in written Chinese may be pronounced quite differently, and sometimes read with different grammar, depending on the dialect. Moreover, Korean and Japanese, which are not genetically related to Chinese, have both adopted Chinese characters to represent their own languages.