I’m a few days late posting this (because reasons), but I’m very excited to share the interview I recently did with the folks at Conlangery about the languages of Storm Bride, along with a bunch of other questions regarding fantasy literature and language creation. You can listen here:

We talk about:

  • How I got into linguistics and conlanging
  • Why there’s only two sentences of actual conlang in Storm Bride
  • Why most fictional languages in SFF suck
  • Ursula K. Le Guin’s Always Coming Home and other SFF works that at least try to get it right
  • How to present strange phonologies without terrifying your readers
  • Cultural appropriation issues when conlanging based on real languages and cultures
  • Other stuff that I can’t remember.

This was the most fun I’d had in a while! Hope you like listening.

My wife always said that I should keep a log of some sort about my kids’ linguistic development. And while I haven’t kept a detailed log, here I am blogging about it for the second week in a row.

So: pronouns. As mentioned before, proper use of pronouns is something that children acquire late, but partial use of pronouns develops quite a bit before that. What’s interesting here is the differing rates at which English and Romanian pronouns have been acquired. Because Romanian is pro-drop, pronouns are relatively uncommon in Romanian speech. For this reason, Sebi already uses the English pronoun I fairly consistently, but has not acquired any Romanian pronouns at all. He even mixes the two languages:

I făcut caca.
I went poopy.

The only thing approaching a Romanian pronoun that either child uses is the syllable [tu:], which represents an interesting conflation of the Romanian pronoun tu (you, sg.) and the English word too. The reason for the conflation is that both English and Romanian tend to locate these words at the end of utterances, in similar contexts, and with both words bearing the prosodic stress:

Do you want some, **too**?
Vrei şi **tu**? (Lit. "Want also you?")

Because of this coincidence, both children use the syllable [tu:] with a variety of meanings, including "me, too," "also," and "let me do it." As I noted with the discussion of verb inflection, the kids tend to use second-person forms with first-person meanings, based on what they most often hear.

Despite these few examples of confusion between the two languages, the kids already seem to have a good understanding of the differences between the languages and the contexts in which each is used. Their teachers at preschool say that they never hear the boys using Romanian words at school, and at home they seem to switch effortlessly into Romanian. They have even begun to exhibit some awareness of translation, the notion of a statement in English having an equivalent in Romanian and vice-versa. I’d say that this portends good things.

My kids are young and are just beginning to speak in sentences. This is interesting, first because their linguistic development is bilingual, and secondly because I’ve never before had significant exposure to young children speaking a language other than English at this stage in their linguistic development. And now I’m getting a chance to observe something I’ve always wondered: when children learn a language with highly inflected verbs, which forms do they learn first?

Romanian, like most Romance languages, inflects the verb for agreement in person and number in a variety of tenses. By way of illustration:

eu merg         I go
tu mergi        you go
el merge        he goes 
noi mergem      we go 
voi mergeţi     you (pl.) go 
ei merg         they go

Romanian is also pro-drop, meaning that pronouns are usually omitted when they occur as the subjects of sentences. So, given that a child learning Romanian hears a wide variety of verb forms with little way of distinguishing them at first (pronouns are one of the later syntactic features that small children acquire), which form do they use when they first start speaking?

The answer is: it depends, and it’s different for different verbs. This is not the answer that I was expecting. And what it depends on, as far as I can tell, is the form in which the child most often hears the verb, especially when it’s directed at him.

For example, my youngest always uses "want" in the 2sg form v(r)ei. This is because he most often hears the word used in questions like the following:

What do you want?
Ce vrei?

The word "give", on the other hand, is always dau, the 1sg form, because of the frequency of statements like "I’ll give you…" or "Do you want me to give you…?" (That second one involves an infinitive in English, but a finite verb in Romanian.)

A surprising number of verbs have been acquired in the imperative, most notably vino "come". The reasons for this should be obvious. Since the Romanian imperative is usually identical with the 3sg, there are a number of verbs in which it’s unclear which form has been acquired, and since the differences between imperative and indicative are certainly beyond him at this point, I doubt that the question is even answerable. He identifies imperatives solely by pragmatics, and he uses the imperative form even in clearly indicative contexts. Just today I heard Sebi vino used where the intended meaning was clearly "Sebi is coming", despite the morphologically imperative verb.

Most surprisingly, there’s a handful of verbs which have been acquired as a past participle, especially făcut "do" and dormit "sleep". This probably reflects the frequency with which these verbs are used in the past tense, since the normal Romanian past tense is periphrastic, formed with the present-tense form of avea "to have" and the past participle. Sebi does sometimes use the auxiliary with the participle, but it seems very doubtful that he actually understands this as periphrasis at this point, as opposed to a fixed phrase.

Finally there is a single verb which has been acquired in the first person plural: rugăm "(we) pray". I find this adorable.

This is a small sample size (n=2), and only one language, but I would expect that results from other languages would be similar. What I wonder about, now, is how child language evolves in languages with really complex systems, such as polypersonal agreement, object incorporation, or other polysynthetic features.

This week’s entry in the Weird Linguistics category isn’t so much "weird" as "amazing". But I have to stick with the title I’ve got.

You are probably familiar with the Indo-European language family, the family to which most of the languages of Europe belong. Proto-Indo-European was originally the language of a semi-nomadic group on the steppes of modern-day Ukraine or Central Asia, who began a series of expansions some 7,000 years ago spurred by a series of technological advances, especially farming and the chariot. Their prehistoric expansion eventually brought them all the way to the Atlantic Ocean and the British Isles. In the east, they came to the northern part of India as part of the Aryan Invasion, and a far-flung group known as the Tocharians got all the way to Western China.

Indo-European Expansion

I find it astounding to consider that some random group of nomads managed to strike the cultural-linguistic jackpot, so that their descendants pushed all the way from Central Asia to eastern India and western Europe in prehistoric times, and eventually came to dominate most of North and South America as well. I’m even more astounded by the fact that we can reconstruct this expansion from linguistic and archaeological data thousands of years after the fact.

But this is not even the most impressive pre-historical linguistic expansion that we know of, which brings me to my real point. The real champions of geographic expansion are not the Indo-Europeans, but the Austronesians.

Austronesian language dispersion

The Austronesian languages include Hawaiian, Fijian, Tagalog, Malay, Maori, and hundreds of other languages spoken throughout the Pacific and Madagascar. The Austronesians expanded from their original homeland on the island of Taiwan in a series of waves spaced throughout prehistory, but while the Indo-Europeans were going overland, the Austronesians were going over the sea. And boy did they get around: not only did they populate all of the islands of Polynesia, Micronesia, and the Philippines, but they also turned west and got all the way to Madagascar. This latter fact is tremendously surprising: Madagascar wasn’t settled primarily by Africans crossing the relatively narrow Mozambique Channel, but by Austronesians who had to cross the Indian Ocean from Borneo to get there.

This is, to me, far more impressive than the Indo-European expansion. No Austronesian society ever had hulled ships, but they still managed to navigate the vastness of the Pacific and cross the monsoon-wracked Indian Ocean centuries before any other civilization would attempt the same thing.

Yet even this is not the most far-ranging language family we know of. No, that distinction belongs to the Dené-Yeniseian languages.

Dené-Yeniseian language dispersion

The Indo-European languages are the ones that all English speakers are familiar with, and you’ve probably at least heard of several Austronesian languages. But chances are that you have never heard of even one of the Dené-Yeniseian tongues. Do you see that little green smudge in the middle of Siberia in the map above, just north of Mongolia? Those are the Yeniseian languages, a nearly-extinct family of languages spoken by the indigenous peoples of central Siberia. There are only six known languages in this family. Only two of them survived into the 20th century, and only one of them (Imbat Ket) is still alive today. But we are lucky that it has survived, because the evidence that we have of the languages has proven them to be the only known pre-historical linguistic link between the Old World and the New.

The American cousin of the Yeniseian languages is the Na-Dene language family, which comprises several branches found in Alaska, Canada, California, and the American Southwest. The dispersal of this branch is something of a story in itself, with Na-Dene speakers occupying a large continuous area in the northernmost part of the Americas, but with distant relatives much further south. This southwestern branch contains the most famous tribes of this family: Navajo and Apache are Na-Dene languages, and these languages are the only ones whose names might be familiar to the average English speaker.

The distance from the heart of Siberia to southwestern America is even greater than the distances covered by the Austronesians. Yet while we understand the history and the expansion of the Austronesians and the Indo-Europeans very well, the Dené-Yeniseian languages mostly present us with mysteries. No one knows where their original homeland was. No one knows the motive for their expansion, or if it can even be called an expansion. We don’t know how or when the Proto-Yeniseians crossed from Siberia into America, and we don’t know why the American branch of the family is split into such distant northern, southern, and western lines. There are conjectures and guesses about all of these things, but precious little that we can identify as fact.

Nonetheless, I find that this linguistic relationship surprises me more than any other. It’s one thing to know, abstractly, that the Americas were populated from Asia at some point in the distant past. It’s quite something else to boil that fact down into a set of cognates, and to be able to say with some certainty that these two languages separated by thousands of miles of ocean and ice in fact sprang from the same ancestral tongue. It’s the most amazing thing in linguistics.

This being the second post in my series about weird linguistics, I’d like to point out that it’s not necessary to travel to strange and exotic climes in order to find bizarre grammatical features like ergativity. English itself is plenty weird. Today I’ll demonstrate this by discussing negative polarity.

English is a simple language in many respects. We have barely any case system to speak of, minimal verbal morphology, and a simple consonant phonology. Our vowels are a little baroque, and our spelling is awful, but overall it’s not too bad. But there’s one thing that we’ve managed to muck up pretty badly, and it’s something so simple that in many languages you never have to think about it at all.

I’m talking about not doing things. Or, as we linguists like to call it, negation.

First, observe how a negative sentence in English is formed:

  1. I eat fried octopus.
  2. I don’t eat fried octopus.

The English negative adverb is not, but of course you can’t just add not to an affirmative sentence. Instead, you have to have do-support, where the word do gets thrown in there just so that not (the lazy bum) has something to lean on. Unless, of course, there’s a modal verb or other auxiliary verb floating around, or a few other conditions apply. It’s a bit of a mess.
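If it helps to see the rule spelled out, here is a toy sketch in Python. It is my own illustration, not a serious grammar: the `negate` function and the `AUXILIARIES` set are made up for this example, it assumes a first-person, second-person or plural subject with the verb in its base form, and it ignores contractions, special spellings like "cannot", and all of the "other conditions" mentioned above.

```python
# Toy illustration of English do-support (hypothetical helper, not a real grammar).
# Assumes a non-3sg subject and a base-form verb; modals and forms of "be"
# take "not" directly, while every other verb needs "do" to carry the negation.
AUXILIARIES = {
    "am", "is", "are", "was", "were",
    "will", "would", "can", "could", "shall", "should",
    "may", "might", "must",
}


def negate(subject: str, verb: str, rest: str = "") -> str:
    """Return a negated version of the sentence 'subject verb rest'."""
    if verb in AUXILIARIES:
        # An auxiliary is already there, so "not" leans on it.
        words = [subject, verb, "not"]
    else:
        # No auxiliary: insert "do" just so "not" has something to lean on.
        words = [subject, "do", "not", verb]
    if rest:
        words.append(rest)
    return " ".join(words)


if __name__ == "__main__":
    print(negate("I", "eat", "fried octopus"))
    print(negate("I", "will", "eat fried octopus"))
```

The two calls at the bottom correspond to the plain-verb case and the modal case. Even this tiny version needs a lookup table and a branch, which is more machinery than plain negation requires in many languages.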

This is simple stuff, though. What I really want to talk about today is something even more pernicious called negative polarity. In English we have some words which are not themselves negative words, but which normally only occur when there is some other negative word in the sentence. These words are called negative polarity items (NPI). Let me crib some examples from Wikipedia:

  3. I didn’t like the film at all.
  4. *I liked the film at all.
  5. John doesn’t have any potatoes.
  6. *John has any potatoes.

(In linguistic literature, the asterisk is used to indicate ungrammatical utterances.)

The thing to notice here is that the NPI’s at all and any are not themselves words that convey negation. Nonetheless, those words are only allowed to occur in sentences which are negated with not, while examples (4) and (6) are ungrammatical for attempting to use those words in a positive context.

The rules for NPI can get really complex. For example, it is widely believed that they can only occur in downward-entailing contexts, though to explain some additional properties of NPI’s a more robust notion of veridicality is required. For example, NPI’s are allowed in questions, even if the questions are not negative:

  7. Did you see anything?
  8. Do they have any octopus?

And they can occur when qualified by adverbs such as hardly:

  9. They cook hardly any seafood right now.

Sentence (9) above becomes ungrammatical if hardly is removed, yet somehow it becomes grammatical again if the sentence is qualified differently:

  10. *They cook any seafood right now.
  11. They cook any seafood that you can catch.

The reasons for this have to do with the veridical interpretation of habitual aspects and future time… which is all that I will attempt to explain about that. Read the linked article above about veridicality if you’re interested.

Now as a native English speaker you do all of this intuitively, and so you never have to spend a moment’s conscious thought on downward entailment and veridicality. (Lucky for you.) You can even invent new negative polarity items on occasion. But next time it comes up that a non-native speaker uses an NPI incorrectly, do be nice to her. This stuff is harder than it looks.

A long, rambling introduction.

A few weeks ago I released phonix 0.8, the latest version of my phonological modeling language. And now that you’ve read the previous sentence, assuming you haven’t already clicked away in boredom, I hear you saying "What the heck is a phonological modeling language?"

Let me explain. No, let me sum up.

Languages change their sounds over time: Spanish and French have different sounds than Latin, and different sounds from each other. However, there are regular correspondences between the Latin sounds in a word and the resulting sounds in Spanish, and with a good set of rules you can generate Spanish words from Latin ones. To do this you need two things: a model of the sounds in Latin and how they relate to each other, and a set of rules that describes how those sounds change over time and what the conditions are for turning one sound into another. This is what phonix does: it defines a special notation for describing a language’s sound system and the rules which apply to that system, then it allows you to apply those rules to lists of words.
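To make the idea concrete, here is a minimal sketch in Python. This is not phonix notation, and the rules are a deliberately tiny, simplified sample that I picked for illustration (the real Latin-to-Spanish changes are far more numerous and more subtle). The names `RULES` and `apply_rules` are invented for this example. Each rule is a regular expression plus a replacement, and the rules are applied in order to each word.

```python
import re

VOWELS = "aeiou"

# Toy, illustrative sound changes (a simplified sample, not a full account
# of Latin-to-Spanish). Each rule is (regex pattern, replacement).
RULES = [
    # Word-final -um becomes -o.
    (r"um$", "o"),
    # Voiceless stops become voiced between vowels: p > b, t > d, c > g.
    (rf"(?<=[{VOWELS}])p(?=[{VOWELS}])", "b"),
    (rf"(?<=[{VOWELS}])t(?=[{VOWELS}])", "d"),
    (rf"(?<=[{VOWELS}])c(?=[{VOWELS}])", "g"),
]


def apply_rules(word: str, rules=RULES) -> str:
    """Apply each sound change to the word, in order, and return the result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


if __name__ == "__main__":
    for latin in ["vita", "lacum", "amicum", "totum"]:
        print(latin, "->", apply_rules(latin))
```

With these four rules, vita, lacum, amicum and totum come out as vida, lago, amigo and todo. Note that the conditions matter as much as the changes: the first t in totum is left alone because it is not between vowels.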

All of this demonstrates that I’m a huge language nerd. I majored in Linguistics in college, and I was (as one under-motivated classmate said) "one of those people who reads linguistics books in their spare time". As a language nerd I spend an inordinate amount of time thinking about the languages used in my stories. And sometimes I think I’m the only one, since most fantasy and science fiction writing sucks a big one on this.

I feel a rant coming on.

Common failure modes of language in SF

Here they are, in decreasing order of fail:

  1. There are only two languages, the Modern and Old-Timey. Everyone speaks Modern, and no one can understand Old-Timey except maybe for the wizard. Guilty of this: Robert Jordan.

The problem with this model is that if the language has changed enough that the older form is incomprehensible, then unless the language community is very small the language should also have split into multiple daughter languages.

  2. There is one language for every country on the map. They are all obvious knock-offs of some familiar language in this world. Nonetheless, the protagonist never meets anyone that he can’t speak with. Guilty of this: Tad Williams.

This is a lot better than option #1, but it contains the problematic assumption that countries only contain one language. Americans, in particular, seem to fall into this assumption because we’re used to our vast, linguistically homogeneous country. But the majority of the countries in the world are home to multiple, mutually unintelligible language groups, and often dozens or hundreds of such groups. In a pre-modern setting, our protagonist should find that the local vernacular becomes incomprehensible as soon as he’s traveled more than a few days from his house.

  3. There are multiple languages, but there is one common language that everyone speaks, so let’s just use that and keep all of the other languages out of it. Guilty of this: J.R.R. Tolkien.

This is tolerable, and it’s this approach that’s taken by Tolkien and those of his followers who bothered to care. There is often something of a handwave to this explanation (the author has posited this in order to avoid having to actually think about the languages in their setting too deeply), but at least it’s superficially plausible and has historical precedent.

  4. OMG SO MANY LANGUAGES. There are lots of languages, and they all have a distinctive phonology, syntax, and vocabulary. The historical relationships between the languages are well-documented and understood. They have their own writing systems. Really, there’s far more information about the languages of this world than anyone could reasonably hope to assimilate.

I actually don’t know any published authors in this category. Mark Rosenfelder has done amazing work in documenting his world of Almea, but alas he’s never been published. Such is the fate of many a conlanger.

So now you’re depressed. Your options are to write about language in your setting badly, or to spend years and years elaborating something that most readers don’t care about anyway.

There is one more option.

Pretend that it doesn’t exist. I read an interesting article the other day about how language is handled—or, more accurately, isn’t handled—in the Magic: the Gathering tie-in novels. (Scroll down to the "Letter of the Week" to see the discussion.) A letter asked how characters from different planes of the Multiverse can talk to each other without needing to learn a foreign language, and the author responded quite directly with "just ignore that":

The risk is over-explaining. To use a Star Trek example again, this time in a negative way—it’s like the episode where they explain why all the humanoid races on the show all basically look alike. Ugh. It’s one thing to poke fun at the show’s makeup budget and do armchair xenobiological critiques of how the aliens resemble each other so much, but it’s quite another to expect the show to provide an in-universe explanation of those budgetary or story-based limitations. Either you didn’t think it was a problem before and now this explanation throws an awkward spotlight on it, which diminishes your enjoyment of the formula, or you did think it was a problem but you had learned to live with it but now suddenly you have to live with the show’s one groan-worthy and set-in-stone explanation forever.

An explanation like "Well, everybody across the Multiverse happens to speak the same language because a long time ago blahblahblah" or "Well, all planeswalkers find that they can communicate just fine because the spark blahblahblah" may ultimately cause more problems than it fixes. It might actually reduce enjoyment to patch over one of those weird, load-bearing plot-holes that are kinda ugly but that make the fantasy genre possible.

Though it makes me want to cry a little, this guy has a really good point. No amount of world-building will cover everything. And if you don’t have the skills or the patience to make the languages, why bother? More importantly, if your story doesn’t need the linguistic detail, then maybe you should just leave it out.

Not me, though.

We all have our obsessions. I’ll be using phonix to apply the sound changes for deriving Prasi from Old Tzingrizil. And I’ll be having a great time of it, even if no-one cares.

This time with actual Buddhism!

Japanese Buddhist monks were not allowed to eat any meat other than birds, but liked rabbit meat so much they came up with the contrived “explanation” that rabbits are actually birds, and that their ears are unusable wings. The rationale was that while moving, the rabbits touched ground only with two feet at a time.


I was recently given the link to Dracula, which is a presentation of Bram Stoker’s Dracula in blog format. The original novel is an epistolary, with every section dated, and the novel is being posted section-by-section on the appropriate dates. It’s a delightful way to read.

The first post was on May 3. I just read it, and experienced the distinct pleasure of having been in or very near to the places that the author describes. The first paragraph that caught my eye was this one:

Having had some time at my disposal when in London, I had visited the British Museum, and made search among the books and maps in the library regarding Transylvania; it had struck me that some foreknowledge of the country could hardly fail to have some importance in dealing with a nobleman of that country. I find that the district he named is in the extreme east of the country, just on the borders of three states, Transylvania, Moldavia and Bukovina, in the midst of the Carpathian mountains; one of the wildest and least known portions of Europe. (emphasis mine)

The city where I’m typing this is in the region of Bucovina (to use the modern spelling), but it’s very near the old border of Moldavia. In fact, people from other parts of the country will usually tell you that we are part of Moldavia, though the locals try to associate with Bucovina since Moldavians are stereotyped as backward hicks. It’s complicated by the fact that there are not now any official political entities with those names, and the historical regions that they represent had very flexible borders.

I was not able to light on any map or work giving the exact locality of the Castle Dracula, as there are no maps of this country as yet to compare with our own Ordnance Survey maps; but I found that Bistritz, the post town named by Count Dracula, is a fairly well-known place.

That would be Bistriţa, a city not far from here that I’ve also passed through. This gives me a strange sense of dislocation while reading, because the geography that Stoker presents is meant to seem remote and exotic–but for me Bistriţa is a fairly boring city a few hours away by train.

In the population of Transylvania there are four distinct nationalities: Saxons in the South, and mixed with them the Wallachs, who are descendants of the Dacians; Magyars in the West, and Szekelys in the East and North.

Time for some ethnography! The Saxons are known in Romanian as saşi, and they still exist in Transylvania, though in much reduced numbers. The “Wallachs” are what we would consider the native Romanians, descended from the Romanized inhabitants of ancient Dacia after the Romans conquered the province. The Szekely are a Hungarian-speaking people known to Romanians as secui, who still exist in considerable numbers in the western parts of Romania (which the author refers to as the east, coming as he does from further west).

Linguistic aside: the etymon *walah is a fascinating one, as it has been borrowed from one language to another all over Europe, its referent changing several times along the way, but always with the sense of “those funny people over there who don’t speak proper”. In English it provides the root for Wales and Welsh, and also Walloon (a Romance language spoken in Belgium). In Germany it referred to any Romance-speaking peoples, and I believe provides the word for “Italian” in some dialects. In Slavic languages it usually refers to Romanians, but to Romanians themselves it refers to the Romanian peoples living outside of Romania, the Aromanians, Megleno-Romanians, and Istro-Romanians. Which just goes to show that everyone needs a word for “those funny people over there who don’t speak proper”.

I had for breakfast more paprika, and a sort of porridge of maize flour which they said was “mamaliga,” and eggplant stuffed with forcemeat, a very excellent dish, which they call “implelata.” (Mem., get recipe for this also.)

The maize porridge is properly mămăligă, a staple dish throughout Romania. I had some last night, in fact. It’s very similar to polenta as served in the American south. I can’t figure out what Romanian word “implelata” is supposed to refer to, and no one else in the house can, either. It’s not a dish that I’m familiar with, but here’s a Romanian cooking site with a recipe matching the description. If you want to follow the author’s suggestion and get a recipe, you can probably follow just based on the pictures, and there’s always Google Translate.

At every station there were groups of people, sometimes crowds, and in all sorts of attire. Some of them were just like the peasants at home or those I saw coming through France and Germany, with short jackets and round hats and homemade trousers; but others were very picturesque. The women looked pretty; except when you got near them, but they were very clumsy about the waist. They had all full white sleeves of some kind or other, and most of them had big belts with a lot of strips of something fluttering from them like the dresses in a ballet, but of course there were petticoats under them.

This is actually a pretty good description of traditional Romanian dress. But this being the internet, I can just show you a picture:

Traditional Romanian Dress

This isn’t exactly a common sight on the street these days, but it would be familiar to anyone who’s spent significant time in Romania.

The moral of the story is: if you set your story in a strange, exotic place, people who actually live in that place will not find it as strange and exotic. If you care.

Mark Liberman has a post up at Language Log discussing Iain M. Banks’ Culture novels, and in particular his “upper case phoneme”.

I’m a fan of Iain M. Banks’ Culture novels, but I’d like to suggest, respectfully, that they might be improved in their approach to matters linguistic. As an example, on p. 470 of his recently-released novel Matter, we learn that “Marain, the Culture’s language, had a phoneme to denote upper case”.

Linguists would usually call a unit that denotes something a morpheme (or perhaps a word), not a phoneme, even if it was only one phoneme long. (In fact, we sometimes find meaningful units whose effect on pronunciation is just a single feature.)

In addition, it’s odd to find a morpheme that signals something essentially in the realm of writing, like alphabetic case; and also to find that Marain still uses upper case in (some of) the same ways that English does.

I’d like to suggest, respectfully, that Liberman is being way too nice. The quoted passage from the book makes it pretty clear what Banks means: the Marain language has the ability to indicate aurally that something is a proper name or otherwise an Important Word. But Banks calling this an “upper-case phoneme” is a basic mistake on two levels. First, he seems to have confused phonemes and morphemes, and second, he has confused a property of written language with spoken language. Liberman suggests a few interpretations of “upper-case phoneme” that would be linguistically defensible, but they’re increasingly implausible. No, what we have here is the linguistic equivalent of making the Kessel Run in less than twelve parsecs: an absurdity brought on by the fact that the writer didn’t know what he was talking about.

Of course, none of this really matters, and my irritation is, I’m sure, tiny compared to the irritation of a physicist trying to watch Star Trek. But it would be nice if people using linguistic vocabulary would at least try to get it right.

Here’s something I never knew:

Galland did more than merely translate: he shaped the text into what became a more or less canonical form; as a result the Nights are as much a part of Western literature as of Arabic. To Western readers, the stories of Aladdin, Ali Baba and Sindbad belong to the core of the Nights and are among the best-known tales; but they did not belong to the Arabic text until Galland added them. There is, in fact, no known Arabic text of the Aladdin and Ali Baba stories that predates Galland, and elements in the story of Aladdin suggest that it may have been a European fairy tale rather than an Arabic one.

This is via Language Hat. Click through to the original article for more fascinating tidbits.