I’m a few days late posting this (because reasons), but I’m very excited to share this interview I recently did with the folks on Conlangery about the languages of Storm Bride and a bunch of other questions regarding fantasy literature and language creation. You can listen here:

We talk about:

  • How I got into linguistics and conlanging
  • Why there’s only two sentences of actual conlang in Storm Bride
  • Why most fictional languages in SFF suck
  • Ursula K. Le Guin’s Always Coming Home and other SFF works that at least try to get it right
  • How to present strange phonologies without terrifying your readers
  • Cultural appropriation issues when conlanging based on real languages and cultures
  • Other stuff that I can’t remember.

This was the most fun I’d had in a while! Hope you like listening.

My wife always said that I should keep a log of some sort about my kids’ linguistic development. And while I haven’t kept a detailed log, here I am blogging about it for the second week in a row.

So: pronouns. As mentioned before, proper use of pronouns is something that children acquire late, but partial use of pronouns develops quite a bit before that. What’s interesting here is the differing rates at which English and Romanian pronouns have been acquired. Because Romanian is pro-drop, pronouns are relatively uncommon in Romanian speech. For this reason, Sebi already uses the English pronoun I fairly consistently, but has not acquired any Romanian pronouns at all. He even mixes the two languages:

I făcut caca.
I went poopy.

The only thing approaching a Romanian pronoun that either child uses is the syllable [tu:], which represents an interesting conflation of the Romanian pronoun tu (you, sg.) and the English word too. The reason for the conflation is that both English and Romanian tend to locate these words at the end of utterances, in similar contexts, and with both words bearing the prosodic stress:

Do you want some, **too**?
Vrei şi **tu**? (Lit. "Want also you?")

Because of this coincidence, both children use the syllable [tu:] with a variety of meanings, including "me, too," "also," and "let me do it." As I noted with the discussion of verb inflection, the kids tend to use second-person forms with first-person meanings, based on what they most often hear.

Despite these few examples of confusion between the two languages, the kids already seem to have a good understanding of the differences between the languages and the contexts in which each is used. Their teachers at preschool say that they never hear the boys using Romanian words at school, and at home they seem to switch effortlessly into Romanian. They have even begun to exhibit some awareness of translation, the notion of a statement in English having an equivalent in Romanian and vice-versa. I’d say that this portends good things.

My kids are young and are just beginning to speak in sentences. This is interesting, first because their linguistic development is bilingual, and secondly because I’ve never before had significant exposure to young children speaking a language other than English at this stage in their linguistic development. And now I’m getting a chance to observe something I’ve always wondered: when children learn a language with highly inflected verbs, which forms do they learn first?

Romanian, like most Romance languages, inflects the verb for agreement in person and number in a variety of tenses. By way of illustration:

eu merg         I go
tu mergi        you go
el merge        he goes 
noi mergem      we go 
voi mergeţi     you (pl.) go 
ei merg         they go

Romanian is also pro-drop, meaning that pronouns are usually omitted when they occur as the subjects of sentences. So, given that a child learning Romanian hears a wide variety of verb forms with little way of distinguishing them at first (pronouns are one of the later syntactic features that small children acquire), which form do they use when they first start speaking?

The answer is: it depends, and it’s different for different verbs. This is not the answer that I was expecting. And what it depends on, as far as I can tell, is the form in which the child most often hears the verb, especially when it’s directed at him.

For example, my youngest always uses "want" in the 2sg form v(r)ei. This is because he most often hears the word used in questions like the following:

What do you want?
Ce vrei?

The word "give", on the other hand, is always dau, the 1sg form, because of the frequency of statements like "I’ll give you…" or "Do you want me to give you…?" (That second one involves an infinitive in English, but a finite verb in Romanian.)

A surprising number of verbs have been acquired in the imperative, most notably vino "come". The reasons for this should be obvious. Since the Romanian imperative is usually identical with the 3sg, there are a number of verbs in which it’s unclear which form has been acquired, and since the differences between imperative and indicative are certainly beyond him at this point, I doubt that the question is even answerable. He identifies imperatives solely by pragmatics, and he uses the imperative form even in clearly indicative contexts. Just today I heard Sebi vino used where the intended meaning was clearly "Sebi is coming", despite the morphologically imperative verb.

Most surprisingly, there’s a handful of verbs which have been acquired as a past participle, especially făcut "do" and dormit "sleep". This probably reflects the frequency with which these verbs are used in the past tense, since the normal Romanian past tense is periphrastic, formed with the present-tense form of avea "to have" and the past participle. Sebi does sometimes use the auxiliary with the participle, but it seems very doubtful that he actually understands this as periphrasis at this point, as opposed to a fixed phrase.

Finally there is a single verb which has been acquired in the first person plural: rugăm "(we) pray". I find this adorable.

This is a small sample size (n=2), and only one language, but I would expect that results from other languages would be similar. What I wonder about, now, is how child language evolves in languages with really complex systems, such as polypersonal agreement, object incorporation, or other polysynthetic features.

This week’s entry in the Weird Linguistics category isn’t so much "weird" as "amazing". But I have to stick with the title I’ve got.

You are probably familiar with the Indo-European language family, the family to which most of the languages of Europe belong. Proto-Indo-European was originally the language of a semi-nomadic group on the steppes of modern-day Ukraine or Central Asia, who began a series of expansions some 7,000 years ago, spurred by a series of technological advances, especially farming and the chariot. Their prehistoric expansion eventually brought them all the way to the Atlantic Ocean and the British Isles. In the east, they came to the northern part of India as part of the Aryan Invasion, and a far-flung group known as the Tocharians got all the way to Western China.

Indo-European Expansion

I find it astounding to consider that some random group of nomads managed to strike the cultural-linguistic jackpot, so that their descendants pushed all the way from Central Asia to eastern India and western Europe in prehistoric times, and eventually came to dominate most of North and South America as well. I’m even more astounded by the fact that we can reconstruct this expansion from linguistic and archaeological data thousands of years after the fact.

But this is not even the most impressive pre-historical linguistic expansion that we know of, which brings me to my real point. The real champions of geographic expansion are not the Indo-Europeans, but the Austronesians.

Austronesian language dispersion

The Austronesian languages include Hawaiian, Fijian, Tagalog, Malay, Maori, and hundreds of other languages spoken throughout the Pacific and Madagascar. The Austronesians expanded from their original homeland on the island of Taiwan in a series of waves spaced throughout prehistory, but while the Indo-Europeans were going overland, the Austronesians were going over the sea. And boy did they get around: not only did they populate all of the islands of Polynesia, Micronesia, and the Philippines, but they also turned west and got all the way to Madagascar. This latter fact is tremendously surprising: Madagascar wasn’t settled primarily by Africans crossing the relatively narrow Mozambique Channel, but by Austronesians who had to cross the Indian Ocean from Borneo to get there.

This is, to me, far more impressive than the Indo-European expansion. No Austronesian society ever had hulled ships, but they still managed to navigate the vastness of the Pacific and cross the monsoon-wracked Indian Ocean centuries before any other civilization would attempt the same thing.

Yet even this is not the most far-ranging language family we know of. No, that distinction belongs to the Dené-Yeniseian languages.

Dené-Yeniseian language dispersion

The Indo-European languages are the ones that all English speakers are familiar with, and you’ve probably at least heard of several Austronesian languages. But chances are that you have never heard of even one of the Dené-Yeniseian tongues. Do you see that little green smudge in the middle of Siberia in the map above, just north of Mongolia? Those are the Yeniseian languages, a nearly-extinct family of languages spoken by the indigenous peoples of Central Asia. There are only six known languages in this family. Only two of them survived into the 20th century, and only one of them (Imbat Ket) is still alive today. But we are lucky that it has survived, because the evidence that we have of the languages has proven them to be the only known pre-historical linguistic link between the Old World and the New.

The American cousin of the Yeniseian languages is the Na-Dene language family, which comprises several branches found in Alaska, Canada, California, and the American Southwest. The dispersal of this branch is something of a story in itself, with Na-Dene speakers occupying a large continuous area in the northernmost part of the Americas, but with distant relatives much further south. This southwestern branch contains the most famous members of the family: Navajo and Apache are Na-Dene languages, and they are the only ones whose names might be familiar to the average English speaker.

The distance from the heart of Siberia to southwestern America is even greater than the distances covered by the Austronesians. Yet while we understand the history and the expansion of the Austronesians and the Indo-Europeans very well, the Dené-Yeniseian languages mostly present us with mysteries. No one knows where their original homeland was. No one knows the motive for their expansion, or if it can even be called an expansion. We don’t know how or when the Proto-Yeniseians crossed from Siberia into America, and we don’t know why the American branch of the family is split into such distant northern, southern, and western lines. There are conjectures and guesses about all of these things, but precious little that we can identify as fact.

Nonetheless, I find that this linguistic relationship surprises me more than any other. It’s one thing to know, abstractly, that the Americas were populated from Asia at some point in the distant past. It’s quite something else to boil that fact down into a set of cognates, and to be able to say with some certainty that these two languages separated by thousands of miles of ocean and ice in fact sprang from the same ancestral tongue. It’s the most amazing thing in linguistics.

This being the second post in my series about weird linguistics, I’d like to point out that it’s not necessary to travel to strange and exotic climes in order to find bizarre grammatical features like ergativity. English itself is plenty weird. Today I’ll demonstrate this by discussing negative polarity.

English is a simple language in many respects. We have barely any case system to speak of, minimal verbal morphology, and a simple consonant phonology. Our vowels are a little baroque, and our spelling is awful, but overall it’s not too bad. But there’s one thing that we’ve managed to muck up pretty badly, and it’s something so simple that in many languages you never have to think about it at all.

I’m talking about not doing things. Or, as we linguists like to call it, negation.

First, observe how a negative sentence in English is formed:

  1. I eat fried octopus.
  2. I don’t eat fried octopus.

The English negative adverb is not, but of course you can’t just add not to an affirmative sentence. Instead, you have to have do-support, where the word do gets thrown in there just so that not (the lazy bum) has something to lean on. Unless, of course, there’s a modal verb or other auxiliary verb floating around, or a few other conditions apply. It’s a bit of a mess.

This is simple stuff, though. What I really want to talk about today is something even more pernicious called negative polarity. In English we have some words which are not themselves negative words, but which normally only occur when there is some other negative word in the sentence. These words are called negative polarity items (NPI). Let me crib some examples from Wikipedia:

  3. I didn’t like the film at all.
  4. *I liked the film at all.
  5. John doesn’t have any potatoes.
  6. *John has any potatoes.

(In linguistic literature, the asterisk is used to indicate ungrammatical utterances.)

The thing to notice here is that the NPIs at all and any are not themselves words that convey negation. Nonetheless, those words are only allowed to occur in sentences which are negated with not, while examples (4) and (6) are ungrammatical for attempting to use those words in a positive context.

The rules for NPIs can get really complex. For example, it is widely believed that they can only occur in downward-entailing contexts, though to explain some additional properties of NPIs a more robust notion of veridicality is required. For example, NPIs are allowed in questions, even if the questions are not negative:

  7. Did you see anything?
  8. Do they have any octopus?

And they can occur when qualified by adverbs such as hardly:

  9. They cook hardly any seafood right now.

Sentence (9) above becomes ungrammatical if hardly is removed, yet somehow it becomes grammatical again if the sentence is qualified differently:

  10. *They cook any seafood right now.
  11. They cook any seafood that you can catch.

The reasons for this have to do with the veridical interpretation of habitual aspects and future time… which is all that I will attempt to explain about that. Read the linked article above about veridicality if you’re interested.

Now as a native English speaker you do all of this intuitively, and so you never have to spend a moment’s conscious thought on downward entailment and veridicality. (Lucky for you.) You can even invent new negative polarity items on occasion. But next time it comes up that a non-native speaker uses an NPI incorrectly, do be nice to her. This stuff is harder than it looks.

A long, rambling introduction.

A few weeks ago I released phonix 0.8, the latest version of my phonological modeling language. And now that you’ve read the previous sentence, assuming you haven’t already clicked away in boredom, I hear you saying "What the heck is a phonological modeling language?"

Let me explain. No, let me sum up.

Languages change their sounds over time: Spanish and French have different sounds than Latin, and different sounds from each other. However, there are regular correspondences between the Latin sounds in a word and the resulting sounds in Spanish, and with a good set of rules you can generate Spanish words from Latin ones. However, to do this you need a model of the sounds in Latin and how they relate to each other, and a set of rules that describes how those sounds change over time and what the conditions are for turning one sound into another. This is what phonix does: it defines a special notation for describing a language’s sound system and the rules which apply to that system, then it allows you to apply those rules to lists of words.
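To make the idea concrete, here is a minimal sketch in plain Python. This is not phonix's own notation, and it works on spelling rather than on a real model of sounds and their features. The three rules, the rule names, and the function `apply_rules` are illustrative assumptions of mine: heavily simplified versions of well-known Latin-to-Spanish correspondences, applied in a fixed order to a handful of hand-picked words.

```python
import re

# Hypothetical mapping for this sketch: voiceless stop -> voiced stop.
VOICING = {"p": "b", "t": "d", "c": "g"}

# Ordered list of (description, pattern, replacement).
# Order matters: each rule sees the output of the one before it.
RULES = [
    # Word-final -m is lost (amicum -> amicu).
    ("drop final -m", re.compile(r"m$"), ""),
    # p, t, c become b, d, g when they stand between two vowels.
    # Lookarounds check the neighbouring vowels without consuming them.
    (
        "voice intervocalic stops",
        re.compile(r"(?<=[aeiou])[ptc](?=[aeiou])"),
        lambda match: VOICING[match.group(0)],
    ),
    # Word-final -u becomes -o (amigu -> amigo).
    ("final -u becomes -o", re.compile(r"u$"), "o"),
]


def apply_rules(word):
    """Run every rule over the word, in order, and return the result."""
    for _description, pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word


if __name__ == "__main__":
    for latin in ["vita", "amicum", "totum", "lacum", "acutum"]:
        print(latin, "->", apply_rules(latin))
```

With these rules the five sample words come out as vida, amigo, todo, lago, and agudo. Real sound changes need far more than this (vowel length, stress, and many more rules), which is exactly why a dedicated notation is useful.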

All of this demonstrates that I’m a huge language nerd. I majored in Linguistics in college, and I was (as one under-motivated classmate said) "one of those people who reads linguistics books in their spare time". As a language nerd I spend an inordinate amount of time thinking about the languages used in my stories. And sometimes I think I’m the only one, since most fantasy and science fiction writing sucks a big one on this.

I feel a rant coming on.

Common failure modes of language in SF

Here they are, in decreasing order of fail:

  1. There are only two languages, the Modern and Old-Timey. Everyone speaks Modern, and no one can understand Old-Timey except maybe for the wizard. Guilty of this: Robert Jordan.

The problem with this model is that if the language has changed enough that the older form is incomprehensible, then unless the language community is very small the language should also have split into multiple daughter languages.

  2. There is one language for every country on the map. They are all obvious knock-offs of some familiar language in this world. Nonetheless, the protagonist never meets anyone that he can’t speak with. Guilty of this: Tad Williams.

This is a lot better than option #1, but it contains the problematic assumption that countries only contain one language. Americans, in particular, seem to fall into this assumption because we’re used to our vast, linguistically homogeneous country. But the majority of the countries in the world are home to multiple, mutually unintelligible language groups, and often dozens or hundreds of such groups. In a pre-modern setting, our protagonist should find that the local vernacular becomes incomprehensible as soon as he’s traveled more than a few days from his house.

  3. There are multiple languages, but there is one common language that everyone speaks, so let’s just use that and keep all of the other languages out of it. Guilty of this: J.R.R. Tolkien.

This is tolerable, and it’s this approach that’s taken by Tolkien and those of his followers who bothered to care. There is often something of a handwave to this explanation (the author has posited this in order to avoid having to actually think about the languages in their setting too deeply), but at least it’s superficially plausible and has historical precedent.

  4. OMG SO MANY LANGUAGES. There are lots of languages, and they all have a distinctive phonology, syntax, and vocabulary. The historical relationships between the languages are well-documented and understood. They have their own writing systems. Really, there’s far more information about the languages of this world than anyone could reasonably hope to assimilate.

I actually don’t know any published authors in this category. Mark Rosenfelder has done amazing work in documenting his world of Almea, but alas he’s never been published. Such is the fate of many a conlanger.

So now you’re depressed. Your options are to write about language in your setting badly, or to spend years and years elaborating something that most readers don’t care about anyway.

There is one more option.

Pretend that it doesn’t exist. I read an interesting article the other day about how language is handled (or, more accurately, isn’t handled) in the Magic: The Gathering tie-in novels. (Scroll down to the "Letter of the Week" to see the discussion.) A letter asked how characters from different planes of the Multiverse can talk to each other without needing to learn a foreign language, and the author responded quite directly with "just ignore that":

The risk is over-explaining. To use a Star Trek example again, this time in a negative way—it’s like the episode where they explain why all the humanoid races on the show all basically look alike. Ugh. It’s one thing to poke fun at the show’s makeup budget and do armchair xenobiological critiques of how the aliens resemble each other so much, but it’s quite another to expect the show to provide an in-universe explanation of those budgetary or story-based limitations. Either you didn’t think it was a problem before and now this explanation throws an awkward spotlight on it, which diminishes your enjoyment of the formula, or you did think it was a problem but you had learned to live with it but now suddenly you have to live with the show’s one groan-worthy and set-in-stone explanation forever.

An explanation like "Well, everybody across the Multiverse happens to speak the same language because a long time ago blahblahblah" or "Well, all planeswalkers find that they can communicate just fine because the spark blahblahblah" may ultimately cause more problems than it fixes. It might actually reduce enjoyment to patch over one of those weird, load-bearing plot-holes that are kinda ugly but that make the fantasy genre possible.

Though it makes me want to cry a little, this guy has a really good point. No amount of world-building will cover everything. And if you don’t have the skills or the patience to make the languages, why bother? More importantly, if your story doesn’t need the linguistic detail, then maybe you should just leave it out.

Not me, though.

We all have our obsessions. I’ll be using phonix to apply the sound changes for deriving Prasi from Old Tzingrizil. And I’ll be having a great time of it, even if no-one cares.

This time with actual Buddhism!

Japanese Buddhist monks were not allowed to eat any meat other than birds, but liked rabbit meat so much they came up with the contrived “explanation” that rabbits are actually birds, and that their ears are unusable wings. The rationale was that while moving, the rabbits touched ground only with two feet at a time.