How to fail at making a language

A long, rambling introduction.

A few weeks ago I released phonix 0.8, the latest version of my phonological modeling language. And now that you’ve read the previous sentence, assuming you haven’t already clicked away in boredom, I hear you saying, "What the heck is a phonological modeling language?"

Let me explain. No, let me sum up.

Languages change their sounds over time: Spanish and French have different sounds from Latin, and different sounds from each other. However, there are regular correspondences between the Latin sounds in a word and the resulting sounds in Spanish, and with a good set of rules you can generate Spanish words from Latin ones. To do this, though, you need a model of the sounds in Latin and how they relate to each other, and a set of rules that describes how those sounds change over time and what the conditions are for turning one sound into another. This is what phonix does: it defines a special notation for describing a language’s sound system and the rules which apply to that system, then it allows you to apply those rules to lists of words.
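
To make that concrete, here is a minimal sketch in Python of the general idea: an ordered list of sound-change rules applied to a list of words. This is not phonix notation or phonix code. The rule format and the names `RULES`, `apply_rules`, and `derive` are assumptions made up for illustration, the rules work on spelling rather than on a real model of the sound system, and the three changes shown (loss of final -m, voicing of p/t/c between vowels, final -u becoming -o) are heavily simplified versions of real Latin-to-Spanish developments.

```python
import re

VOWELS = "aeiou"

# Hypothetical rule format: (regex pattern, replacement).
# Rules are applied in order, like a rough chronology of changes.
RULES = [
    (r"m$", ""),                                    # final -m is lost
    (rf"(?<=[{VOWELS}])p(?=[{VOWELS}])", "b"),      # p > b between vowels
    (rf"(?<=[{VOWELS}])t(?=[{VOWELS}])", "d"),      # t > d between vowels
    (rf"(?<=[{VOWELS}])c(?=[{VOWELS}])", "g"),      # c > g between vowels
    (r"u$", "o"),                                   # final -u > -o
]


def apply_rules(word, rules=RULES):
    """Apply every rule to one word, in order, and return the result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def derive(words, rules=RULES):
    """Map each input word to its derived form."""
    return {word: apply_rules(word, rules) for word in words}


if __name__ == "__main__":
    for latin, derived in derive(["vita", "amicum", "totum", "lacum"]).items():
        print(f"{latin} > {derived}")
```

With these toy rules, vita, amicum, totum, and lacum come out as vida, amigo, todo, and lago. The order of the rules matters: if final -u became -o before final -m was lost, amicum would end up as amigu instead.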

All of this demonstrates that I’m a huge language nerd. I majored in Linguistics in college, and I was (as one under-motivated classmate said) "one of those people who reads linguistics books in their spare time". As a language nerd I spend an inordinate amount of time thinking about the languages used in my stories. And sometimes I think I’m the only one, since most fantasy and science fiction writing sucks a big one on this.

I feel a rant coming on.

Common failure modes of language in SF

Here they are, in decreasing order of fail:

  1. There are only two languages, the Modern and Old-Timey. Everyone speaks Modern, and no one can understand Old-Timey except maybe for the wizard. Guilty of this: Robert Jordan.

The problem with this model is that if the language has changed enough that the older form is incomprehensible, then, unless the language community is very small, the language should also have split into multiple daughter languages.

  2. There is one language for every country on the map. They are all obvious knock-offs of some familiar language in this world. Nonetheless, the protagonist never meets anyone that he can’t speak with. Guilty of this: Tad Williams.

This is a lot better than option #1, but it contains the problematic assumption that countries only contain one language. Americans, in particular, seem to fall into this assumption because we’re used to our vast, linguistically homogeneous country. But the majority of the countries in the world are home to multiple, mutually unintelligible language groups, and often dozens or hundreds of such groups. In a pre-modern setting, our protagonist should find that the local vernacular becomes incomprehensible as soon as he’s traveled more than a few days from his house.

  3. There are multiple languages, but there is one common language that everyone speaks, so let’s just use that and keep all of the other languages out of it. Guilty of this: J.R.R. Tolkien.

This is tolerable, and it’s this approach that’s taken by Tolkien and those of his followers who bothered to care. There is often something of a handwave to this explanation (the author has posited this in order to avoid having to actually think about the languages in their setting too deeply), but at least it’s superficially plausible and has historical precedent.

  4. OMG SO MANY LANGUAGES. There are lots of languages, and they all have a distinctive phonology, syntax, and vocabulary. The historical relationships between the languages are well-documented and understood. They have their own writing systems. Really, there’s far more information about the languages of this world than anyone could reasonably hope to assimilate.

I actually don’t know any published authors in this category. Mark Rosenfelder has done amazing work in documenting his world of Almea, but alas he’s never been published. Such is the fate of many a conlanger.

So now you’re depressed. Your options are to write about language in your setting badly, or to spend years and years elaborating something that most readers don’t care about anyway.

There is one more option.

Pretend that it doesn’t exist. I read an interesting article the other day about how language is handled—or, more accurately, isn’t handled—in the Magic: the Gathering tie-in novels. (Scroll down to the "Letter of the Week" to see the discussion.) A letter asked how characters from different planes of the Multiverse can talk to each other without needing to learn a foreign language, and the author responded quite directly with "just ignore that":

The risk is over-explaining. To use a Star Trek example again, this time in a negative way—it’s like the episode where they explain why all the humanoid races on the show all basically look alike. Ugh. It’s one thing to poke fun at the show’s makeup budget and do armchair xenobiological critiques of how the aliens resemble each other so much, but it’s quite another to expect the show to provide an in-universe explanation of those budgetary or story-based limitations. Either you didn’t think it was a problem before and now this explanation throws an awkward spotlight on it, which diminishes your enjoyment of the formula, or you did think it was a problem but you had learned to live with it but now suddenly you have to live with the show’s one groan-worthy and set-in-stone explanation forever.

An explanation like "Well, everybody across the Multiverse happens to speak the same language because a long time ago blahblahblah" or "Well, all planeswalkers find that they can communicate just fine because the spark blahblahblah" may ultimately cause more problems than it fixes. It might actually reduce enjoyment to patch over one of those weird, load-bearing plot-holes that are kinda ugly but that make the fantasy genre possible.

Though it makes me want to cry a little, this guy has a really good point. No amount of world-building will cover everything. And if you don’t have the skills or the patience to make the languages, why bother? More importantly, if your story doesn’t need the linguistic detail, then maybe you should just leave it out.

Not me, though.

We all have our obsessions. I’ll be using phonix to apply the sound changes for deriving Prasi from Old Tzingrizil. And I’ll be having a great time of it, even if no-one cares.


  1. First of all, you sound like a serial killer. But you’re not, since I’m not lucky enough to speak to one.

    Second, Latin, Spanish and French are Derivated from actual “true languages” such as Gothic, Galic, Venetian.

    Only some societies only frequented use of some words, depending on their trade or other interactions, and only adopted some words from those original languages.

    Also the accent shapes the evolution of learned words, creating an easier way for more people to understand their gibberish faster, and to recognise their environment very easily by how distorted their mouth spoutings sound like.

  2. Second, Latin, Spanish and French are Derivated from actual “true languages” such as Gothic, Galic, Venetian.

    Spanish and French are derived primarily from Latin. French has a strong Gaulish substrate, and Spanish has a hypothetical Iberian substrate, but since very little is known about the pre-Latin Iberian languages, we can only speculate about their influence on Spanish. I’m not sure what you mean when you say that Gothic, Gaelic, and Venetian are “actual true languages”. All of the languages mentioned above are real languages, and I never said otherwise. In any case, Gothic and Gaelic had relatively little influence on Spanish or French, and Venetian was displaced by Latin in classical times and had no influence on the other Romance languages.

    I really have no idea what your final two paragraphs are trying to say.

  3. Latin was a Script language, quite distinct from actual spoken languages such as Vulgar Latin. Which was probably more akin to jewish Ladino or the Latino dialects spoken in the Mediteranean or Latin Mohammerica.

    Gothic was a true spoken language, without a written form. So was Venetian, that was also preserved better because of its Etruscan Galo-celtic Script, which far predates Latin, here in Europe.

    Spanish and French do have some latino influences. But so does German, Russian, Polish, Greek and English. The Latin aspects only permeated the very illiterate and barbaric people of the meditereanean.

    The non-estruscan savages of Italy, were immigrants from distant outskirt provinces and colonies of the Roman Impyre. They did not understand the spoken languages of the european whites that were natives there. So they only learned the written or spoken latin from Rome and other such schools or academies for conversion. They were trained to be perfect roman slaves, known then as “middle caste”. They were more common even than the slave caste, beneath them. The middle ones were the true slaves and backbone of the Roman Goliath.

    The Goths were a much more warrior-like and feared people of Europe, that Wulfila only translated parts of the New Testament to their Gothic Bible, from greek or latin, to keep them from mastering their warrior ways, and destroying all of the roman empire itself. Which feared the Goths, and payed tribute to the tribes north of the Danube, to keep their lands “at peace”.

    The Gauls eventually separated from the mudblooded roman scum, who were accused of mixing with the freed slaves from distant colonies, and kept their white purity by retreating into the Galic empire, which retained some of it’s Etruscan influences and names, more than the Latin ones which became tainted.

    Iberian means “Iverni” or people from far away. The “Hiberni” or “Hiverni” were foreigners, immigrants of Jewish Sephardic descent. Possibly from near Israel, or most likely all of the Mediteranean reaches. They needed to invade “Spain” to flourish their jewish commerce with the Atlantic and Mediterean people. Spain has this jewish tradition even now. Madrid is the center of Demonic activities much worse than financial world domination. They keep this a secret, by using their puppets from outside Spain, but in reality Madrid (with Spain) is the Throne of the Devil. In more than one way.

    Gothic, Gaelic and Venetian were spoken before they were ever converted to writing. That’s why they are true Languages. A language is spoken, while scripts are written. Latin is purely a script, loosely based on Vulgar Latin, the spoken language itself.

    The above ones you mentioned are not true languages. They are crossbred words taken from different cultures, mostly. They are not more than 98% original, thus not “true languages”. They’re taught to their slaves by books, in school and society. It’s more a Foederati slave Dialect. Than any sort of true language. Your socio-historical understanding of Roman politics is found wanting.

    Well, no offence to your travel route, but I’ve Actually Been To Venice. You know that big port, on the Adriatic Sea? Yeah, I went there. As for Venetian, it’s known it had influences even from Galo-Celtic languages, thus making it closer to French, than latin, as well as Gothic influences, from the Visigoths of Wallachia, who conquered the western Roman Empire, including the italic peninsula and Alps.

    Latin did have some venetian influx from the local etruscans, as proven by roman writers and even the first roman imperator, who married an etruscan empress, and struggled to restore and preserve the etruscan language and its richer literature, to save its language from being lost.

    Venetian is more akin to Gaulic or Gothic, however. Than it is to latino / roman-latin. Feel free to read this right here, in case you doubt my skill:

    Make sure to read All of it, before you attempt to falsely contradict me again. You’re welcome.
    Venetian descends from Vulgar Latin, influenced by the Celts and possibly the Venetic substratum and by the languages of the Germanic tribes (Visigoths, Ostrogoths and Lombards) who invaded Italy in the 5th century.[citation needed] Venetian, as a known written language, is attested in the 13th century. We also find influences and parallelism with Greek and Albanian in words such as : piròn (fork), inpiràr (to fork), carega (chair) fanela (t-shirt).
    “Calle Berlendis” for instance, is not italic at all. It’s more romanian “cale”, with the exact same meaning. Than the roman “via”. So “calle” is possibly a celtic or gothic word originating in venetian or its outside influences in the 5th century, from the Gothic Empire.

    Venetian does sound a lot more Celtic (or Gaulic like french sometimes does), than Latino.

  4. Perhaps it would be more efficient for me not to read your links, and save both of us the trouble of having me contradict you again.

    In any case, you’re right about one thing: I had confused Venetian (the modern day Romance language) with Venetic (the extinct Italic language displaced by Latin).

    1. It’s only one link, and you couldn’t contradict me if you wanted to. Because I’m always right about these things.

      Venetian (or spoken Vulgar Latin) was heavily influenced by the Venetic substratum that the first roman imperator sought to preserve, with help from the first roman empress. Who was, in fact, Etruscan herself.

      The other influences from other spoken languages are Celtic, Gothic and Gaelic.

      Spoken languages influenced more than the written languages, which were mostly spoken by the lesser castes.

      Lesser castes of non-whites learned mostly latino dialects of written latin or written greek, whilst superior white people of Europe actually preffered to speak their native european languages. Like Gothic, Gaelic, Celtic, Venetic.

      You’re welcome.

      1. The exact relationship of Venetic to other Indo-European languages is still being investigated, but the majority of scholars agree that Venetic, aside from Liburnian, was closest to the Italic languages (a group that includes Latin, Oscan and Umbrian).

        Thus spake Wikipedia. Though I suppose I should assume that you know better than Wikipedia.

    2. ~~~~~~~~~~~~

      The writing system had two historical phases: the archaic from the 7th to 5th century BC, which used the early Greek alphabet, and the later from the 4th to 1st century BC, which modified some of the letters. In the later period, syncopation increased.

      The alphabet went on in modified form after the language disappeared. In addition to being the source of the Roman alphabet, it has been suggested that it passed northward into Venetic and from there through Raetia into the Germanic lands, where it became the Futhark, a system of runes.

      Yeah, no. Actually its runes were used from Switzerland northwards, where the Norse used it as germanic runes. If you’re going to tell me germanic runes are 100% Italic languages, excuse me whilst I laugh myself into a coma.

      You didn’t read the full extent of their research, that Wikipedia details. As far as the details go, Latin only used its Venetic alphabet, because of the emperor Claudius one assumes?

      They barely kept a dozen, at best, etruscan languages from Venetic. Only the alphabet or written script, which they adapted into what you know as latin. Which was a fabulation and not an actual spoken language.

      Sorry you are so confused, but when I see these linguistic misinterpretations, my lie detector just goes off the mark.

  5. Venetic wasn’t an Italic language at all. It was Etruscan or something else of that calibre.

    Venetian has some romance and italic influences yes, but not the original Venetic, which was purely pre-latin.

    So were Gothic and Gaelic. Purely spoken languages that predate Latin and Greek, which were actually more religious scripts, than actual wide spread languages.

    Languages are actually spoken instead of written. Scripts are actually written more than they are spoken. I would expect myself to have missed this detail instead of you, since I am not the english speaker here.

  6. Common sense dictates you should. Wikipedia considers itself not a reliable source, just a general source that mentions some accepted theories. Even fringe theories such as “Venetic = Italic” are mentioned, as is your unfounded belief.

    Now, unto business.
    “The Etruscan language was spoken and written by the Etruscan civilization, in what is present-day Italy, in the ancient region of Etruria (modern Tuscany plus western Umbria and northern Latium) and in parts of Lombardy, Veneto, and Emilia-Romagna (where the Etruscans were displaced by Gauls). Etruscan was superseded completely by Latin, leaving only a few documents and some loanwords in Latin, such as persona (from Etruscan φersu), and some place-names, such as Roma.”
    If only Roma and persu are known to have been kept in Latin, how could it possibly be an italic language? There is absolutely no chance of that happening, since only two etruscan words are known to have remained within Latin.
    Spare me your ridiculous pseudoscience, where anonymous so called researchers like to boast their own italic origins by claiming Etruscan as “just another italic language of their So called Great Ancestors”.

    You confuse italian nationalism with its distortion of actual European history, with true language studies, which are coldly accurate and historically ruthless.

    Etruscan literacy was widespread over the Mediterranean shores, as evidenced by about 13,000 inscriptions (dedications, epitaphs, etc.), most fairly short, but some of considerable length.[1] They date from about 700 BC.[2]

    The Etruscans had a rich literature, as noted by Latin authors. Unfortunately, only one book (now unreadable) has survived. By AD 100, Etruscan had been replaced by Latin.

    Only a few educated Romans with antiquarian interests, such as Varro, could read Etruscan. The last person known to have been able to read Etruscan was the Roman emperor Claudius (10 BC – AD 54), the author of a treatise in twenty volumes on the Etruscans, Tyrrenikà (now lost), who compiled a dictionary (also lost) by interviewing the last few elderly rustics who still spoke the language. Urgulanilla, the emperor’s first wife, was Etruscan.

    Livy and Cicero were both aware that highly specialized Etruscan religious rites were codified in several sets of books written in Etruscan under the generic Latin title Etrusca Disciplina. The Libri Haruspicini dealt with divination from the entrails of the sacrificed animal, while the Libri Fulgurales expounded the art of divination by observing lightning. A third set, the Libri Rituales, might have provided a key to Etruscan civilization: its wider scope embraced Etruscan standards of social and political life as well as ritual practices. According to the 4th century Latin writer Servius, a fourth set of Etruscan books existed, dealing with animal gods, but it is unlikely that any scholar living in the 4th century could have read Etruscan. The single extant Etruscan book, Liber Linteus, which was written on linen, survived only because it was used as mummy wrappings.

    Etruscan had some influence on Latin, as a few dozen Etruscan words were borrowed by the Romans, some of which remain in modern languages.

    You’ll find the rest of that up there. If you say Wikipedia doesn’t lie, guess what. It also says only two words survived in latin, and only 12 words recognised from the Liber Linteus were partly kept modified in other modern languages. That’s hardly evidence to call it “Italic”.

    Unless you’re being fairly naive as to misinterpret their actual meaning. Italic languages as in purely local dialects spoken mostly in or around Italy. The geographic location being “italic” I do not dispute. The italic linguistic relation to most italian or italic languages however, is purely ridiculous and unfounded.

  7. A simple continuation to your general statement. “It’s relation to indo-european branches is being investigated”.

    The Etruscan language has been difficult to analyze, due to its being an isolate. Bonfante, a leading scholar in the field, says “… it resembles no other language in Europe or elsewhere ….”[1] The ancients were aware that Etruscan was an isolate. In the 1st century BC, the Greek historian Dionysius of Halicarnassus stated that the Etruscan language was unlike any other.[5]

    The phonology is known through the alternation of Greek and Etruscan letters in some inscriptions (for example, the Iguvine Tables), and many individual words are known through loans into or from Greek and Latin, as well as explanations of Etruscan words by ancient authors. A few concepts of word formation have been formulated (see below). Modern knowledge of the language is incomplete.
    Tyrsenian family

    The majority consensus is that Etruscan is related only to other members of what is called the Tyrsenian language family which is an isolate family, that is, unrelated to other language groups by any known relationship (see Language isolate). Since Rix (1998), it is widely accepted that Tyrsenian is composed of Rhaetic and Lemnian together with Etruscan.

    Another Aegean language which is possibly related to Etruscan is Minoan. The idea of a relation between the language of the Aegean Linear scripts was taken into consideration as the main hypothesis by Michael Ventris before discovering that in fact the language behind the more modern Linear B script was Mycenean, a Greek dialect. Giulio Mauro Facchetti, a researcher who has dealt with both Etruscan and Minoan put forward again this hypothesis, comparing some of the Minoan words of known meaning with some similar Etruscan words.

    According to Woudhuizen, the Etruscans were colonizing the Latins. The Etruscans brought the alphabet from Anatolia.

    More recently, Robert S.P. Beekes presented a similar case, but argued that the people later known as the Lydians and Etruscans had originally lived in northwest Anatolia, with a coastline to the Sea of Marmara, whence they were driven by the Phrygians c. 1200 BC, leaving a remnant known in antiquity as the Tyrsenoi. A segment of this people moved south-west to Lydia, becoming known as the Lydians, while others sailed away to take refuge in Italy, where they became known as Etruscans. The Etruscan language could therefore have been related to a non-Indo-European substratum of Lydian.


    The Latin alphabet owes its existence to the Etruscan writing system, which was adapted for Latin in the form of the Old Italic alphabet. The Etruscan alphabet[22] employs a Euboean variant[23] of the Greek alphabet using the letter digamma and was in all probability transmitted through Pithecusae and Cumae, two Euboean settlements in southern Italy. This system is ultimately derived from West Semitic scripts.

    The Etruscans recognized a 26-letter alphabet, which they used in and of itself for decoration on some objects such as the “rooster ink-stand”.[24] This has been termed the model alphabet.[25] They did not use four letters of it, mainly because Etruscan had no voiced stops, b, d and g, and also no o. They innovated one letter for f.


    Sorry for copy pasting so much of it here. But from all of this, it appears there is speculation of Etruscan being “vaguely, possibly, distantly related” to many Indo-European languages such as Armenian, Albanian and several others. But no actual breakthrough to prove this. The best one is the Trojan / Anatolian link.

    So, no. If it cannot be proven to have direct exclusive connections to specific indo-european languages, just cultural mishaps and crossings here and there, it’s definitely not indo-european.

    Their 26 runes were, however, the script core of the later Latin alphabet. Which was inspired from the Etruscan’s richer literature, yes. So Latin was not original at all, as you assumed.

    If you have further confusions I can address, don’t hesitate to name them. I will clarify them as best as I can.

  8. I can’t believe you left the first and foremost “common failure mode of language in SF” unsaid. Namely, every single mention of “guttural”. Yes, yes I know it means “pertaining to throat”. Nevertheless, it means nothing in real. I had no mention of guttural in my studies on linguistics and phonetics. And the SF books that use guttural just make everything about language up. I scream every time guttural is used in SF, and that means in _every_ _fucking_ book.

    1. “Guttural” is used as a cover term in some textbooks for uvular and pharyngeal sounds, so it’s not completely without merit. However, most of the people describing Ye Olde Evil Race’s language as “guttural” wouldn’t know a uvular fricative if it kicked them in the glottis, so, yeah, you’re mostly on target.
