And they shall be at liberty to keep festivals and make rejoicings

… says the decree issued in the name of Ptolemy V in 196 BC, around the anniversary of his coronation as king of Egypt. He – or the people who erected the stele with this text – probably didn’t know what joy they had actually given to later generations: first, to Jean-François Champollion; second, to historians who could finally understand ancient Egyptian scripts and unravel Egyptian history; third, to language technicians who found yet another historic artifact to claim as heritage and name their products after.

Maybe the last part is a bit too sarcastic, because the Rosetta stone and others like it hold real value for all these people – I mean, beyond the symbolic significance. In fact, the Rosetta stone is neither the most important nor the best-preserved specimen of its kind (see another example here) – but it was discovered first, which made it the primary vehicle for deciphering the Egyptian scripts.

The Rosetta stone is an ingenious cultural artifact, although it was made exclusively for the purposes of language policy; and its creators probably didn’t know how similar text arrangements would be used twenty-two centuries later in translation and translation technology. Namely, it contains the text of the same decree in two languages and three scripts: in Ancient Egyptian hieroglyphs, in Egyptian demotic script – and in Ancient Greek. Ptolemy V, his predecessors and successors were Hellenic rulers, using Greek for official affairs. His Egyptian subjects spoke Egyptian, a once-great language of a once-great empire. Egyptian is not related to Greek, and each language is quite difficult to learn for speakers of the other.

At the time, both languages enjoyed a high status. Greek represented a typical integrating culture (that’s why we know a lot of ancient people, institutions, cities and other things by their Greek names). But the Hellenistic elite – to which Ptolemy V belonged – did not arrive in large numbers. As a result, Greek never became a vernacular tongue in Egypt. Greek traders, venturing deep into foreign lands (including Egypt), made a much more lasting impression on the host culture, but that was the sort of infiltration that does not threaten the status of the majority language of the area. Egyptian thus remained in use even among the elite, albeit only for religious purposes after the Hellenistic conquest.

Ironically, both languages succumbed to the Islamic conquest in the 7th century AD, when Arabic and the culture it carried became dominant in the area. Egyptian survived in the form of Coptic, which the Coptic Church in Egypt uses as its liturgical language to this day. Greek remained the language of administration in the Byzantine state, which gradually diminished in the Middle Ages. Finally, after Constantinople fell to the Ottoman Turks in 1453 and was renamed Istanbul, Greek was reduced to the language of a conquered people. (Of the two events, the Arab conquest of Egypt was probably the greater blow to the Greek-Roman world because Egypt was a primary source of food for Europe in those times.)

The summary of the fate of the Ancient Greek and Egyptian languages is based on Ostler (2005).

But because Greek was deeply embedded in the Roman tradition, the knowledge of ancient Greek has survived among European scholars to this day. This is how Jean-François Champollion – and his competitors – could use the Greek text as a starting point to interpret the Egyptian texts. The Rosetta stone was rediscovered in 1799, and Champollion worked on it in the early 1820s. (During the Napoleonic wars, the stone had been captured by the British, and they took it to the British Museum – allegedly, Champollion had to work hard to get at it.)

Let’s look at what they did – through the eyes of language technology: because Champollion and Thomas Young, originally a medical (!) doctor, applied methods that are very similar to the ones we use on computers when we try learning about human language from large text bodies, also known as corpora (corpus in singular).

The contents of the Rosetta stone qualify as a multilingual document because they represent the same meaning in three different encodings (but only two different languages, as it turned out later). One of these encodings – the Greek text – is well known, and the other two need deciphering. Initially, we don’t even know if the other two are written in the same language. That was something Champollion proved over the course of his research.

To be entirely precise, at the beginning we don’t even know for certain if the unknown writings hold the same text. But to all who looked at the Rosetta stone, it seemed likely, and they assumed that all three inscriptions indeed depict the same text. This was what experimental science calls a hypothesis.

If we want to learn the unknown language (or the two unknown languages?) from the known one, we must find what parts of the unknown text correspond to each part of the one we understand. In corpus linguistics, the study that aims at learning from text corpora, we call this procedure the alignment of the two (or three) texts.

When you try aligning two texts, and one of them is completely unfamiliar, you might want to start from anchors, units that probably represent the same words or expressions in both texts. The best candidates for such units are proper nouns (or ‘names’; generalized in natural language processing as named entities) such as ‘Ptolemy’. Another possible approach is to count words. You find words in the known text that occur more than once, count the occurrences, and count the repeated sequences in the unknown text, too. If you find a sequence in the unknown text that occurs the same number of times (with the same frequency, as corpus linguists say), you have reason to believe that the two units – the word in the known text and the character sequence in the unknown one – represent the same meaning (the same lexeme, as linguists call it). Of course, this is not always true: nevertheless, it’s an assumption that computer programs often make when they perform word-level alignment of two texts.
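The counting idea can be sketched in a few lines of Python. This is only a toy illustration, not what Young or Champollion actually did: the function names are my own, the "unknown" text is an invented string of placeholder symbols, and exact frequency matching is a much stronger assumption than real alignment programs make.

```python
from collections import Counter


def repeated_word_counts(known_text):
    """Count the words that occur more than once in the known text."""
    counts = Counter(known_text.lower().split())
    return {word: n for word, n in counts.items() if n > 1}


def ngram_counts(symbols, min_len=2, max_len=6):
    """Count every contiguous symbol sequence of length min_len..max_len.

    The unknown text has no visible word boundaries, so every
    substring in that length range is treated as a candidate unit.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(symbols) - n + 1):
            counts[symbols[i:i + n]] += 1
    return counts


def candidate_pairs(known_text, unknown_symbols, min_len=2, max_len=6):
    """For each repeated known word, list the unknown sequences
    that occur exactly as often (assumption: equal frequency
    hints at equal meaning)."""
    words = repeated_word_counts(known_text)
    grams = ngram_counts(unknown_symbols, min_len, max_len)
    return {
        word: sorted(g for g, n in grams.items() if n == count)
        for word, count in words.items()
    }


# Invented toy data: 'ptolemy' occurs three times, and so does 'XQZ'.
known = "ptolemy king ptolemy god ptolemy"
unknown = "XQZabXQZcdXQZ"
print(candidate_pairs(known, unknown))
```

Note that the shorter fragments of the true match ('XQ' and 'QZ') turn up as candidates as well, because they are just as frequent. Frequency alone narrows the search; it does not finish it.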

Such was the approach of Thomas Young, the Brit who had studied the Rosetta stone shortly before Champollion. He was able to identify connections between the texts, but he was unable to decipher Egyptian characters, let alone words.

It was Champollion’s work that led to the breakthrough. He was more systematic – scientific – in his method: he made two crucial assumptions (he could not be certain of either of them), and then he set out to prove them. His first hypothesis was that the hieroglyphic and the demotic texts are the same, but in two different writings: in other words, the hieroglyphic and demotic writings were equivalent. The second assumption was that both writings were phonetic (when each character stands for a sound) rather than logographic (when each character stands for a word or a concept).

At the time, it seemed a far-fetched idea to think hieroglyphs represent sounds (linguists say phonemes) rather than words, simply because of their elaborate imagery. Now we know they represented both: hieroglyphs could be logograms, and there are writings where they are; but each hieroglyph also had a phonetic value, and then they were also used very much like we use letters today.

Even Champollion started out believing that hieroglyphs (and consequently, demotic characters) were logograms, each character corresponding to a concept or a word. He tested this by counting the characters in the Greek text and in the two other writings. Much to his surprise, the number of characters in the Egyptian texts was much closer to the Greek character count than he expected. From then on, he thought it was unrealistic to believe that the Egyptian characters were logograms. Now he knew – or at least he had very good reason to believe – that the Egyptian writing was phonetic, and that he would need to identify sounds (phonemes) rather than words. That is a far easier task because there are far fewer different sounds than words.
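The reasoning behind this counting test can be written down as a crude heuristic. The sketch below is my own simplification, with a hypothetical function name and made-up sample strings: it asks whether the number of unknown symbols is closer to the number of words or to the number of letters in the known text.

```python
def script_type_hint(known_text, unknown_symbols):
    """Guess whether an unknown script is logographic or phonetic.

    Assumption: a logographic script needs roughly one symbol per
    word of the known text, a phonetic one roughly one symbol per
    letter. Real scripts are messier, so treat this as a hint only.
    """
    n_words = len(known_text.split())
    n_letters = sum(ch.isalpha() for ch in known_text)
    n_symbols = len(unknown_symbols)
    if n_words == 0 or n_letters == 0:
        raise ValueError("the known text must contain at least one word")

    symbols_per_word = n_symbols / n_words
    symbols_per_letter = n_symbols / n_letters

    # Whichever ratio is closer to 1 decides the guess.
    if abs(symbols_per_letter - 1) < abs(symbols_per_word - 1):
        guess = "phonetic"
    else:
        guess = "logographic"
    return symbols_per_word, symbols_per_letter, guess


# Invented toy data: 3 known words (14 letters) against 12 symbols.
print(script_type_hint("ptolemy the king", "ABCDEFGHIJKL"))
```

With twelve symbols for three words, the one-symbol-per-word hypothesis looks implausible, which is the same kind of surprise Champollion ran into.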

He also proved that the two Egyptian writings are equivalent: he used a statistical method to compare the characters in the hieroglyphic and the demotic writings, and developed a method to transcribe one into the other.

After learning all this, he could then use the anchors (which had also been identified by Thomas Young) to decipher each character. Why? Because in the case of proper names, he had reason to assume that the Egyptian form is a phonetic transcription of the Greek name (or vice versa). In fact, the Egyptian writings (both of them) were very similar to modern Arabic or Hebrew in one respect: they represented the consonants only. Vowels had to be inferred from phonetic transcriptions in other languages (mainly Greek), or contemporary descriptions of the language.
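Here is a minimal sketch of how names can serve as anchors for sound values, assuming a strictly consonant-only script and a one-to-one match between symbols and consonants. The symbols 'A' to 'G' are placeholders I made up, not real hieroglyphs, and the Latin spellings of the names stand in for the Greek forms.

```python
VOWELS = set("aeiou")


def consonant_skeleton(name):
    """Drop the vowels, as a consonant-only script would."""
    return [ch for ch in name.lower() if ch.isalpha() and ch not in VOWELS]


def learn_sound_values(anchors):
    """Derive a symbol -> consonant table from (name, symbols) pairs.

    Assumption: each symbol stands for exactly one consonant.
    Anchors whose lengths do not line up are skipped; a symbol
    that would need two different sounds raises ValueError.
    """
    values = {}
    for name, symbols in anchors:
        skeleton = consonant_skeleton(name)
        if len(skeleton) != len(symbols):
            continue
        for symbol, sound in zip(symbols, skeleton):
            if values.setdefault(symbol, sound) != sound:
                raise ValueError(
                    f"symbol {symbol!r} maps to both "
                    f"{values[symbol]!r} and {sound!r}"
                )
    return values


# Invented anchors: the second name reuses three symbols of the first,
# which both confirms their values and adds two new ones.
anchors = [("Ptolemaios", "ABCDE"), ("Kleopatra", "FCABG")]
print(learn_sound_values(anchors))
```

Each additional name either confirms sound values already in the table or contradicts them, which is what makes a handful of names such a strong test of the phonetic hypothesis.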

Because of this latter procedure, the Brits accused Champollion of plagiarizing Young’s work. The accusation was not true, and in any case Champollion got much farther than Young.

The Rosetta stone contained too little text for Champollion to decipher all the characters, let alone all known words. Thus he started using other similar artifacts (other multilingual stelae) and other texts to expand the corpus he was working on. In the end, he managed to produce an Egyptian dictionary. By aligning the different representations of the same text, Champollion created a parallel corpus, a very modern vehicle to learn about a language from another, and to learn about translation. He would probably be thrilled to see what computers can do with these resources today.

I’ve read about Champollion’s method in Weissbach (1999). The parallels with natural language processing are my own. I’ve also deliberately omitted Wikipedia references for terms in italics.

Champollion’s – and, to an extent, Young’s – research has immense historical significance for modern corpus linguistics, precisely because they had almost zero initial knowledge about the text they set out to decipher. And when you need to describe something – a piece of information or a procedure – to a computer, you must not assume any initial knowledge because the computer has none. The research on the Rosetta stone shows how far one can get by processing text with statistical methods.

(And no, the significance of the Rosetta stone is not that a distant relative of Champollion started creating computer-assisted translation software in the 1990s – it’s an interesting coincidence, no more.)

By the way, the mathematical procedure to learn about an unknown thing from other – known – information is called induction, which will be the topic of the next post.

The ancient Egyptian world, Ptolemy’s Hellenistic realm, the Napoleonic wars, Champollion’s studies and modern language technology are all connected now. We are told – we experience – that we can move in time in one direction only. But then the makers of the Rosetta stone, over two millennia ago, could influence how we look at things today – in ways they probably never imagined. And in a way, this affects their time, too, because the significance of their achievement is quite different from what they originally intended. One can say it’s an example of the butterfly effect, but to me, right now, it’s more like the time arc in David Mitchell’s ‘Cloud Atlas’.

Non-Wikipedia references

Ostler, Nicholas (2005): Empires of the Word – A Language History of the World. Kindle edition. Sections ‘Coping with Invasions: Egyptian undercut’ in Chapter 4 and ‘Syria, Palestine, Egypt’ in Chapter 6.

Weissbach, Muriel Mirak (1999): How Champollion Deciphered the Rosetta Stone. Originally in Fidelio, Vol. VIII, No. 3, Fall 1999. Available at time of writing at
