Roughly six weeks ago, I went to see *The Imitation Game* – I caught one of the last English-language screenings in my city. Opinions might vary about this movie, but Alan Turing’s attitude, as shown in the film, reflected the mindset of a true programmer. True programmers, when they face a specific problem, tend to go one abstraction level up, and create a solution not just for the problem at hand, but for an entire class of similar problems. In fact, this is the very attitude that gave us language technology.

I know what I’m talking about: I had the same experience. Back in the early 2000s, a friend of mine and I were working for the same company, specializing in natural language processing. As it happened, the two of us found ourselves competing to write a rule-based machine translation engine. The job was about a very specific language pair: English into Hungarian (which is fiendishly difficult). By coincidence, both of us came up with a highly language-independent solution rather than tailoring the algorithms to the language pair at hand. This means that our actual program code was written so that it took the respective grammars of both languages as an input: they weren’t hard-wired into the code. We worked out a company standard for writing grammars, and both our programs could take and interpret grammars written according to this standard. In other words: in order to use either of our linguistic modules, first you had to describe (program, if you will) the language that the programs were to understand, that is, analyze or translate.

*At some point, I will have to clarify the differences between language technology, computational linguistics, and natural language processing. It will be done soon, but till then, please accept that these three terms refer to three slightly different concepts.*

My friend won the competition – he had way more programming talent (as it turned out over the years) and more time. But as we kept consulting on this job, we discovered that our ways of thinking were so similar that, a few years later, we ended up founding a company together. (Okay, this post is getting too personal, so I’d better get back to the point.)

Human beings seem to have a drive for a universal – and rational – understanding of the world. The pursuit of such total understanding is very manifest in early Greek philosophy – look at the works of Heraclitus, Democritus, Euclid, Plato, and Aristotle, just to name a few representatives.

Of these illustrious men, I need Euclid and Aristotle for today. Euclid worked out a comprehensive system of geometry – geometry being the mathematics of how things are physically laid out in the world. In turn, Aristotle created a system of logic: a way of reasoning that could eventually be described using mathematical symbols.

On a side note: along with his logic, Aristotle also created a very formal way of describing, or *defining*, things. We largely owe him the structure of *definitions* we find in dictionaries and terminology today. That alone will merit another post, so I’ll remain silent about it for now.

In medieval Europe, classical Greek philosophy was forgotten for a good many centuries. Then it was rediscovered as the era of the Renaissance emerged. The credit for rediscovering Aristotle goes largely to Thomas Aquinas (though he, unfortunately, didn’t fail to uncover the destructive element – the systemic but derogatory treatment of women – in Aristotle’s work). As a result, in the centuries that followed, the knowledge of classical philosophy became the bread and butter of just about every intellectual in Europe.

One of the brightest figures of that era was René Descartes, who actually created a mathematical field called analytic geometry, which helps to describe the physical layout of things in numbers. He also introduced a very pure rationalism, claiming that reason is the sole source of knowledge, rather than sensory observation. This thought actually raised the profile of mathematics because that is the study where we discover new knowledge from existing knowledge in a very systematic way.

In fact, in its purest form, this is not entirely true: our observations and measurements in nature contribute to changes in mathematical systems. Think of just two numbers: π (pi = 3.1415926535…) and *e* (= 2.7182818284…). Both are *irrational numbers*, which means that they are neither integers nor expressible as one integer divided by another. You cannot reason out an *irrational* number; it must come from some kind of observation. π is the circumference of a circle divided by its diameter. *e*, or *Euler’s number*, was first discovered (by the mathematician Jacob Bernoulli) as the value that a deposit of 1, earning 100% *compound interest*, approaches as the interest is compounded ever more frequently. So, both numbers come from nature or social interaction, rather than pure reason.
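To make Bernoulli’s compound interest a bit more tangible, here is a minimal Python sketch. The function name `compound_growth` and the chosen compounding frequencies are my own illustrative assumptions; the only thing it relies on is the textbook expression (1 + 1/n)^n for a deposit of 1 at 100% yearly interest, compounded n times a year.

```python
import math


def compound_growth(periods: int) -> float:
    """Value of a deposit of 1 after one year at 100% interest,
    when the interest is split into `periods` equal compounding steps."""
    return (1 + 1 / periods) ** periods


if __name__ == "__main__":
    # Yearly, monthly, daily, and (hypothetically) a million times a year.
    for periods in (1, 12, 365, 1_000_000):
        value = compound_growth(periods)
        print(f"{periods:>9} periods: {value:.10f}  (gap to e: {math.e - value:.2e})")
```

The more often the interest is compounded, the closer the result creeps to *e*, without ever reaching it in a finite number of steps.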

Still, both numbers were drawn into the rationalist structure of *algebra*: it’s as simple as expressing them in mathematical symbols, and proving *by reason* that the symbols actually result in the same thing – rather than listing all digits, which cannot be completed in a finite period of time anyway. And, by its very nature, rationalist thought has been holding its ground in mathematics ever since. (In the past, people even tried to express π as a rational number, by approximating it with 22/7. Compute that to find out how close the approximation is.)
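If you don’t feel like doing the division by hand, a few lines of Python will do it. This is only a sketch: the helper name `approximation_error` is made up for this post, and `math.pi` is itself just a floating-point approximation of π, so the comparison is good to about 15 digits and no more.

```python
import math
from fractions import Fraction


def approximation_error(numerator: int, denominator: int) -> tuple[float, float]:
    """Return (absolute error, relative error) of numerator/denominator
    as an approximation of pi, measured against math.pi."""
    approximation = float(Fraction(numerator, denominator))
    absolute = abs(approximation - math.pi)
    return absolute, absolute / math.pi


if __name__ == "__main__":
    absolute, relative = approximation_error(22, 7)
    print(f"22/7    = {22 / 7:.10f}")
    print(f"math.pi = {math.pi:.10f}")
    print(f"absolute error: {absolute:.6f}, relative error: {relative:.4%}")
```

The two values agree in the first two decimal places, which is not bad for a fraction you can keep in your head.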

In the late 19th century, mathematicians became preoccupied with the *validity* of their systems. Their questions concerned the boundaries of knowledge they could attain from the existing algebraic structure.

This resulted in an even more furious quest for universal understanding. In 1900, quite symbolically, David Hilbert published a set of problems, whose foundational agenda he later developed into what is known as Hilbert’s program. Crudely simplified, this program, especially the second problem on the 1900 list, was no less than a completeness hypothesis, which was meant to resolve a prevailing foundational crisis of mathematics.

A foundational crisis means that you always encounter paradoxes, contradictions and inconsistencies when you try to find a deeper meaning to your profession, mathematics in our case. It’s quite a crisis because you can no longer know what is true and what is not.

Hilbert was trying to find a clear-cut set of *axioms* – inherently true statements that don’t require proof – that can serve as the basis of every proof of every *theorem* (statement) in mathematics. Moreover, Hilbert hypothesized that there would be no *ignorabimus*, that is, no statements that we cannot know, or *prove*, to be either true or false. This was rationalism *par excellence*: if all truths can be proved, *deduced*, from a closed set of axioms, then one can come to all new truths by pure reasoning.

In setting up his framework, Hilbert was relying heavily on other, rapidly evolving fields of mathematics such as mathematical logic, number theory and set theory, with George Boole, Augustus De Morgan, Gottlob Frege, and Georg Cantor, to name some of the greatest practitioners. In turn, their works can all be traced back to the rationalism of Descartes and Aristotle’s doctrine of logic.

*Confession follows: I was never a math geek, forsooth – rather, I was using mathematics as a means of becoming a software geek. Few fields in mathematics fascinated me – but at the beginning of my engineering studies, Cantor’s infinite set theory, as introduced in our first-year discrete mathematics class, was one of them. That there is not one infinity but several different infinities is definitely goosebumps-worthy.*

Over the first three decades of the 20th century, Hilbert’s program kept mathematicians busy. Their preoccupation was quite justified – to see clearly in all mathematics, Hilbert’s second hypothesis of axiomatic completeness had to be proved or disproved.

To tackle Hilbert’s program, mathematicians had to learn a lot about mathematical proofs. Alan Turing, working on Hilbert’s problems himself, came up with an ingenious approach when trying to formalize them. He translated all mathematical symbols into numbers, and constructed a theoretical machine that could perform a restricted set of operations on a ribbon of numbers. Turing managed to prove that every mathematical proof can be translated into the ribbon of numbers and a sequence of operations on these numbers. The resulting study is probably Turing’s most important work. Called *‘On Computable Numbers, with an Application to the Entscheidungsproblem’*, or *‘Computable Numbers’* for short, it proves that every mathematical statement can be represented by a sequence of numbers.
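To give a feel for such a machine working on a ribbon of symbols, here is a toy simulator in Python. It is a sketch under my own assumptions, not Turing’s original formulation: the rule-table layout, the names `run_turing_machine` and `INCREMENT_RULES`, the `"_"` blank symbol and the step limit are all invented for illustration. The example rule table adds 1 to a binary number, starting with the head on the rightmost digit.

```python
from typing import Dict, Tuple

BLANK = "_"  # assumed symbol for an empty cell of the ribbon

# (state, symbol read) -> (symbol to write, head movement, next state)
RuleTable = Dict[Tuple[str, str], Tuple[str, int, str]]

# Hypothetical rule table: binary increment with the head on the last digit.
INCREMENT_RULES: RuleTable = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0, carry moves left
    ("carry", "0"): ("1", 0, "halt"),    # 0 + carry = 1, done
    ("carry", BLANK): ("1", 0, "halt"),  # ran off the left edge: new digit
}


def run_turing_machine(rules: RuleTable, tape_input: str, start_state: str,
                       head: int, halt_state: str = "halt",
                       max_steps: int = 10_000) -> str:
    """Run the rule table on the tape and return the final tape contents."""
    tape = dict(enumerate(tape_input))  # sparse ribbon: position -> symbol
    state = start_state
    steps = 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = tape.get(head, BLANK)
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += move
        steps += 1
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, BLANK) for i in cells).strip(BLANK)


if __name__ == "__main__":
    number = "1011"
    print(run_turing_machine(INCREMENT_RULES, number, "carry", head=len(number) - 1))
```

Note that the rule table is plain data, just like the ribbon: nothing in `run_turing_machine` knows anything about adding numbers.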

Turing’s approach does *not* prove or disprove Hilbert’s hypothesis, but it gives mathematicians and engineers an immensely valuable device, which, once physically implemented, can *automate* abstract flows of reasoning, that is, mathematical proofs. In fact, *‘Computable Numbers’* is one of the fundamental works that led to universal automata, that is, modern computers, where the *program* of the automaton is represented with numbers – just like the data the program is working on. The direct precursor of the stored-program computer, this principle is there in every digital appliance we encounter today.

In short, Turing didn’t put an end to the doubts lingering over the world of mathematics. Just like he didn’t break any particular codes *himself* in the Enigma project – but he constructed a machine that could. Remember that scene from *The Imitation Game* where he is berated by fellow code-breakers for fidgeting instead of deciphering more messages every day?

The completeness dilemma was finally resolved by Kurt Gödel. Simplified to the extreme, his incompleteness theorems say that a consistent system of axioms [basic truths] and theorems [provable statements] cannot be complete, that is, in any system, there will always be statements that are true but cannot be proved.

Gödel proved his incompleteness theorems at the age of 25, using a method similar to the one Turing would use a few years later: he introduced the Gödel numbering, representing formal expressions with natural numbers. In my interpretation, this means that pure rationalism doesn’t work. It’s not possible to come to every truth by pure reasoning: a system of knowledge needs to *import* observable, or *measurable*, truths, to extend the range of knowable–provable statements, that is, to *learn* more. This was tackled by philosophers of science with way more competence, so I’d better stop here.
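For the curious, the idea of representing a formal expression with a single natural number can be sketched in a few lines of Python. This is a simplified illustration, not Gödel’s actual coding: the six-symbol alphabet, the symbol codes and the function names are my own assumptions. The scheme shown is the classic prime-power trick, where the n-th symbol of a formula becomes the exponent of the n-th prime.

```python
from typing import Iterator

# Hypothetical mini-alphabet; each symbol gets a code 1, 2, 3, ...
SYMBOLS = ["0", "S", "+", "=", "(", ")"]
CODE = {symbol: index + 1 for index, symbol in enumerate(SYMBOLS)}


def primes() -> Iterator[int]:
    """Yield 2, 3, 5, 7, ... using simple trial division."""
    found: list[int] = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def godel_number(formula: str) -> int:
    """Encode a formula as the product of prime(i) ** code(i-th symbol)."""
    number = 1
    for prime, symbol in zip(primes(), formula):
        number *= prime ** CODE[symbol]
    return number


def decode(number: int) -> str:
    """Recover the formula by reading off the exponents of 2, 3, 5, ..."""
    if number < 1:
        raise ValueError("not a valid encoding")
    symbols = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        if exponent == 0 or exponent > len(SYMBOLS):
            raise ValueError("not a valid encoding")
        symbols.append(SYMBOLS[exponent - 1])
    return "".join(symbols)


if __name__ == "__main__":
    # "0=0" uses codes 1, 4, 1, so its number is 2**1 * 3**4 * 5**1 = 810.
    print(godel_number("0=0"), decode(godel_number("0=0")))
```

Because every natural number factors into primes in exactly one way, the formula can always be recovered from its number, which is what lets statements *about* formulas be rephrased as statements about numbers.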

Modern scientific methods integrate rigorous observation, or *measurement*, with systems of reasoning, forming a reliable framework to learn more and more truths about the world. In this post, we’ve seen how the quest for these methods also gave us the universal automaton – the computer.

But what has this to do with language? In fact, it’s quite simple: linguistics, originally a member of the circle of *arts*, was invited to the party of experimental studies – and then language began to be studied just like science. The next post tells the story of how mathematical abstraction was introduced to the study of language (mostly thanks to Noam Chomsky), and how the experimental study of language emerged in the second half of the 20th century.

But linguistics, when studied like science, will encounter the same sort of limitations, maybe even barriers, that science has – which follows from Gödel’s incompleteness theorems and their extensions. I will attempt to show, in a very simplistic way, how it affects modern mathematical linguistic models, and their most popular application, machine translation.

**Disclaimer:** When I talk about limitations of science, when I criticize (pure) rationalism, I do not endorse any of the anti-science views that have been gaining popularity lately. On the contrary: I am still convinced that science and the scientific method are, and should remain, the primary means of learning about the physical world. Ultimately, there might be things we cannot learn – but the worldly things we can learn, we can learn through science only.
