Professor Greg Kondrak stands in front of his second-year computational cryptography class at the University of Alberta. The computational linguist and computer scientist is instructing 40 students on ciphering, that millennia-old science and art of coding and decoding messages. He speaks in a Polish-accented monotone as he explains Caesar ciphers, a simple scheme in which each letter of the alphabet is replaced with another. On the screen behind him he pulls up a picture of two rotating discs with the letters A to Z on them. Spin one disc four spots to the right and “A” becomes “E,” and so on.
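The disc trick can be written out in a few lines. Here is a minimal sketch in Python, with the function name and sample message invented for illustration:

```python
# Caesar cipher: shift each letter a fixed number of spots in the alphabet.
# A shift of 4 maps "A" to "E", as on the rotating discs in Kondrak's demo.

def caesar(text, shift):
    out = []
    for ch in text.upper():
        if ch.isalpha():
            # Wrap around the 26-letter alphabet, leave everything else alone.
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

print(caesar("ATTACK AT DAWN", 4))   # → EXXEGO EX HEAR
```

Decoding is simply the same shift run in reverse (a shift of -4), which is exactly why the scheme is so easy for a modern computer to break.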
“Caesar ciphers worked well 2,000 years ago,” Kondrak explains, “but it’s child’s play now.” He asks if anyone has a question. Crickets. He continues, unsurprised and undeterred, to explain transposition and multiplicative ciphers, the more modern kinds that allow each of us to perform such mundane tasks as online banking without a second thought.
But Kondrak’s 50-minute lesson ventures far beyond cryptography. He puts computer science in context with references to mathematics and Google Translate, to artificial intelligence and machine learning, to Greek mathematician Euclid and to the differences between Slavic, Germanic and Romance languages.
And therein lies Kondrak’s true passion — language. He speaks a language from each of those linguistic subfamilies. Polish (Slavic) is his native language, English (Germanic) is his working language and Spanish (Romance) is his family language because his kids are Mexican on their mother’s side. He’s also versed in French, German and Esperanto.
He’s particularly attracted to phonology, the subset of linguistics that asks why words are pronounced as they are, a fascination that he traces, in part, to learning English as an adult. “The teacher would write words and then pronounce them,” he says. “I would ask, ‘Why would you write it like this if you pronounce it like that?’” Think write, right and rite. A satisfactory answer was not forthcoming.
Kondrak is tall and slim with a dishevelled mop of brown hair. He came to Canada 30 years ago after completing his undergraduate degree in computer science at the University of Warsaw. He worked in industry for a couple of years but realized that wasn’t for him, so he returned to school, doing his graduate work, including his PhD, at the University of Toronto. He wrote his thesis on cognates, words that are similar across different languages, and developed an algorithm that compares words by how they sound when spoken, yielding a measure of their similarity. For him, it was pure research, but it proved to have practical purposes. When the U.S. Food & Drug Administration was asked to approve the name of a new drug — a name that often sounds like plenty of other drug names — Kondrak’s algorithm assisted in deciding whether it was too similar to a name already on the market, reducing the chance of prescriptions being filled incorrectly.
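Kondrak’s actual algorithm works over phonetic features, but the core idea of scoring word pairs by sound can be sketched with a weighted edit distance in which similar sounds cost less to substitute. The vowel-discount cost below is an invented illustration, not his model:

```python
from functools import lru_cache

VOWELS = set("aeiou")

def sub_cost(a, b):
    # Illustrative costs: identical letters are free, vowel-for-vowel swaps
    # are cheap, and anything else is a full substitution.
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5
    return 1.0

def sound_distance(w1, w2):
    # Weighted edit distance over letters, a crude stand-in for sounds.
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0:
            return float(j)
        if j == 0:
            return float(i)
        return min(
            d(i - 1, j) + 1.0,                                  # deletion
            d(i, j - 1) + 1.0,                                  # insertion
            d(i - 1, j - 1) + sub_cost(w1[i - 1], w2[j - 1]),   # substitution
        )
    return d(len(w1), len(w2))

print(sound_distance("color", "colour"))   # → 1.0
```

Lower scores mean more similar-sounding words; a real system would operate on phonetic transcriptions and carefully tuned feature weights rather than raw letters.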
Kondrak combines his passion for language with computer science — as part of the University of Alberta’s world-renowned artificial intelligence cluster — by focusing his research on natural language processing. NLP is the branch of artificial intelligence that deals with the interaction of computers and languages — think Alexa or Siri, times 1,000. In particular, NLP is the attempt to make computers respond like humans, taking them beyond merely giving you today’s weather forecast, last night’s hockey score or Cardi B’s latest hit. “It’s a core area of AI,” Kondrak says. “If you write programs that act intelligently, that’s AI. Playing chess is one example. Understanding language is another.”
To further illustrate what is and what is not AI, Kondrak refers to the Turing test, named after the brilliant British mathematician and computer scientist Alan Turing who, in 1950, said we’ll have AI when you can sit at a keyboard and interact with another agent by typing and reading text without being able to tell if that agent is human or not. “We have not passed that test,” he says. “There are still some fairly simple ways to figure out when we’re talking to a bot.”
Needless to say, two Hollywood movies in which humans fall in love with AI, Her and Blade Runner, come up in conversation. In Blade Runner, based on Philip K. Dick’s Do Androids Dream of Electric Sheep?, Harrison Ford’s character is a police officer trained to identify AI, which has become so human that it’s impossible for the average citizen to tell it apart from actual people. Ford ends up in love with one AI character and they drive off into the sunset. “I love that movie,” Kondrak says. “We are definitely going to get to that point, and people have started considering the ethical problems that come along with AI. If it’s a robot, can you just do anything to it and it doesn’t matter?”
Kondrak’s research has broken into the broader public consciousness a couple of times. The first came in 2016, when he published a paper making a mathematical case that English is a difficult language to speak and spell, directly contradicting a book-length treatise by the world’s most famous linguist, Noam Chomsky.
Then, in 2018, he suffered a week-long bout of notoriety when a paper he had published on the Voynich Manuscript made it into the news cycle. The Voynich Manuscript is an elaborately illustrated, 600-year-old book written in an unknown script and language and then, apparently, encoded. Kondrak and one of his grad students used AI to try to figure out the language in which it might have first been written, and ended up settling on Hebrew. Journalists from around the world began calling to discuss this breakthrough, which Kondrak insists was not the breakthrough they might have thought. “It was completely unexpected,” he says. “We had a project, we wrote a paper, we published a paper. Nobody paid any attention to it, but then at some point we get this explosion of interest which lasted about seven days and now again it is nothing.”
He insists he never claimed to have deciphered the manuscript, only that his research found that Hebrew was “a very good candidate” for the original language. “‘Please look at the paper,’ I told them, but of course nobody looks at the paper,” he says. “I’m glad it died down because it was a big distraction.”
Kondrak continues his work on NLP with a project on the meaning of words. While a human may never confuse “bank” — financial institution or river’s edge — in a sentence, a computer will. His work puts words in context and will help take virtual assistants from recipe finders to AI. “People want to talk about something and get some kind of feedback that indicates the computer actually understands,” he says. “It’s not just factual information. It’s indicating that you understand what people are saying, which brings us pretty close to the Turing test.”
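The “bank” problem can be illustrated with a toy, Lesk-style disambiguator that picks whichever sense shares the most words with the surrounding sentence. The two-sense inventory and signature words below are made up for the example and bear no resemblance to the learned models Kondrak actually works with:

```python
# Toy word-sense disambiguation: choose the sense of "bank" whose
# signature words overlap most with the sentence it appears in.
# The sense inventory here is an invented illustration.

SENSES = {
    "financial institution": {"money", "loan", "deposit", "account", "cash"},
    "river edge": {"river", "water", "fishing", "shore", "muddy"},
}

def disambiguate(sentence):
    context = set(sentence.lower().split())
    # Pick the sense with the largest overlap between its signature
    # words and the words of the sentence.
    return max(SENSES, key=lambda s: len(SENSES[s] & context))

print(disambiguate("she opened an account at the bank to deposit money"))
# → financial institution
print(disambiguate("we sat on the muddy bank of the river fishing"))
# → river edge
```

Modern NLP systems replace these hand-written word lists with contextual representations learned from huge amounts of text, but the underlying question is the same: which meaning fits this context?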