There is general agreement that language is the key to AI, but who holds the key to language – algorithms or people?
According to the hype, the key to automated natural language understanding lies in vast data collections containing millions of words coupled with machine learning algorithms. If you believe this version of the story, such algorithms can automatically learn the intricacies of language well enough to, if not write the next Man Booker Prize winner, at least provide a decent natural language interface to a computer app.
However, this is true only with substantial qualification. Successful human-like natural language understanding must encompass more than just words and algorithms. Language understanding involves words and sentences, as well as social and dialogue context. This means that most companies that need a high-quality natural language understanding system might be best served by a rule-based platform, especially if they do not possess the substantial amounts of data required by statistical systems.
So why the algorithm-hype? Perhaps because statistical algorithms are supremely useful for some purposes, such as aiding and guiding analysis of big collections of language data. And for some applications, neural network algorithms deliver very impressive results. Such algorithms have vastly improved speech recognition systems, the technology for mapping sound waves to text characters, which is the first step in processing speech.
But what about the crucial next steps: actually making sense of the words, phrases, and sentences in a whole dialogue? After all, labels such as “neural network” and “deep learning” in the context of speech recognition give the appearance of something much more ambitious than converting sound waves to text: a human-like ability for learning to master the deeper aspects of language directly from language data. We do not know what future research will bring, but at the moment this is still science fiction.
The challenge with language is also what most people find so fascinating: its staggering diversity, which shows up where you would least expect it. According to the Oxford Dictionaries blog there are 22 different ways of saying “yes” in English. The ways of expressing this basic meaning span a surprising amount of variation, from the mundane “OK”, to the archaic “yea” and the arch-British “righto”, to linguistic rarities like “10-4” and “fo’shizzle”. However, we seldom communicate in single words, and the real challenge for automated systems lies not in the individual words but in how they combine to form meanings, and in how those meanings depend on context.
The missing piece of the equation is the human factor. The aim of automated natural language understanding is to approximate human handling of language in dialogues. So where do humans get this ability from? Humans are preconditioned to learn language, but the process involves more than simply applying an algorithm to some data. Although children have an innate knack for learning languages, the key to language learning is family.
Language learning and understanding are contextual, which is why children grow up speaking and understanding the language spoken in their surroundings. Also, learning a language takes time. Over a period of several years, from birth to school age, children are exposed to millions of words directed at them every year. We know that this matters because children’s vocabularies depend on the number of words their carers use when they talk with them. These are of course not random words, but phrases and sentences presented in a social, communicative context.
Just like a child learning a language, an artificial system for natural language understanding needs human supervision. Even a statistical algorithm that learns from data can only do so from structured training data carefully curated by humans for some specific purpose. In short, if you want human-like language abilities, you need humans: they are indispensable for natural language understanding systems, whether statistical or rule-based.
Humans are required for selecting and curating data before an algorithm can be effective. Humans must evaluate the results and make sure the system takes context and company business rules properly into account. Only then is it possible to deliver truly human-like understanding.
Nevertheless, the similarities between statistical and rule-based systems should not be overstated. Statistical systems require large amounts of data, which helps explain why so many tech giants have been encouraging customers and users to interact via text and voice over their systems. However, using only data collections and machine learning algorithms does not always yield expected results, something Microsoft and Facebook have already discovered.
Taking a hybrid approach, using both a rule-based algorithm created by human experts and statistical algorithms where appropriate, gives a number of key advantages over purely statistical systems. Building such hybrid systems requires less data and might well take less time. The choice of development tools can also make a big difference to the final result. Some natural language development platforms include not only the development tools themselves, but also curated data resources and the tools for expanding them. With a rule-based algorithm, coupled with machine learning algorithms, curated data and a development platform with a sophisticated graphical user interface, humans can easily construct the intelligence behind human-machine conversations to ensure that natural language applications properly understand the context of the conversation – every time.
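To make the hybrid idea concrete, here is a minimal Python sketch, not a description of any particular platform. It assumes a toy task (deciding whether a reply means "yes" or "no"), a handful of hand-written rules, and a very crude word-count scorer standing in for a real statistical model. Every name in it (`RULES`, `TRAINING`, `classify` and so on) is hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical hand-written rules: each pattern maps to an intent label.
RULES = [
    (re.compile(r"^(yes|yea|ok|okay|righto|10-4)\b", re.IGNORECASE), "affirm"),
    (re.compile(r"^(no|nope|nah)\b", re.IGNORECASE), "deny"),
]

# Tiny hand-curated training set (illustrative only; a real system needs far more).
TRAINING = [
    ("sounds good to me", "affirm"),
    ("that works fine", "affirm"),
    ("i would rather not", "deny"),
    ("that does not work for me", "deny"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def train(examples):
    """Count how often each word occurs under each intent label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def statistical_guess(counts, text):
    """Pick the label whose training words overlap most with the input."""
    tokens = tokenize(text)
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def classify(counts, text):
    """Try the human-written rules first; fall back to the statistical scorer."""
    for pattern, intent in RULES:
        if pattern.search(text.strip()):
            return intent, "rule"
    return statistical_guess(counts, text), "statistical"


MODEL = train(TRAINING)
print(classify(MODEL, "Righto, let's go"))
print(classify(MODEL, "Sounds good"))
```

In this sketch the rules handle the cases a human expert can state outright, and the data-driven scorer only covers what the rules miss. That ordering is one possible design choice among several; a real system could just as well let the statistical component propose an answer and use rules to check it against context and business logic.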