Natural Language Part 3 - Voice Recognition Systems

Artificial intelligence is a somewhat older discipline than some people realize. The face that high tech presents to the world is changing ever more rapidly, and this continues to be true even with the current economic slowdown. In this environment of rapid change, it is sometimes easy to forget that the foundations upon which high tech is based have often been around far longer than the latest "hot" technology. Work on artificial intelligence dates back at least to the 1950s (the days of the earliest fully electronic computers) with the pioneering work of McCarthy, Minsky, Simon, Newell, and others. Arguably, it dates back further, to Alan Turing's design of the "Turing machine" in the 1930s and his proposal of the "Turing test" in 1950, and even to the pioneering work of Ada Lovelace and Charles Babbage in the 19th century. To understand the current state of the art, and where artificial intelligence is heading, it is often necessary to understand the past as well.

This is definitely true in the case of voice recognition systems: the art of designing software that can understand the human voice and convert it to a text representation. Most voice recognition software today builds on research projects carried out in the 1970s under the auspices of the Defense Advanced Research Projects Agency (DARPA). This initiative, called SUR (Speech Understanding Research), set the goal of building a computer system that could understand a vocabulary of about 1,000 words, spoken by a speaker in a room without much background noise. SUR led to the development of a number of speech understanding systems, such as HEARSAY and HARPY. It is debatable whether these systems actually met the project's original benchmarks, but they established basic methodologies that still underlie today's speech understanding software, which is definitely capable of meeting those benchmarks.

Speech understanding systems divide spoken language into basic building blocks called "phonemes". A phoneme is the smallest unit of sound that can distinguish one word from another in a language. A phoneme might roughly correspond to a letter in a word, though spoken language does not map directly onto the letters of written text. Once the speech has been broken into its component phonemes, it is the job of the speech understanding software to build them back up into larger units: morphemes, syllables, words, phrases, and sentences.
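The phoneme-to-word step can be sketched with a toy pronunciation lexicon. Everything below — the ARPAbet-style phoneme symbols, the tiny lexicon, and the greedy matching strategy — is an illustrative assumption, not how any real recognizer works (real systems use probabilistic models over very large lexicons):

```python
# Hypothetical lexicon mapping phoneme sequences to written words.
LEXICON = {
    ("DH", "AH"): "the",
    ("K", "AE", "T"): "cat",
    ("S", "AE", "T"): "sat",
    ("AA", "N"): "on",
    ("M", "AE", "T"): "mat",
}

def phonemes_to_text(phonemes):
    """Greedily match the longest known phoneme sequence at each position."""
    words, i = [], 0
    max_len = max(len(key) for key in LEXICON)
    while i < len(phonemes):
        for length in range(max_len, 0, -1):  # prefer longer matches
            chunk = tuple(phonemes[i:i + length])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += length
                break
        else:
            raise ValueError(f"no word matches the phonemes at position {i}")
    return " ".join(words)

print(phonemes_to_text(["DH", "AH", "K", "AE", "T", "S", "AE", "T"]))
# -> the cat sat
```

Even this toy version hints at the real difficulty: the mapping from phonemes to words is ambiguous, which is why context is needed, as discussed next.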

There are two basic challenges here. One is that the same word is not necessarily pronounced the same way in different contexts: two different sequences of phonemes may correspond to the same word in different situations. The other is that different speakers tend to pronounce the same word in different ways. To address these challenges, research on speech understanding systems led to a specialized AI architecture called the blackboard system. The idea of a blackboard system is that all data relevant to a given problem is put up on a "blackboard" whose contents are visible to the entire system. The blackboard would thus initially hold the set of phonemes extracted from the speech. A set of knowledge sources then examines the blackboard; each decides whether it has something to contribute to solving the problem and, if so, makes the appropriate change to the blackboard. A given knowledge source might, for example, combine two or more phonemes into a single morpheme based on context. Together, the knowledge sources have sufficient knowledge to solve the problem completely, so once they have finished their work, the correct text will have been generated from the speech. Because different speakers express the same word differently, it is often necessary to train the system on a given speaker's voice before it can understand that speaker reliably.

Blackboard systems are a lot like rule-based systems, in that they consist of a set of knowledge sources, each with a left-hand side that matches under certain circumstances, triggering the execution of its right-hand side. The main difference is that blackboard knowledge sources are generally allowed greater complexity in what their left-hand sides can match. This greater complexity means that blackboard systems are really best implemented in a massively parallel environment, where all the knowledge sources simultaneously examine the blackboard in order to determine whether to fire (execute).
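One cycle of that "many sources examine, some fire" scheme might be sketched as follows, with a thread pool evaluating every left-hand side concurrently before the matching right-hand sides execute. The rules and blackboard contents here are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Each rule: (name, left-hand-side predicate, right-hand-side action).
rules = [
    ("promote", lambda bb: "word" in bb,    lambda bb: bb.add("WORD")),
    ("never",   lambda bb: "missing" in bb, lambda bb: bb.add("x")),
]

blackboard = {"word"}

def cycle(bb, rules):
    """Evaluate all LHS predicates in parallel, then fire the matching RHSs."""
    with ThreadPoolExecutor() as pool:
        matched = list(pool.map(lambda rule: rule[1](bb), rules))
    for (name, _, rhs), hit in zip(rules, matched):
        if hit:
            rhs(bb)  # execute the right-hand side

cycle(blackboard, rules)
print(sorted(blackboard))  # -> ['WORD', 'word']
```

Thread pools only simulate the idea, of course; the point is that matching is independent per source, so on truly parallel hardware every knowledge source could watch the blackboard at once.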

Speech understanding definitely represents a success for artificial intelligence: intense research in the past has led to software today that, while unable to understand speech perfectly, is definitely useful in the "real world". Some would argue that speech understanding is not "real" artificial intelligence, since no "intelligence" is involved in merely converting sounds to written text. However, the field of artificial intelligence is ultimately an undertaking of science and engineering, and a large number of building blocks must be constructed in order to build a synthetic brain with human-level capabilities. No single building block is likely to be seen as "real" artificial intelligence by the naysayers. And yet, once all the building blocks are in place, artificial intelligence will definitely have been realized. Certainly, speech understanding is one important building block.

More information about voice recognition may be found in How to Build a Speech Recognition Application and Voice Recognition. More information about blackboard systems may be found in Blackboard Systems.

Next edition: Natural Language Part 4 - The Long Term Vision


Home: Ramalila.NET
