Tuesday, December 21, 2010

How do babies learn to speak and write?

This is from Wikipedia.

Information theory was a very trendy scientific approach in the mid 50s.[12] However, pioneer Claude Shannon mused in 1956 that this trendiness was dangerous: "Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems. [...] It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems."[17] Indeed over the next decade, a combination of factors would shut down application of information theory to natural language processing (NLP) problems, in particular machine translation. These were the publication of Noam Chomsky's Syntactic Structures in 1957, in which he stated that only logical rule-based approaches to language analysis were ultimately useful. This accorded well with Artificial Intelligence research of the time, which promoted rule-based approaches. These factors were based on scientific trends. The third factor was to be the 1966 ALPAC report, which recommended that the government stop funding research in machine translation. ALPAC chairman John Pierce later characterised that field as filled with "mad inventors or untrustworthy engineers". He argued that the underlying linguistic problems had to be solved first before attempts at NLP could be reasonably made. Combined, these three elements essentially halted research in the field.[5][18]

As I begin my 10 calendar days in full-screen netbook mode, on Chrome only, with a full working week included, the first thought that crosses my mind is, fortunately, not a review of my 2010 and my determinations going forward (which all your comments and emails have led me to give some thought to as well), but a tip of the hat to the way technology has changed our lives.  Frederick Jelinek died on September 14, 2010, and now, with Latitude and voice apps, our web presence has taken on a meaning that no one would have ascribed to the phrase 10 years, or even 5 years, back.  GOOG-411 was phased out, and Google Voice and the enterprise ranges of Microsoft and Dragon promise to become permanent fixtures on the thin clients of today.  2010 was a turbulently good year in some ways for some people.

A pioneer in speech recognition and linguistic engineering, Jelinek was one of the first to dream of practical ways to make human expression machine-comprehensible.  Today voice apps and the conversion of speech to other interactive formats are commonplace.  Think of the AI built into IVR (interactive voice response), for example, something we use all the time today.  Doctors use voice recognition to generate transcripts of patient records; drivers talk to speech-recognition systems in cars that reply with driving directions; and customer questions to call centers are increasingly being answered by automated speech systems.

How does the magic of converting speech to other formats work?  To figure that out, it is necessary to reflect on a far more complex process: that of human language and expression, and how the human brain processes it.  To give you an idea of the complexity, consider for a moment an infant being exposed to spoken language for the first time in its existence.  How does it unravel the sounds to discern starts and ends, subjects and verbs, and how does it create its database of word-phrase clusters that Wernicke's and Broca's areas later integrate into predictive listening and comprehension?

The NYT obituary for Jelinek says, “In early speech research, there were two camps. One was the linguists. They argued that humans were best at recognizing speech and that therefore computer models should be based mainly on human language concepts — rules about syntax, grammar and meaning.

Mr. Jelinek, an electrical engineer, took a different tack, advocating the use of statistical tools. In this approach, spoken words are converted to digital form, and the computer is then trained to recognize words and appropriate word order in sentences, based on repeated patterns and statistical probability.”

Jelinek broke free from the traditional approach of trying to reproduce language intelligence the way the neurons of our brain do it, and developed a statistical, probability-based model that (much later) was integrated with the more orthodox grammatical and syntactical endeavours.
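To make the statistical idea a little more concrete, here is a minimal sketch in Python.  It is not Jelinek's actual system, only a toy bigram model of the general kind of word-order statistics described above: it counts which word follows which in some training text and turns those counts into probabilities.  The function names and the three-sentence toy corpus are my own assumptions, made up purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows another in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are conventional start/end markers, added here by assumption.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def next_word_probability(counts, prev, cur):
    """Estimate P(cur | prev) as a relative frequency; 0.0 if prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[cur] / total if total else 0.0


def most_likely_next(counts, prev):
    """Return the word most often seen after prev, or None if prev was never seen."""
    followers = counts.get(prev, Counter())
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical toy corpus, invented for this example only.
toy_corpus = [
    "the patient has a fever",
    "the patient has a cough",
    "the doctor has a question",
]
model = train_bigram_model(toy_corpus)

print(most_likely_next(model, "the"))                  # patient
print(next_word_probability(model, "the", "patient"))  # 2 of the 3 words seen after "the"
```

A real recognizer would need far more text, some smoothing for word pairs it has never seen, and an acoustic model on top, but the principle is the same: the more text the model has seen, the better its guesses about what word comes next.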

When I began dabbling with transcription of medical reports back in 1994-95, speech recognition was a distant dream, though work had already begun on the early engines, which could work only from the inputs available to them.  All of us were quick to dismiss it as 10 years out in the future.  We renewed our dismissal every five years.

What has happened in the last 20 years on a completely different front, and one in which open internet interfaces and services have played a key role, is the generation of an entire social environment on the web, replete with every imaginable form of speech processing taking place in real time.  Think Facebook, Twitter, Blogspot, discussion boards, chat rooms, and the ability to anonymously and securely use this mine of language-processing data to refine the cluster models that are at the core of most contemporary algorithms in this field.

Someone asked me once why I spend so much time on a blog with a grand total of six and a half subscribers, two of them family.  My answer at that point in time was that I was endeavouring to record my speech system in order to improve artificial speech intelligence.  I will not forget the look on her face.  The fact that I was obviously living at thrice the speed of light didn't help matters much.

Today you have millions and millions of people putting out millions and millions of little bits and big bits of expression in living digital language that can be mined, interpreted, and responded to in a similar fashion, in any format!  And as each day passes, each moment, each keystroke as I write for you and you alone, this database of how we think, how we express what we think, and how we tell each other what we want to hear from each other is growing larger and more valuable.

Search engines and language interpretation modules have a database that is growing by these inconceivable leaps and bounds, with the millions of you and me blogging, writing, reading, writing about what we are reading, emailing, tweeting, and updating our status on SNSs.

I will not waste your time.  Keep winning.  Have a very happy new year.
