Tuesday, December 21, 2010

How do babies learn to speak and write?

This is from Wikipedia.

Information theory was a very trendy scientific approach in the mid-1950s.[12] However, pioneer Claude Shannon mused in 1956 that this trendiness was dangerous: "Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems. [...] It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems."[17] Indeed, over the next decade, a combination of factors would shut down application of information theory to natural language processing (NLP) problems, in particular machine translation. The first was the publication of Noam Chomsky's Syntactic Structures in 1957, in which he stated that only logical rule-based approaches to language analysis were ultimately useful. This accorded well with Artificial Intelligence research of the time, which promoted rule-based approaches. These first two factors were based on scientific trends. The third factor was to be the 1966 ALPAC report, which recommended that the government stop funding research in machine translation. ALPAC chairman John Pierce later characterised that field as filled with "mad inventors or untrustworthy engineers". He argued that the underlying linguistic problems had to be solved first before attempts at NLP could be reasonably made. Combined, these three elements essentially halted research in the field.[5][18]

As I begin ten calendar days of working in full-screen netbook mode on Chrome only, with a full working week included, the first thought that crosses my mind is, fortunately, not a review of my 2010 and my resolutions going forward (though all your comments and emails have led me to give that some thought too), but a tip of the hat to the way technology has changed our lives.  Frederick Jelinek died on September 14, 2010, and now, with Google Latitude and voice apps, our web presence has taken on a meaning that no one would have ascribed to the phrase 10, or even 5, years back.  GOOG-411 has been phased out, and Google Voice and the enterprise ranges of Microsoft and Dragon promise to become permanent fixtures on the thin clients of today.  2010 was a turbulently good year in some ways for some people.

A pioneer in speech recognition and linguistic engineering, Jelinek was one of the first to dream of practical ways to make human expression machine-comprehensible.  Today voice apps and the conversion of speech into other interactive formats are commonplace.  Think of the AI built into interactive voice response (IVR) systems, for example, something we use all the time today.  Doctors use voice recognition (VR) to generate transcripts of patient records, drivers talk to speech-recognition systems in cars that reply with driving directions, and customer questions to call centers are increasingly being answered by automated speech systems.

How does the magic of converting speech into other formats work?  To figure that out, it is necessary to reflect on a far more complex process: human language and expression, and how the human brain processes it.  To give you an idea of the complexity, consider for a moment an infant being exposed to spoken language for the first time in its existence.  How does it unravel the sounds to discern starts and ends, subjects and verbs, and how does it create its database of word-phrase clusters that Wernicke's and Broca's areas later integrate into predictive listening and comprehension?

The NYT obituary for Jelinek says, “In early speech research, there were two camps. One was the linguists. They argued that humans were best at recognizing speech and that therefore computer models should be based mainly on human language concepts — rules about syntax, grammar and meaning.

Mr. Jelinek, an electrical engineer, took a different tack, advocating the use of statistical tools. In this approach, spoken words are converted to digital form, and the computer is then trained to recognize words and appropriate word order in sentences, based on repeated patterns and statistical probability.”
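The statistical idea in the quote can be illustrated with a toy bigram (word-pair) count: look at which word follows a given word in some text and turn the counts into relative frequencies. This is only a minimal sketch, assuming a POSIX shell with awk; the three-sentence corpus, the file name corpus.txt and the function name next_word_probs are made up for illustration, and this is not the model used in any real recognizer.

```shell
#!/bin/sh
# A made-up training text, one sentence per line.
printf '%s\n' 'the patient is stable' 'the patient is alert' 'the patient was seen' > corpus.txt

# Hypothetical helper: for every occurrence of word $1 in file $2, count the
# word that follows it, then print each follower with its relative frequency
# (a simple estimate of P(next word | previous word)).
next_word_probs() {
    awk -v prev="$1" '
    BEGIN { prev = tolower(prev) }
    {
        for (i = 1; i < NF; i++) {
            if (tolower($i) == prev) {
                total++
                count[tolower($(i + 1))]++
            }
        }
    }
    END {
        for (w in count) printf "%s %.2f\n", w, count[w] / total
    }' "$2" | sort
}

next_word_probs patient corpus.txt
```

In this corpus "is" follows "patient" two times out of three and "was" once, so the sketch prints is 0.67 and was 0.33. Given two acoustically similar candidates after "patient", a recognizer using such counts could prefer the more probable one.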

Jelinek broke free from the traditional approach of trying to reproduce language intelligence the way the neurons of our brain do it, and developed a statistical and probability model that (much later) integrated with the more orthodox grammatical and syntactical endeavours.

When I began dabbling in the transcription of medical reports back in 1994-95, speech recognition was a distant dream, though work had already begun on early engines that could work only from the inputs available to them.  All of us were quick to dismiss it as 10 years out in the future.  We renewed our dismissal every five years.

What has happened in the last 20 years, on a completely different front and one in which open internet interfaces and services have played a key role, is the creation of an entire social environment on the web, replete with every imaginable form of speech processing taking place in real time.  Think Facebook, Twitter, Blogspot, discussion boards and chat rooms, and the ability to anonymously and securely use this minefield of language-processing data to refine the cluster models that are at the core of most contemporary algorithms in this field.

Someone asked me once why I spend so much time on a blog with a grand total of six and a half subscribers, two of them family.  My answer at that point in time was that I was endeavouring to record my speech system in order to improve artificial speech intelligence.  I will not forget the look on her face.  The fact that I was obviously living at thrice the speed of light didn't help matters much.

Today you have millions and millions of people putting millions and millions of little and big bits of expression out in living digital language that can be mined, interpreted, and responded to in a similar fashion, in any format!!  And with each day that passes, each moment, each keystroke as I write for you and you alone, this database of how we think, how we express what we think, and how we tell each other what we want to hear from each other is growing larger and more valuable.

Search engines and language interpretation modules have a database that is growing by inconceivable leaps and bounds, with millions of you and me blogging, writing, reading and writing about what we read, emailing, tweeting, and updating our status on social networking sites (SNSs).

I will not waste your time.  Keep winning.  Have a very happy new year.

If you like music, you will want to read:
The Operative Note
If you like food, you will want to read:
The unedited mutton dalcha recipe
If you love movies, you will want to read:
If you like pictures, you can see:

Sunday, November 21, 2010

How to search medical terms or words

From Venu's Askribe email newsletter.

How to search medical terms or words?
-------------------------------------
Almost everyone who transcribes medical voice files into text searches for medical words or medical terms. Many people use medical dictionaries, and there are many types of them. I think there are some 10 to 15 dictionaries for searching medical terms and words, but I am not sure.
Is there a way to search for medical words or terms without first looking in the dictionaries? Read the examples below.
Example No. 1
I was listening to a voice file and typed "Altram". I then searched the medical dictionaries for "Altram" to see if there was any such word, but I could not find it. Two days later I found that the correct word was "Ultram". If you look carefully at the two words "Altram" and "Ultram", they differ only in the first letter; the rest of the letters are the same. In most cases you can find the correct word in a dictionary only if you know its correct initial letters. Otherwise it is difficult to find.
Example No. 2
After listening to a voice file, I typed "Pulse ______97%_____". At first I could hear only "97%", and after listening again I could hear the word "Pulse". However, I knew I was still missing more than one word. The question is how to find more than one word, or a pattern of words. Can we search the dictionaries for that?
Example No. 3
After listening to a voice file, I could only type "Flex_____sig". At first I typed "Flexsig" and then "Flex Sig", but I knew something was wrong. How will you find the two words correctly? Can you search the dictionaries? If so, which dictionary will you use to search for the correct words?
Can you answer the above questions? Can you search dictionaries for a pattern of medical words or terms? Is there a way to search for medical words or terms without first looking in the dictionaries?
You can search for text patterns, words or phrases using tools like grep, awk, sed, etc. These tools are found in almost all Linux operating systems.
Below are some examples of searching for medical words and terms.
grep -i  '^[ua]ltram'  mtwords.txt
grep -i  '[ua].*ram'  mtwords.txt
grep -i  '[ua].*am'  mtwords.txt
grep -i  'tram'  mtwords.txt
grep -i  'pulse.*97.* '  mtwords.txt
grep -i  'pulse.*9[0-9].*'  mtwords.txt
grep -i  'flex.*sig'  mtwords.txt
grep is a tool that prints the lines matching a pattern, and mtwords.txt is a text file that contains all the medical terms or words, one entry per line. You can create your own mtwords.txt file: open a text editor and write some medical terms, words or phrases, each on its own line. Below is an example.
------------ Sample of mtwords.txt file ------------
Pulse oximetry 97% on room air
Flexible sigmoidoscopy
moderate amount of cerumen
mild cerumen
itchy ears
pharyngitis – viral
841 Burke Avenue, Bronx
New York 10467
------------ End of mtwords.txt file ---------------
After creating the above mtwords.txt file, you can use grep to search for words, terms, phrases, etc. Remember that grep is a command-line tool.
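To try these searches without typing the file by hand, the sample list can be created from the shell with a here-document and then searched. This is a minimal sketch, assuming a POSIX shell and grep on Linux; it simply reuses the sample entries shown above.

```shell
#!/bin/sh
# Create the sample word list from the article, one entry per line.
cat > mtwords.txt <<'EOF'
Pulse oximetry 97% on room air
Flexible sigmoidoscopy
moderate amount of cerumen
mild cerumen
itchy ears
pharyngitis – viral
841 Burke Avenue, Bronx
New York 10467
EOF

# Example No. 2: "Pulse" and "97" with unknown words in between.
grep -i 'pulse.*97' mtwords.txt     # prints: Pulse oximetry 97% on room air

# Example No. 3: "Flex" and "sig" split across two words.
grep -i 'flex.*sig' mtwords.txt     # prints: Flexible sigmoidoscopy

# -c prints the number of matching lines instead of the lines themselves.
grep -ic 'cerumen' mtwords.txt      # prints: 2
```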
In the above file (mtwords.txt) you can write any medical term or word, names of medicines, medical phrases, names of persons or doctors, names of places, etc. Because it is only a text file, you can easily update it and add new medical terms or phrases.
IMPORTANT NOTE: Almost all Linux operating systems have tools like grep, awk, sed, etc. These are command-line tools. There are also different versions of these tools, and the versions differ somewhat. There is also a difference between the format of text files in Linux and Windows: Windows ends each line with a carriage return and a line feed (CR LF), while Linux uses only a line feed (LF). You should have some working knowledge of an operating system like Linux, some knowledge of regular expressions, and finally some knowledge of tools like grep, awk, sed, etc.
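The Linux/Windows text-format difference is mostly about line endings: Windows ends each line with a carriage return and a line feed (CR LF), Linux with a line feed only. A hidden CR can make a pattern anchored at the end of a line fail. The sketch below assumes a POSIX shell with printf, grep and tr; the file names words_windows.txt and words_unix.txt are hypothetical.

```shell
#!/bin/sh
# Simulate a word list saved on Windows: each line ends in CR LF.
printf 'Ultram\r\nFlexible sigmoidoscopy\r\n' > words_windows.txt

# '$' anchors at the end of the line; the hidden CR sits just before it,
# so no line matches here.
echo "matches before: $(grep -c 'Ultram$' words_windows.txt)"

# Delete every carriage return to get Linux-style LF line endings.
tr -d '\r' < words_windows.txt > words_unix.txt

# The same search now finds the line.
echo "matches after: $(grep -c 'Ultram$' words_unix.txt)"
```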
Let me explain grep with one example.
grep -i   '^[ua]ltram'   mtwords.txt
In the above example, grep is the command, -i is an option that makes the search ignore case, '^[ua]ltram' is a regular expression (also called a search pattern), and mtwords.txt is the text file that contains all the medical words, terms or phrases. The ^ anchors the pattern to the start of a line, and [ua] matches either u or a, so we are searching mtwords.txt for any line that begins with ultram, Ultram, altram or Altram (in any capitalization). Since there is no word such as altram, only lines beginning with ultram or Ultram will be displayed on the screen. See the grep examples above: you can search for the word Ultram in many ways. If you can guess some of the letters in a word, phrase or medical term correctly, you can search for the correct word.
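The difference between anchoring at the start of a line and matching anywhere in it is easy to see on a tiny file. A minimal sketch, assuming a POSIX shell and grep; drugs.txt and its three lines are hypothetical:

```shell
#!/bin/sh
# A hypothetical three-line word list.
printf '%s\n' 'Ultram 50 mg' 'ULTRAM' 'Tramadol (Ultram)' > drugs.txt

# ^ anchors to the start of the line and -i ignores case, so only the
# first two lines match; in the third, "Ultram" is not at the line start.
grep -i '^[ua]ltram' drugs.txt

# Without ^ the pattern may occur anywhere in the line: all three match.
grep -ic '[ua]ltram' drugs.txt    # prints: 3
```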

This article was written by V. Venkateswara Rao, whose e-mail and Yahoo chat ID is venkatesh3004 @ yahoo.com

I had posted a query along similar lines on the net (Askribe) several days back, and I found the above article(s) enlightening.
I would like to modify my earlier query somewhat. Firstly, I am working on a Windows platform (XP), not Linux. Secondly, when it comes to the technical part (line editors and so on), I am a novice, though I do have some general idea of the topic.
Since there are a lot of e-packages (medical dictionaries of technical phrases, medicines, etc.) on the net, many of them freely downloadable, I would be grateful if you could suggest a proven package that would meet my requirements.
Thanking You,
My address: gopalanitanair@yahoo.com

Wednesday, October 20, 2010

CBaySystems files for $115 million US IPO

Medical transcription services company CBaySystems Holdings Ltd. on Monday filed for an initial public offering worth as much as $115 million.  The Franklin, Tenn., company handles medical transcription, billing, and coding services for about 2,400 hospitals, clinics, and practices in the U.S. CBaySystems did not say how many shares it plans to sell or when it plans to complete its IPO. It also did not disclose a proposed ticker symbol for its shares.

Read the Businessweek story here.

Saturday, October 09, 2010

The New Twitter

From the existential Buddhist blog

Anatta, or “not-self”, is a frequently misunderstood Buddhist concept. Let’s clear up three common misconceptions about it right off the bat.  Buddhism doesn’t deny you exist, deny you have a personality, or imply you shouldn’t have an “ego.” What Buddhism does deny is a false conception of the self:  a self that is separate-unto-itself and unchanging.

In its narrowest sense, anatta is a denial of the Vedic conception of atman, an unchanging soul which transmigrates and which, according to the ancient Vedic formula, shares an identity with Brahman, or the godhead.  More broadly, anatta is descriptive of all conditioned phenomena, not just the self, and corresponds to the Mahayana idea of śunyata or emptiness: nothing possesses an unchanging self-nature.

The self can be compared to a whirlpool in the ocean.  A whirlpool is a distinctive feature of the ocean: it’s visible, tangible, and measurable.  It’s real.  It exists.  On the other hand, at any given moment the water that makes up the whirlpool is different from the water that comprised it a moment before, and from the water that will comprise it a moment later.  The whirlpool is a pattern that retains a discernible identity while it continues to exist.  At any given time, there is no separation between the whirlpool and the ocean.  It makes no sense to say that the whirlpool is “here” and the ocean is “there.”  Whirlpooling is a feature of the ocean.
It’s the same with the self.  The self exists as a pattern: a pattern of behavioral response. But that pattern is always in some degree of flux.  While I am always, in some sense, the same person, I am different now than I was at age three, and different from the way I will be at age eighty.  My intellectual capacity and memory will decline as I age.  My tastes and opinions may change as well.  While we are a relatively enduring pattern, we are also constantly changing: learning, developing, maturing, declining.  We also change depending on the situation we find ourselves in.  We behave differently at work, at home, in the bar, and in the zendo.
We are also inseparable from the world around us.  Our skin connects us to the world, rather than separating us from it.

OK.  I use Firefox and have a Twitter add-on.


This post is only about trying to understand a trackback.  When you quote from another blog, you track back to that blog, like this superbly written piece at Mashable.  The links came along when I copied and pasted.

Twitter has announced that it’s rolling out a new version of its web interface. Some users will start seeing the new look as soon as tonight, though the company says on its blog that it “will roll out as a preview over the next several weeks.”
News of the company’s plan to integrate multimedia into the stream leaked out earlier this afternoon, but we’ve now learned that the redesign goes much further than that. The new interface resembles that of a far more sophisticated web app (as well as Twitter’s recently released iPad app).
The multimedia partnerships we hinted at earlier today extend to 16 different companies: DailyBooth, DeviantART, Etsy, Flickr, Justin.TV, Kickstarter, Kiva, Photozou, Plixi, Twitgoo, TwitPic, TwitVid, Ustream, Vimeo, yfrog, and YouTube.
Much has been made in recent months of Twitter’s move into areas previously owned by third-party applications. Today’s announcement will no doubt renew such discussion, with many of the best features of Twitter clients like Tweetie, Seesmic Desktop, and TweetDeck now becoming a part of the default Twitter interface. As we also pointed out earlier this afternoon, it also makes Twitter feel a bit more like Facebook.

The next link is generated by Amazon.  Subho

Twitter CEO Evan Williams prefaced his announcement by mentioning that Twitter.com is already far and away the most popular way for accessing the microblogging service, commanding 78% of unique users (which the company defines as “Of all the people who logged into their Twitter account during the month, what percentage did so via each service.”). Combined with Twitter’s growing need to serve up impressions to advertisers, it’s certainly no surprise that the company is now looking to keep people more engaged on its website.

Thursday, October 07, 2010

Leverage your web presence with Tech O2

One of the two professional areas of my life that I feel most enthusiastic about is the social network and its strength. The other is social re-engineering with a rural focus.  Almost every business is on at least one of the social media platforms: Twitter, Facebook, LinkedIn, etc. Unfortunately, it just stops there. What exactly are you doing on Facebook for your business? Social media provides a way of “engaging your customers and prospects.” Just being on Facebook doesn’t mean much; you might have a fan page, but it will be one of nearly 500 million fan pages. Presence is not the same as participation and engagement in social media.

The issue is not entirely ignorance, but a lack of tools and economic campaign models for small businesses to participate actively in social media. Analysts predict that social media will become (or, in some views, has already become) the “new Internet.” Platforms like Facebook have orchestrated plans and methods to capture audiences and conduct all business interactions within their ecosystem. Large corporations have created a new role in their employee roster called “Social Media Managers,” whose job is to engage their customers and prospects using an array of social media platforms. Social media managers have several tools, such as SocialFlow, TweetDeck and CoTweet, to participate in multiple social networks and interact with their clients. They also have the time and energy to carry out different social media campaigns, from marketing to promotional activities.  Read more about how to leverage your social network to increase your business.

My personal social network began when I was a kid, and just like the internet, it was made up of people who were not all there, but neither were they just not here. Books, music, movies, politics.  Read my blog.

Wednesday, June 30, 2010

iMedX-Worldtech and PRN Medical Management

iMedX-Worldtech acquires PRN Medical Management.

SHELTON, Conn., June 18 /PRNewswire/ -- iMedX Inc., a leading healthcare software and services company, announced today that it has acquired the assets of PRN Medical Management, LLC (PRN), a medical transcription company serving medical clinics for over 18 years, primarily in Maryland, Washington DC, and Virginia.  With the acquisition of PRN, iMedX expands its base in the Washington DC metro area and beyond. PRN's employees in India will be absorbed by iMedX's subsidiary Worldtech. Using iMedX's revolutionary TurboScribe® platform for medical transcription, customers will enjoy the benefit of an efficient transcription workflow and ease of management. With TurboScribe's integration with leading Electronic Medical Record (EMR) products, customers will progress towards a paperless environment, a higher quality of care, and pave the way for adherence to the Obama administration's 'meaningful use' guidelines.  Additionally, PRN's customers can take advantage of iMedX's other Internet-based products such as its TurboRecord® - iMedX's low-cost, subscription-based EMR, and their ePrescribing product TurboRx® for error-free prescribing through pharmacies across the country.

Venkat Sharma, iMedX's president and CEO, along with the current iMedX management team, will continue to lead the combined companies. PRN's founder and CEO, Gabriela ("Gaby") Perazzo will work with iMedX through the transition even as she develops her passion in online newsletters and seminars.