Peter Friese

Developer Advocate / Mobile Developer / Public Speaker

Tired of Typing? Speak to Your Computer!


For some reason, humans have always dreamt of using natural language to communicate with computers. Quite a number of movies have been made that revolve around this theme, 2001: A Space Odyssey and I, Robot (named after the great collection of SF stories by Isaac Asimov) being just two of them.

Well, we’ve come a long way since then and computers are more powerful than ever before. I remember using one of the first versions of IBM ViaVoice, which would quite literally bog down my computer when I tried using it. The quality of speech recognition software has vastly improved and, using a clever stack of technology, you can even use speech recognition on your iPhone (the actual recognition is performed on a server, but the effect is stunning nevertheless).

With all the hoopla around HTML 5, it would be quite a surprise if modern browsers didn’t have something in store with regard to voice recognition. And sure enough, there is a W3C specification for a Speech Input API. Looking at the list of authors might give us a hint as to which browser might support this API…

Using the speech input API is rather easy. All you have to do is add the x-webkit-speech attribute to any input tag and you’re done. If you’re on a speech-enabled browser (as of this writing, only Chrome 11 supports this out of the box), you can check it out in the input field below. Just click on the microphone icon and start speaking.
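To make the attribute a bit more concrete, here is a minimal sketch of adding it from script instead of writing it into the markup by hand. The helper name `enableSpeechInput` and the element id `search` are assumptions made up for this illustration; only the `x-webkit-speech` attribute itself comes from the text above.

```javascript
// Hypothetical helper: opt a single input element into speech input.
function enableSpeechInput(input) {
  // x-webkit-speech is a boolean attribute: its presence is what matters,
  // so an empty value is enough.
  input.setAttribute("x-webkit-speech", "");
  return input;
}

// Only runs in a browser. The id "search" is an assumed example id.
if (typeof document !== "undefined") {
  const field = document.getElementById("search");
  if (field) {
    enableSpeechInput(field);
  }
}
```

On a browser without speech input support the attribute should simply be ignored, so the field keeps working as a regular text input.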

So, the other day I thought, “wouldn’t it be cool if I could use voice recognition to look up my contacts on the social networks I am on?” Adding voice recognition support to a website you own is rather easy, as you only have to add the x-webkit-speech attribute to the respective input fields. Enhancing foreign sites, however, turns out to be a little bit more involved. Fortunately, Chrome can augment existing websites by way of so-called Content Scripts, which are a part of the Chrome Extensions API.

Writing a Chrome Extension for speech-enabling existing text input fields on just about any website was a matter of minutes, thanks to the good documentation and some jQuery to walk the DOM. Putting on the finishing touches took me some more time, and I am proud to present Speak to Search, a Chrome Extension that lets you talk with your browser. It works with virtually every website that uses regular HTML input fields. By making some smart assumptions, the extension will automatically submit the current form if the input field is a search field. If it is not, the focus will remain in the field and the form will not be submitted. That way, you can fill out, for example, an address form.
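The general shape of such a content script can be sketched roughly as follows. This is not the code of Speak to Search: it uses plain DOM calls instead of jQuery, the function names are invented for this example, and the search-field heuristic (type `search`, or a name/id that hints at searching) is purely an assumption about what "smart assumptions" could look like. It also assumes the browser fires a `webkitspeechchange` event on the input once recognition has produced a result.

```javascript
// Assumed heuristic, not the extension's actual rule: a field counts as a
// search field if its type is "search" or its name/id hints at searching.
function looksLikeSearchField(input) {
  if ((input.type || "").toLowerCase() === "search") return true;
  const names = [input.name, input.id].map((s) => (s || "").toLowerCase());
  return names.some((s) => s === "q" || /search|query|keyword/.test(s));
}

// Speech-enable every plain text or search input below `root`.
// Returns the number of inputs that were touched.
function speechEnableInputs(root) {
  const inputs = root.querySelectorAll(
    'input[type="text"], input[type="search"], input:not([type])'
  );
  for (const input of inputs) {
    input.setAttribute("x-webkit-speech", "");
    // Assumption: this event fires after a speech result was entered.
    input.addEventListener("webkitspeechchange", () => {
      // Submit only search forms; other fields keep focus so the user can
      // continue filling out the rest of the form.
      if (looksLikeSearchField(input) && input.form) {
        input.form.submit();
      }
    });
  }
  return inputs.length;
}

// A content script would run this against the page it was injected into.
if (typeof document !== "undefined") {
  speechEnableInputs(document);
}
```

Keeping the heuristic in its own function makes it easy to tune for sites that name their search boxes differently.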

Here is a short video of me using Speak to Search to search for some people on Xing and LinkedIn. Please note that the extension makes sure the speech recognition engine is configured to recognize German names on Xing.
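One way to get this kind of per-site language selection is to set a `lang` attribute on the input before the user starts speaking. The sketch below assumes that the recognizer honors the element's `lang` attribute; the lookup table, the function names, and the `de-DE` tag are made up for illustration, and only the pairing of Xing with German comes from the post.

```javascript
// Hypothetical table mapping sites to recognition languages.
const SITE_LANGUAGES = { "xing.com": "de-DE" };

// Returns the language tag for a hostname, or null if none is configured.
function languageForHost(hostname) {
  const host = hostname.toLowerCase();
  for (const [site, lang] of Object.entries(SITE_LANGUAGES)) {
    // Match the site itself and any of its subdomains (e.g. www.xing.com).
    if (host === site || host.endsWith("." + site)) return lang;
  }
  return null;
}

// Sets the lang attribute on an input if the site has a configured language.
function applyRecognitionLanguage(input, hostname) {
  const lang = languageForHost(hostname);
  if (lang) {
    input.setAttribute("lang", lang);
  }
  return lang;
}
```

Sites that are not in the table are left alone, so the browser's default recognition language applies there.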

“Language makes us human” is a quote from a video I found while researching this blog post. I don’t necessarily think voice recognition and speech synthesis will make computers more human, but both technologies certainly can help to create a more immersive experience. I am looking forward to seeing a broader use of the new audio capabilities of modern browsers. Feel free to grab my code from GitHub and create something new and exciting!
