A Deeper Look at Speech NUI

Rob Chambers from the Speech at Microsoft group stopped by to show us a little more about our speech platform: where developers can get started, and some of the things that are possible. What you see (and hear) today is built on the backbone of our work with the Tablet PC platform. One of the bigger changes for users is that the two branches of speech recognition we had before, one for command and control and one for dictation, have now been rolled into a single system. I've used speech in the past with my Tablet, but honestly I hadn't used it much in Windows 7. Seeing someone who knows what they're doing really got me motivated to dive back in. For instance, I'm looking at using the macro system to build some speech commands that control different functions in games.

Follow the Discussion

  • Alex Skidanov (Ishy) Moo-o-oo!

    This is great :o)

    But still, you have to talk slowly and clearly. I want to talk to my computer like I talk to a friend :o)

    Also, how robust is this recognition against foreign accents?

  • ChevalN2 (Cheval) Why not null?

    This is just getting better. I couldn't find the macro instruction page, but I hope you can program it by recording actions, kind of like VBA.

    I've created a few work "play" apps while seeing what WPF could do for some projects, with triggers firing on speech commands and talkback on events. I've not tried again on Win7, but the talkback part wasn't the best on XP. Except, of course, when I had it say "OK, if you right-click the OK button one more time, I'm not playing any more!" or "Press OK just once, just once!" and then made the OK button very small, moved it around the screen, or changed its colour. There's nothing like watching a user try to restore the opacity of a ghosted button that slowly comes back to normal if they move the mouse gently over it, and that otherwise tells them to ask nicely to get it back if they move the mouse too fast.

  • This is a great platform with huge potential for success.

  • Your speech products need improvement in the following areas:
    1. Natural Voice Quality: Microsoft Anna is better than Microsoft Sam, but it is still much less natural-sounding than the competition. Companies like Loquendo and Acapela make much better voices. Alex on the Mac is much, much better. Anna sounds old by now. You need to modernize the voice and bring it at least up to the level of your competitors.
    2. Slowness: In Vista, using Narrator to interact with Microsoft Anna is extremely slow. In all versions of Windows, using a screen reader like JAWS with any SAPI5 or SAPI4 synthesizer takes much longer than using a synthesizer that interacts directly with the screen reader rather than through the Windows SAPI interface. So, (A) make Anna respond and work much, much faster and (B), very importantly, make the SAPI interface "talk" to the synthesizers quickly. Currently, SAPI is too slow for a visually impaired person to use with any screen reader. That is why most screen readers bypass it. JAWS uses the RealSpeak voices through a proprietary interface, bypassing SAPI5. Supernova uses Dolphin SAM (their own product) for interacting with synthesizers, bypassing SAPI5. Conclusion: SAPI5 needs to mature and get much, much faster.
    3. Slow development: Your company is too slow to react and make improvements. Microsoft Anna was released in 2006; it is now 2010, and no improvements have been made? Why isn't Windows 7 equipped with a better voice? Why has SAPI5 remained virtually the same for 10 years?
    4. Multi-language support: Do the following: Buy an iPhone and turn on its built-in screen reader. Switch the language of the whole phone to a less popular language, say Greek. The phone's interface comes up in that language. Surprisingly, though, the synthesizer voice of the built-in screen reader also switches to that language. Conclusion: Even from its first version, the iPhone has had built-in synthesizers for lots of languages. Also try: Buy a Nokia phone, any modern one. Observe the many synthesizers that come pre-installed with the phone for so many languages. Now, do the following: Buy Windows 7. Observe that there is only Microsoft Anna, a U.S.-only synthesizer. Conclusion: It is unbelievable that in 2010 all your competitors support so many languages for free with their products, whilst you have only a half-baked, U.S.-only synthesizer.
    5. Blind people do not like natural voices: The general slowness of natural voice synthesizers, and the inability to use them at high speech rates, has made most blind people (the main everyday users of speech technology) shy away from them. Instead, they prefer older synthetic voices, even though some of them might be out of support. Take Eloquence, for example, a synthesizer that its company has put at end-of-life. And yet it is still used extensively around the world, making it the most popular synthesizer. Or take eSpeak, a free and open source synthesizer, which does not sound as good as any of the natural voices, but which, thanks to its extensive multi-language support and its extremely fast responsiveness, especially at high speech rates, became the synthesizer of choice for many blind users. Please, then, (A) consider improving Microsoft Sam by making it multilingual and (B) consider making it open source. This would free you from some of the development costs and would greatly assist the visually impaired community, which is often unhappy with the move to so-called "natural" voices. Try reading long documents all day with a "natural" voice, which is slow to respond and which puts many pauses between words and phrases to sound "natural", and then realize how frustrating it can become.
    6. Does Windows Mobile have a synthesizer? Even though most of the other phone OSes have text-to-speech capability by now, Windows Mobile is still behind, I think? I certainly haven't read anything about text-to-speech in Windows Mobile 6.5, and no documentation that I could find said anything. In any case, how many languages are supported? And is there a built-in screen reader? Do you know how much screen readers and so-called "natural" voices cost? $300, $500, certainly more than the phone.
    7. TellMe Needs Improvements: Having tried TellMe's speech recognition extensively and compared it to Google's 411 service, I have to say that it mostly does not get it when you ask for something, and it does not even ask for confirmation, giving you in many cases completely wrong results. You say something, it "thinks" it has recognised something, and then it gives you something completely different from what you wanted. The Google recognizer, by contrast, at least asks for confirmation, and it usually recognises what you say correctly. On the whole, you need to improve TellMe. Also, the voice used for TellMe is out of date and does not sound very good. It sounds angry or something.

  • I absolutely agree with "nektar"
