A Deeper Look at Speech NUI


The Discussion

  • This is great :o)

    But you still have to talk slowly and clearly. I want to talk to my computer like I talk to a friend :o)

    Also, how robust is this recognition against foreign accents?


  • This is just getting better. I couldn't find the macro instruction page, but I hope you can program macros by recording actions, kind of like VBA.


    I've created a few "play" apps at work while seeing what WPF could do for some projects, with triggers firing on speech commands and spoken talkback on events. I haven't tried again on Win7, but the talkback part wasn't the best on XP. Except, of course, when I had it say "OK, if you right-click the OK button one more time, I'm not playing any more!" or "Press OK just once, just once!" and then made the OK button very small, moved it around the screen, or changed its colour. Nothing beats watching a user try to restore a ghostly button's opacity: it slowly comes back to normal if they move the mouse gently over it, and otherwise tells them to ask nicely to get it back if they move the mouse too fast.

  • This is a great platform; there is huge potential for success.

  • Your speech products need improvements in the following areas:
    1. Natural Voice Quality: Microsoft Anna is better than Microsoft Sam, but she still sounds much, much less natural than the competition. Companies like Loquendo and Acapela make much better voices, and Alex on the Mac is much, much better. Anna sounds dated by now. You need to modernize the voice and bring it at least up to the level of your competitors.
    2. Slowness: In Vista, using Narrator with Microsoft Anna is extremely slow. In all versions of Windows, a screen reader like Jaws responds much more slowly through any SAPI5 or SAPI4 synthesizer than through a synthesizer that talks to the screen reader directly instead of going through the Windows SAPI interface. So, (A) make Anna respond much, much faster and (B), very importantly, make the SAPI interface "talk" to synthesizers quickly. Currently, SAPI is too slow for a visually impaired person to use comfortably with any screen reader. That is why most screen readers bypass it: Jaws uses the Real Speak voices through a proprietary interface, bypassing SAPI5, and Supernova uses Dolphin SAM (Dolphin's own product) to interact with synthesizers, also bypassing SAPI5. Conclusion: SAPI5 needs to mature and get much, much faster.
    3. Slow development: Your company is too slow to react and make improvements. Microsoft Anna was released in 2006, and now it is 2010 with no improvements made. Why isn't Windows 7 equipped with a better voice? Why has SAPI5 remained virtually the same for 10 years?
    4. Multi-language support: Try this: buy an iPhone and turn on its built-in screen reader. Switch the language of the whole phone to a less widely spoken language, say Greek. The phone's interface comes up in that language and, surprisingly, the voice of the built-in screen reader also switches to that language. Conclusion: even from its first version, the iPhone has had built-in synthesizers for many languages. Also try this: buy any modern Nokia phone and observe the many synthesizers for so many languages that come pre-installed. Now buy Windows 7 and observe that there is only Microsoft Anna, a U.S.-only synthesizer. Conclusion: it is unbelievable that in 2010 all your competitors ship support for so many languages free with their products, while you have only a half-baked, U.S.-only synthesizer.
    5. Blind people do not like natural voices: Because natural-voice synthesizers are generally slow and cannot be used at high speech rates, most blind people (the main everyday users of speech technology) shy away from them. Instead, they prefer older synthetic voices, even though some of those are no longer supported. Take Eloquence, for example: its company has declared it end-of-life, yet it is still used extensively around the world, making it the most popular synthesizer. Or take eSpeak, a free and open-source synthesizer that does not sound as good as any of the natural voices; thanks to its extensive multi-language support and its extremely fast responsiveness, especially at high speech rates, it has become the synthesizer of choice for many blind users. So please (A) consider improving Microsoft Sam and making it multilingual, and (B) consider making it open source. This would save you some development costs and would greatly help the visually impaired community, which is often unhappy with the move to so-called "natural" voices. Try reading long documents all day with a "natural" voice that is slow to respond and puts many pauses between words and phrases to sound "natural", and you will realize how frustrating it can become.
    6. Does Windows Mobile have a synthesizer? Even though most other phone OSes have text-to-speech capability by now, I think Windows Mobile is still behind. I certainly haven't read anything about text-to-speech in Windows Mobile 6.5, and no documentation I could find mentioned it. In any case, how many languages are supported? And is there a built-in screen reader? Do you know how much screen readers and so-called "natural" voices cost? $300 to $500, certainly more than the phone.
    7. TellMe Needs Improvements: Having tried TellMe's speech recognition extensively and compared it to Google's 411 service, I have to say that it mostly fails to understand what you ask for, and it does not even ask for confirmation, so in many cases it gives you completely wrong results. You say something, it "thinks" it has recognized something, and then it gives you something completely different from what you wanted. The Google recognizer, by contrast, at least asks for confirmation and usually recognizes what you say correctly. On the whole, you need to improve TellMe. Also, the voice used for TellMe is out of date and does not sound very good. It sounds angry or something.

  • I absolutely agree with "nektar".
