Elliot H Omiya, Larry Osterman and Frank Yerrace: Inside Windows 7 - Audio Stack

The Discussion

  • User profile image

    Larry is back - and Elliot too! Movie.Play();

  • User profile image

    Glad to see that these features made it back into the OS. I'm not sure why they got cut in Vista.

  • User profile image

    Elliot, Larry, and Frank, thanks for all the work you've done on Audio for Vista and 7, truly wicked!  Also, thanks for explaining and demonstrating the audio sound wave at the end of the video, I learned something new today!



  • User profile image

    That was fun.  A while back on the E7 blog, there was a post about Improving Audio Glitch Resilience in Windows 7.  I thought some of the telemetry data and testing methodology would surely be discussed. Good thing it wasn't, or this would have been a two-hour interview.  The link is worth the read.



  • User profile image

    Telemetry data was used to improve almost everything :)


    Thanks for opting in!

  • User profile image

    Sounds like a step forward - but as founder of a company that develops PC music software and integrates music systems for schools as our primary business, and has installed hundreds of PCs solely focused on their Audio/MIDI capabilities - mostly leveraging properties of Creative Labs Soundcards, I am continuously anxious about Microsoft's direction/support of MIDI and aspects of the Audio stack. 

    I know this is niche, but it's not THAT niche. There are thousands if not millions of people who will (or would like to) use their PC without extra hardware/software for music making.  And if their PC won't do it, I'm afraid they'll find the other brand (yeah, them) that already supports these features...


    Here are the two simple requirements that Microsoft must consider to remain a functional competitor to other OSes for entry-level music-making applications:

    1.  Multiclient USB MIDI support in the compliant driver:  Running more than one program that makes use of MIDI IN or OUT will cause an alert/error condition to the effect of "Cannot find a MIDI driver on the system".  There are solutions to this via third-party hardware/drivers, but more and more keyboards, instruments, and guitars featuring USB ports and "compliant" MIDI outputs will hit this error, which will only serve to diminish the Windows experience and create confusion.

    2.  Low-latency upgradeable GM sound source:  The GM wavetable synth may have been a great innovation at one time, but it is an unusable output for music making.  Even though the sound quality is poor, it's the latency that kills it.  A solution with 20 ms or less latency is the right choice.  Being able to insert a soundfont would be strategic and would, in one single move, regain the support of music education and entry-level music production.


    Are there any improvements in Windows 7 regarding MIDI/GM Wavetable synth? 

  • User profile image
    Larry Osterman

    Ytterbium: None of this was cut from Vista.  Some of this functionality was implemented in hardware and the audio device manufacturers cut the functionality from their hardware at about the same time that Vista shipped, but the two events were unrelated.


  • User profile image

    Hi Larry and Team.  Thanks much.  Your timing on this was amazing.  Messing with Encoder the other day, I discovered we/I need more audio UX to help manage many devices and mixing in this era of media and live streaming.  Playing with "Virtual Audio Cable", I figure that is about the best abstraction for a human I could think of.


    Thinking some UX like below would make things very explicit and simple in a new Media Control panel:

    1) We have two Mics and a Line-In connected to a mixer node.  You can drag and drop the lines.  You can double-click the mixer to change volume levels on the various inputs.

    2) The mixer is streaming to the default speakers and to a Recorder node that is saving my karaoke session to disk as an MP3.

    3) Also, I am recording the line-in separately on another Recorder node to a WAV.

    4) Moreover, I am piping the recorder node input to a new Azure audio streaming service endpoint.

    5) Other open apps that use sound could show up as nodes also with lines connected as configured by the app.

    A matrix of other variations are possible.



    Generic Comment Image

    Thinking about your audio feedback issue.  I am wondering if that same technique would make for some kind of simple air collision avoidance system.  As other traffic gets closer, your feedback (looped back to you by other traffic) increases and could trigger an alarm.  Maybe something already works this way.


    I would also like to explore Charles's idea of IObservable more.  A managed API for this stuff is (I think) required.  IObservable would make a good choice for getting audio notifications, etc.  LINQ queries are also interesting.  A user could hook LINQ queries into the mixer above, filter for certain levels and/or sounds, and capture just the filtered data.  Thinking of things like voice recognition filters and keyword filters, or "tell me when Jumpin' Jack Flash is playing", etc.  Having a managed stack to create and hook up all the objects above is also needed.


    Are the audio and video groups the same or different?  I think they should be the same, as all of the above would also apply to video streams.

  • User profile image
    Larry Osterman

    Stacey, you're describing a system which would confound the vast majority of users, adding a huge amount of complexity for unclear benefits.


    Windows has always taken the attitude that application authors get to choose if they want to allow the user to specify a particular input or output device or if the application just wants to use the system defaults (of which there can only be one of course).  At a minimum, it radically simplifies the programming model that app developers need to consider.



    w.r.t. managed code, there honestly isn't a way to implement isochronous rendering of either audio OR video in managed code given the current state of the CLR.  The basic problem is that the GC can come in at any time and freeze all the managed code in the application for 200+ ms at a time - this delay is troublesome for audio rendering and deadly for video rendering.
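
A rough sketch of why a pause of that length matters for isochronous rendering. The 48 kHz sample rate and the 10 ms processing period below are assumed values chosen for illustration, not figures from the discussion; only the 200 ms pause comes from the comment above.

```python
import math

def missed_periods(pause_ms: int, period_ms: int) -> int:
    """Number of whole audio processing periods that elapse during a pause."""
    return math.floor(pause_ms / period_ms)

def frames_to_cover(pause_ms: int, sample_rate_hz: int) -> int:
    """Frames that must already be queued for playback to survive the pause."""
    return math.ceil(pause_ms * sample_rate_hz / 1000)

SAMPLE_RATE_HZ = 48_000  # assumed sample rate
PERIOD_MS = 10           # assumed processing period
GC_PAUSE_MS = 200        # pause length quoted in the comment above

print(missed_periods(GC_PAUSE_MS, PERIOD_MS))        # periods with no data delivered
print(frames_to_cover(GC_PAUSE_MS, SAMPLE_RATE_HZ))  # frames of buffering required
```

Queueing enough audio to ride out the pause is possible, but every frame queued ahead of time is added output latency, which is the trade-off behind the objection.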


  • User profile image

    Love to see other computer users here rather than just computer watchers or listeners. I'm starting to feel like a couch potato when looking at the audio configuration possibilities. Kidding! According to some figures I saw recently, many people are just consumers of blogs instead of bloggers, but in absolute terms there are still plenty of blogs to go around, just like there are plenty of people needing these types of "confounding" functions that one can do on almost every physical mixer with a simple cable. It's unfortunate that if MS did a poll they'd get the answer that few would need these things, either because most a) haven't encountered the need yet or b) have bought a hardware solution to a software problem.


    This reminds me of being at a friend's place and wanting to play something without stopping his computer from outputting audio. He has the computer hooked up to a home theater amp with a ton of inputs. Get this: only one input can be active at a time! So how did I solve this? Instead of plugging my stuff into the amp's inputs, I plugged it into the computer and passed it through to the amp.


    I believe if you brought Bill Gates a glass and two different liquids and asked what kind of interface would he prefer to deal with this problem he'd probably say that most people are happy just drinking one type of liquid at a time. By definition, mixing things up tends to, you know, mix things up. And that can confound the senses. I poured in water and it doesn't taste like water any more with this other thing in there too!


    re: low-latency HQ built-in MIDI instruments. There is a diminishing number of the *standard PC sound cards on the used market (*Roland LAPC-I). MS should buy a license to the LAPC-I algorithms and sounds, put those in, and then do a GM mapping to it. The price of these cards is on the rise because they are required for so many apps to play correctly. $500 for an ISA bus card is silly, if one even comes up on eBay in a year or two. Unlike the other modules, the LAPC-I can do tricks you can't do with GM or soundfonts or even a $5000 J8. But most importantly, there is simply no alternative at all if you want the music as it was meant to be in many games, due to the combined internal waves, filters and programmability (including uploading and processing your own sounds inside the HW, which some games did).

  • User profile image
    Larry Osterman

    Androidi: A large part of the reason for those cards being so expensive is that the market for them is so small.  Because the market is small, the price is high.  Creative has some decent MIDI devices for between $50 and $100 though.


    It's possible that if more apps used MIDI there would be a bigger market for MIDI hardware but the reality is that for most applications PCM audio meets their needs.


  • User profile image

    "you're describing a system which would confound the vast majority of users adding a huge amount of complexity for unclear benefits."


    Thanks for the reply, Larry.  I can't fence with the master, as you have probably run through all kinds of UI already.  Not sure I can totally agree on confounding users.  In terms of audio, I am just a standard user and I find that UI more understandable.  The other day, I wanted to record some piece of sound from YouTube and make a ringtone for my son.  Was thinking this will be simple....


    So while playing the YouTube vid, I fired up SoundRecorder.  No joy, it was not recording and there were no options to select other inputs.  Did not want to start messing around with sound defaults yet.  Scratch head, move on.  So I fire up Audacity.  Still no joy.  So I look in Audacity properties and look at audio inputs.  In the combo box are "Microsoft Sound mapper", "Microphone", "Line In", and "Stereo Mixer" (others may have many more).  This is easy?  Having no clue what each does, I start to test.  To make a long story short, setting the input device to Stereo Mixer allowed me to record.  It was not obvious or clear to me why this worked.  Point is, if I had more visual indications of what was going on and what hooks to what, this would not have taken me 40 minutes of testing.  IMHO, I should not need Audacity to do this.  I should have been able to configure Windows to do it.  I think people are smarter.  IMHO, combo boxes and lists are not the right UI for this job.  People use physical wires and end-points already and do this all the time.  They hook up their stereo systems, iPods, Zunes, etc, etc.  MS has many apps with flow (biz server, WWF, etc).  Wires are cool. :)

    Maybe just me.


    In regards to managed:  I was not proposing streaming bits in .NET.  More a BCL wrapper for config and metadata - start, stop, connect, status, change defaults, change endpoints, etc.  The lower level will still handle the streaming.  There has to be some managed story for this going forward.

    Thanks much Larry. Appreciate all your hard work. :)

  • User profile image

    The market* is only getting bigger, and the supply of both the ISA LAPC-I (custom Roland ICs no longer made) and ISA bus PCs capable of using it is staying constant (there are some ISA to PCI adapters though). That classic games are being remade tells me that there is a market out there. New generations are discovering old DOS classics etc. But if the sound quality is anything like what you get out of DosBox, who wants to play that! It's just awful. Of course, a recent study said ~1% can't tell one tone apart from another, so they may not care about quality.


    *defined here as the market for sound in classic games as it was meant to be heard. It's impossible to get the same sound in the games with other solutions unless you rewrite the code, and even then there are a few games where you can't do that by running a pre-recorded track of the LAPC-I sound, as the game is synthesizing the sound on the fly according to game events. Of course the market for this particular card is small indeed, because only a few people know that one needs it. If this was advertised better, then maybe a few more might get the idea of asking to incorporate it into the OS, or perhaps generously buy the algorithms etc. for the DosBox community, as they keep on preserving legacy compat where MS is dropping it.

  • User profile image

    Great post +++



    btw isn't it amazing that on other hand you get much more complicated features like:


    but try doing something really similar inside ONE computer. Good luck! It will take some big OEM waving a wad of cash with some new market and usage in mind to get something as difficult to understand as a virtual cable into the OS.

  • User profile image

    MS needs to do a focus group on "looking for whether a feature exists" (aka the discovery phase) and determine which is more confusing: seeing everything on offer at once (like a real mixer with cables) or navigating the Properties/tabs maze in search of whether or not a particular feature for a particular one-off need exists. Random boring thoughts I don't care about really: the UI doesn't scale in any way. It's a tight fit on a small screen (imagine a few of those properties windows open on some tiny netbook thing) and it doesn't utilize large resolutions either (scrolling in the 7.1 playback levels tab in win7rc doesn't seem smooth; it's hard to explain).


    I'm feeling a bit bad writing all this critical stuff. These audio woes mentioned here and in my thread in coffeehouse are nearly a daily frustration for me, 'easily' solved with external gadgets that I do not want (those tend to bring their own woes with them, so it's a lose-lose prop). I love the per-app volume control and easy access to the app-level faders.


    edit: moved stuff to this comment as this is about discovery and not patching.

  • User profile image
    Larry Osterman

    "In regards to managed.  I was not proposing streaming bit in .net.  More a bcl wrapper for config and metadata - start, stop, connect, status, change defaults, change endpoints, etc.  The lower level will still handle the streaming.  There has to be some managed story for this going forward. "


    I don't know how you do start/stop without streaming - start/stop is intimately associated with streaming.  But there might be an opportunity for managed wrappers for some of the other parts of the audio infrastructure, who knows what might end up being developed in the future.


    androidi: One critical thing to consider - Windows tends to be developed for the noob.  My quintessential example is my wife's riding instructor who is a stereotypical computer user.  She ends up calling us when the sound stops working when she tries playing back DVDs (it's an issue with the sound drivers but I can't explain that to her).


    Whenever I think about sound features, I try to think about how this stuff would work for her.  Whatever we do has to be discoverable and usable for someone with little technical expertise.   It also needs to solve a particular customer scenario.


    I like staceyw's scenario above for that reason - he describes a clean scenario where he's trying to accomplish something that he can't implement using our current system without a great deal of difficulty AND his scenario is something that I could easily imagine customers encountering.  That scenario IS something that we might be able to address in a future version of Windows; Frank's already added it to the very long list of potential features (some of those potential features have been on the list for decades, so there obviously are no guarantees about any particular feature being implemented).


  • User profile image

    "I don't know how you do start/stop without streaming - start/stop is intimately associated with streaming."


    // Hypothetical managed API, for illustration only.
    var media = new Media(@"c:\my.mp3");

    media.Notify = n => { if (n.Complete) media.Stop(); };

    media.Output = MediaOutput.DefaultSpeakers;



    I am thinking more in terms of commands.


  • User profile image

    When Vista came out Cakewalk Sonar started supporting WDM/KS "as they call it" and it's amazing how fast it is. I've found that it's a tiny bit faster than ASIO. Unfortunately it doesn't work well with my Echo Mia Midi but that's Echo's fault.


    I just wish that software/hardware manufacturers would start supporting WDM/KS like they do ASIO so that Windows users wouldn't have to buy such expensive gear just to get reasonable latencies in software like Ableton Live, which at the moment only supports ASIO and the pretty much useless for music MME/DirectX.

    ASIO works fine, it's just that I'd rather see people start using something that's already part of the OS instead of relying on a third party solution.

  • User profile image

    Hi, nice presentation.

    Just one suggestion: Not to be rude, but just trying to be helpful:

    I would suggest that you include on your team a professional audio engineer who has both a theoretical background in digital signal processing (DSP) and live experience using professional audio mixers.  Or try to hire somebody to give you a crash course in those fields. I think that will give you a lot of ideas about how to improve the system.


    Ever wondered why the professional audio software makers (Cubase, Pro Tools, and so on) want that very low-level access to the audio hardware? Because their needs are not addressed by your architecture.


    The audio system of a PC is really a software version of a digital audio mixing console, and I think you could get many ideas by studying the real thing. You could also learn the real difference between a microphone input and the line input, and what the Nyquist theorem really says (your illustrations on the whiteboard showed the required sampling rate as 4x the highest source frequency, not 2x). HD audio (which you correctly described as marketing speak) usually also has more bits per sample, not only a higher sampling frequency.
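
For reference, the sampling theorem says a signal whose highest frequency is f_max can be captured without aliasing when the sample rate exceeds 2 x f_max. A minimal Python sketch of that relationship follows; the function names are made up for this example, and the sample values in the usage lines are illustrative rather than taken from the video.

```python
def nyquist_rate(f_max_hz: float) -> float:
    """Minimum sample rate the signal must exceed: twice its highest frequency."""
    return 2.0 * f_max_hz

def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone at f_hz after sampling at sample_rate_hz.

    Tones below half the sample rate are unchanged; tones above it fold back
    into the range 0 .. sample_rate_hz / 2.
    """
    folded = f_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_rate(20_000.0))               # rate needed for a 20 kHz bandwidth
print(alias_frequency(1_000.0, 44_100.0))   # well below half the rate: unchanged
print(alias_frequency(30_000.0, 44_100.0))  # above half the rate: folds back
```

The third call shows what goes wrong when the rule is broken: a 30 kHz tone sampled at 44.1 kHz is indistinguishable from a 14.1 kHz tone once sampled.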


    Some concrete suggestions:

    I would like to have an equalizer (EQ) on every stream. Today applications like Winamp or iTunes have to implement this in the application, but I would like to have this in the audio system of the OS. For instance: the internal speakers of a laptop may need a cut in the bass, because they are not able to reproduce it anyway, and it only makes them clip. The headphones may need a bit of boosted treble, and the USB soundcard connected to the home stereo should have a flat EQ. Maybe this could be generalized to plug-in points in the streams, where you could insert other effects from third parties, like reverb, compressors and so on.


    Maybe you could have an option in the mixer to show basic/advanced settings (where the EQ, and plug-in stuff is hidden in the basic mode)?


    One other thing: You drew a sketch of where the streams were routed from applications to physical devices. Very nice. Why not make this a graphical thing in the audio system as well, where you could see where the streams are routed, and maybe dragging them to new destinations?


    One other thing: I would like an option to flip right/left channels of a device in software! Many desktop speakers have one volume knob in one of the speakers, and it would be nice to be able to choose what side the volume knob should be on. If I moved the speaker with the knob to the "wrong" side, I could just flip the channels in software to correct it!
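
A minimal sketch of what such a left/right flip amounts to on the sample data, assuming interleaved stereo PCM laid out as [L0, R0, L1, R1, ...]. That layout and the function name are assumptions made for illustration; this is not a description of how the Windows audio engine would implement it.

```python
def swap_stereo_channels(interleaved):
    """Return a new list with left and right samples exchanged in every frame."""
    if len(interleaved) % 2 != 0:
        raise ValueError("interleaved stereo data must hold an even number of samples")
    swapped = list(interleaved)
    swapped[0::2] = interleaved[1::2]  # old right samples become the new left
    swapped[1::2] = interleaved[0::2]  # old left samples become the new right
    return swapped

frames = [1, -1, 2, -2, 3, -3]  # three stereo frames: left positive, right negative
print(swap_stereo_channels(frames))
```

Because the operation only reorders samples within each frame, it needs no extra buffering and adds no latency of its own.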


  • User profile image



    I just want to keep a voice alive that suggests that offering a simple, quality solution for Audio/MIDI output is the ultimate noob feature for the music maker!  If your wife's riding instructor chose to try and learn piano with a PC in the mix (Gallup NAMM says 85% of Americans would like to learn a musical instrument), she would have a number of big problems off the bat given current options/configuration.


    1.  Latency/GM Sound Quality:  Most software programs for music making still refer to MIDI as the only source of input data.  For output, most edu programs look to a General MIDI OUT sound source.  USB controller keyboards (M-Audio et al) outputting only MIDI data are THE growth category in the keyboard/music products industry, with shipments in the hundreds of thousands per annum.  And you know what they largely use to make sound - the computer - since sound modules are dead.  The only default PC option, the GM Wavetable Synth (licensed from Roland in the '90s), has unusable latency and very poor quality sound.  This one-size-fits-none approach is killing options for the PC as an easy-to-use music making tool.


    2.  USB MIDI Driver (probably another dept - but needs to be mentioned):  Running two programs calling upon a MIDI IN device is not supported by the compliant USB MIDI driver in Windows.  Thus, it either crashes the program or kicks out an error message that will confuse and disappoint.


    Solving these two problems would stem market losses in this category to the fruit based competitor that offers these key features.  We lose PC licenses based on poor music making support on an increasing basis every month. 


    Listen, if I'm some lunatic barking at the moon here - let me know - and I'll stop - but on behalf of a lot of music makers, educators, and folks across the world where music making is active - let me say...help!  Is there any way to tweak this in Windows 7 or going forward?


    If there is another place at MS where I can share our experience in this category, please channel me there...owwwwwwwwwwww (that's me barking...)


  • User profile image

    Hi, it is funny to see that you got no answer.


    Your question raises a lot of problems. Indeed, I can't count the number of people who have migrated to Mac or who kept their Windows XP because of audio software acceleration under Windows Vista.


    I don't know who had this fabulously stupid idea, but it was very funny.


    Staceyw, as Mr. Osterman said, your idea about managed code to render audio or video is absolutely a big mistake. Go and try some Java and .NET software to see horrible performance losses.

  • User profile image
    Mr Crash

    Will Larry Osterman do any more videos anytime soon?

  • User profile image

    This was a great video but I'd like to hear more about the plans (if any) regarding MIDI support/features/etc. for the future of Windows too.
