Tutorial: Encoding screen recordings for Silverlight in VC-1 with Expression Encoder 2

One of the best parts of my job at Microsoft is when I can put aside the video strategy stuff and do some real-world hands-on video compression encoding for a project. My friends on the IIS team asked me to encode their new tutorials for Silverlight playback, and I thought it was a great project to illustrate the screen encoding tips I talked about a few weeks ago.

As mentioned a few weeks back, Silverlight 1.0 and 2 support only the Windows Media Video 7, 8, and 9 (aka VC-1) video codecs. We don't support the older Windows Media Video 7 and 9 Screen codecs. This is a fine thing from my perspective; it makes the install size of Silverlight smaller, and we can get better results with our current VC-1 implementation than we can out of the screen codecs. This is because a modern OS interface like Vista's Aero Glass or Mac OS X 10.5 uses a lot of gradients and transparencies that older screen codecs don't handle efficiently, but that match much more closely the kind of video image VC-1 is designed for.

So, using the beta of Expression Encoder 2, which incorporates the new VC-1 Encoder SDK, let me walk through a real-world project delivering screen captures in VC-1.

Goal

The job was to provide a series of source clips demonstrating common tasks in the new IIS 7. Previous screen recordings the team had done used the Windows Media Video 9 Screen and Windows Media Audio 9 Voice codecs with a total bitrate of 500 Kbps for 1024x768, 5 frames per second. There were apparent artifacts in both video and audio, although the content itself was comprehensible. I wanted to reduce the total bitrate to 400 Kbps, while tripling the frame rate to 15 fps and largely eliminating apparent video or audio issues.

I also wanted the files to meet the specs for Silverlight Streaming, which recommends a maximum peak bitrate of 1400 Kbps. So the combined video and audio peaks needed to be no more than 1400 Kbps.

Source

The source had been recorded in TechSmith's Camtasia Studio product, which captures screen activity live to an .AVI file using their lossless video codec. Camtasia does a great job of this kind of screen recording; something like the HDMI to HD SDI I used for my previous Expression Encoder 1.0 training would have been serious overkill for this low-motion, lower-resolution content, and would have forced an extra color conversion step.

The tech spec for all the files was:

  • Video: 1024x768 15 fps
  • Audio: 44.1 KHz 16-bit stereo

Encoding Settings

[Image: IIS_encode_settings]

Video Settings

  • Frame Rate: Source. VC-1 is extremely efficient, so we can increase the frame rate from the typical 5 to the full 15 that were originally captured
  • Key frame interval: 20 seconds. This is an unusually high setting, but critical to keeping our bitrate down. Since screen recordings often have long sequences without any dramatic changes in the video, it's pretty common for the B- and P-frames to be tiny, and for I-frames to make up the majority of the total bandwidth. So if I-frames are too frequent, the codec spends a ton of bits repeating the same static parts of the frame, leaving it unable to spend those bits on other parts of the image. The normal drawback of long gaps between I-frames is slow random access. However, random access is really a matter of how many P-frames there are between I-frames (as B-frames can be skipped during decoding since no frame references them). Thus, increasing the number of B-frames between P-frames improves random access. Since we'll be using 4 B-frames as you'll see below, only 1 out of every 5 frames between I-frames is a P-frame, giving us a max of 60 P-frames between I-frames (15 fps, of which 3 can be P-frames, over 20 seconds between I-frames). So, we'll have about the same random access performance as if we'd encoded at 30 fps with the standard 1 B-frame and a max 4-second keyframe interval (30 fps, of which 15 can be P-frames, over 4 seconds between I-frames).
  • Profile: VC-1 Advanced Profile, so we can use the I-frame DQuant feature below. For Silverlight 1 (which is progressive-scan only) the lack of I-frame DQuant is the only disadvantage to Main Profile compared to Advanced Profile.
  • Mode: VBR peak constrained, so we can specify both an average bitrate (to control file size) and a peak (to make sure it fits within the Silverlight Streaming 1400 Kbps maximum). VBR peak constrained is always a 2-pass encoding process, which we also want in order for the codec to be able to do optimal bitrate distribution over this file with highly variable complexity
  • Bitrate (Average): 350 Kbps, leaving us with 50 Kbps to use on audio.
  • Peak Bitrate: 1300 Kbps, leaving another 100 for audio peak.
  • Buffer Size: 5. I stuck with the default, which is fine for VBR at this bitrate. Bigger would give the codec a little more flexibility to move bits around, but could make playback over the web a little more touchy on slower connections.
  • Width and Height: 1024x768, matching the source.
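
The random-access arithmetic in the key frame interval bullet can be checked with a few lines of Python. This is only a sketch of the rough count used above: the function name is made up for illustration, and it assumes a fixed pattern of B-frames followed by one P-frame, ignoring the extra I-frames that scene change detection may insert.

```python
def max_p_frames_between_keyframes(fps, b_frames, keyframe_seconds):
    """Rough upper bound on P-frames between two I-frames.

    Assumes a fixed pattern of `b_frames` B-frames before each P-frame,
    so one frame in every (b_frames + 1) is a P-frame.
    """
    p_frames_per_second = fps / (b_frames + 1)
    return round(p_frames_per_second * keyframe_seconds)


# Settings used for the screen recordings: 15 fps, 4 B-frames, 20-second interval.
screen = max_p_frames_between_keyframes(fps=15, b_frames=4, keyframe_seconds=20)

# The comparison case from the text: 30 fps, 1 B-frame, 4-second interval.
typical = max_p_frames_between_keyframes(fps=30, b_frames=1, keyframe_seconds=4)

print(screen, typical)  # 60 60
```

Both configurations leave at most about 60 P-frames to decode before reaching an arbitrary frame, which is why the 20-second interval doesn't cost us in seek performance.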

Audio Settings

  • Codec: WMA. While Silverlight 2 adds support for Windows Media Audio 10 Professional, it isn't supported in Silverlight 1.0, which we wanted to use for this demo. We'll stick with good old WMA for maximum backwards compatibility.
  • Mode: VBR. Again, so the codec will distribute bits optimally throughout the piece, saving bits from pauses and spending them on harder bits of content.
  • Bitrate: 48 Kbps. This is the lowest supported bitrate for WMA in VBR mode. I could go lower with CBR, but there are often some high-frequency artifacts in WMA CBR @ 32 Kbps and below for voice that I find annoying, so I'd rather have overkill with VBR @ 48 Kbps.
  • Sample Rate: 44.1 KHz. Silverlight's internal sound engine runs at 44.1, so I recommend encoding audio to that to avoid an unneeded sample rate conversion. In this case, it also matches the source.
  • Bits per sample: 16, the only option with WMA. I'd use it anyway, as it matches the source.
  • Channels: Stereo, the only option with VBR WMA. WMA will intelligently encode the audio only once when it's identical in both channels, so it's safe to encode a mainly mono mix like this as stereo without a risk of inefficiency. The source in this case is nominally stereo, but is a mono mix.
  • Audio Peak Bitrate: 96, to add to the 1300 for video and to keep us under the 1400 Kbps max for Silverlight Streaming. That's plenty for voice content.
  • Audio Peak Buffer Size: 1.5. This default is nearly always fine.
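
To double-check that the video and audio numbers above fit the two budgets (400 Kbps average, 1400 Kbps peak), here is a small Python sketch. The variable names are mine, and the file size estimate assumes 1 Kbps means 1000 bits per second, which is an assumption rather than something the encoder UI states.

```python
SILVERLIGHT_STREAMING_PEAK_KBPS = 1400  # recommended max peak from the Goal section
TARGET_AVERAGE_KBPS = 400               # total average bitrate I was aiming for

video = {"average": 350, "peak": 1300}
audio = {"average": 48, "peak": 96}

total_average = video["average"] + audio["average"]
total_peak = video["peak"] + audio["peak"]

print(total_average, total_peak)  # 398 1396
print(total_average <= TARGET_AVERAGE_KBPS)          # True
print(total_peak <= SILVERLIGHT_STREAMING_PEAK_KBPS) # True

# Approximate file size per minute of content, assuming 1 Kbps = 1000 bits/s.
megabytes_per_minute = total_average * 1000 * 60 / 8 / 1_000_000
print(megabytes_per_minute)
```

So the settings land just under both limits, at roughly 3 MB per minute of tutorial under the stated assumption.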

Advanced Codec Settings

[Image: IIS_advanced_settings]

  •  Video Complexity: Normal (3). The default is just fine for simple motion like in screen recordings. Higher values are mainly useful with lots of differing motion in fine details, like with film grain or particle effects. I probably could have gotten away with lower without much drop in quality for this content.

Perceptual Options

  • Adaptive Deadzone: Off. This is good for preserving some coarse texture like film grain, but we don't have any textures we want to preserve here - it's pretty much flat areas, gradients, and fine details like font edges.

  • DQuant: I-Frames Only. DQuant is short for Differential Quantization, where the codec is able to vary the degree of compression (quantization) per macroblock (16x16 block of pixels) in the frame. The DQuant implementation in the VC-1 Encoder SDK used in Expression Encoder 2 looks for areas of smoother texture and then compresses them less. This implementation is much more aggressive than the one that shipped with Format SDK 11, and isn't appropriate for most low-bitrate encoding. But for screen captures, using it just for I-frames (which can be as few as 1 in every 300 frames, given 15 fps and the 20-second keyframe interval) can improve the quality of the I-frames without taking too many bits away from the other frames. And by establishing a very clean reference frame, the following frames based on the I-frame, or based on a frame based on the I-frame, start from a near-perfect copy of the screen image. This reduces the common effect in older codecs where the image can be soft or blocky after a scene change, with the quality improving over the next few frames even though the original image didn't change that way.

Filters

  • In-Loop: On. The In-Loop deblocking filter softens areas where a compression artifact would otherwise be visible, and then predicts future frames on that improved version. This always helps quality at Silverlight bitrates, and I recommend it always be on as long as a low-powered device like a cellphone isn't being targeted; it does slightly increase CPU requirements for playback.

  • Overlap: On. The Overlap filter further softens potential artifacts. Since Silverlight doesn't have the postprocessing modes of Windows Media Player, the Overlap filter is good to have on at typical Silverlight bitrates. It's more of a brute-force filter than the In-Loop filter, and can soften the image a bit at high bitrates.

  • Denoise: Off. Source isn't noisy.

  • Noise Edge Removal: Off. No noisy edges

Group of Pictures

  • B-Frame Number: 4. We get two things out of using this instead of the normal 1 with screen recordings. First, it helps improve compression efficiency, given the very simple motion in screen recordings. A B-frame can be based on the previous and/or next I- or P-frame, but not on another B-frame. With content like film or video with some random noise in it, too many B-frames hurt quality since a B-frame can be so temporally separate from its reference frames. But a Camtasia screen recording is pixel-perfect, without any random noise. So we actually get an improvement in efficiency. Second, the greater number of B-frames lets us push up the interval between keyframes without hurting random access (as mentioned above), further improving efficiency. Going from a keyframe every 5 seconds and 1 B-frame to a keyframe every 20 seconds and 4 B-frames, I was able to get better quality at 350 Kbps than I was getting at 600 Kbps before.

  • Scene Change Detection: Always have this on. It will automatically insert an I-frame at cuts, improving compression efficiency and random access.

  • Adaptive GOP: On. Always have this on. It tells the codec not to insert I-frames at regular intervals as defined by "Keyframe every" but to treat that value as a maximum distance between I-frames. This helps efficiency quite a bit.

  • Closed GOP: No. Always have this off. Closed GOP makes editing easier (which we're not going to do) but hurts efficiency slightly.
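
To picture what a B-Frame Number of 4 means in practice, here is a minimal Python sketch that prints frame types in display order and the time between consecutive reference (I- or P-) frames. The function names are hypothetical, and the fixed repeating pattern is a simplification: with Adaptive GOP and Scene Change Detection on, the real encoder is free to place I-frames differently.

```python
def gop_display_order(total_frames, b_frames):
    """Frame types in display order for one GOP with a fixed B-frame count."""
    types = ["I"]
    while len(types) < total_frames:
        # Position within the current run of B-frames followed by one P-frame.
        position_in_group = (len(types) - 1) % (b_frames + 1)
        types.append("P" if position_in_group == b_frames else "B")
    return "".join(types)


def reference_gap_seconds(fps, b_frames):
    """Time between one reference frame and the next."""
    return (b_frames + 1) / fps


print(gop_display_order(11, 4))  # IBBBBPBBBBP
print(gop_display_order(11, 1))  # IBPBPBPBPBP
print(reference_gap_seconds(15, 4))
```

With 4 B-frames at 15 fps, reference frames are 5 frames apart, or a third of a second, compared with every other frame in the usual 1 B-frame pattern.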

Motion Estimation

  • Chroma Search: Full True Chroma. Not normally needed with screen captures, but helpful in this case as the recordings were done with ClearType on. See the previous blog post about ClearType for why that's a potential problem.

  • Motion Method: SAD. The Sum of Absolute Differences is quite a bit faster than the alternate Hadamard or Adaptive modes, and perfectly good for screen recordings without any noise.

  • Search Range: Adaptive. Sometimes those dialog boxes can go pretty fast. And with 4 B-frames, each P-frame has to go back a third of a second to the previous P- or I-frame for reference. An adaptive motion search range makes sure it'll find the match if it's there.
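
For readers who haven't seen SAD before, here is an illustrative Python sketch of SAD-based block matching. It is not the VC-1 Encoder SDK's actual search (which is far more optimized); the exhaustive search, function names, and tiny 8x8 test frames are all assumptions made for the example. It also shows why the search range matters: a "dialog box" that moves farther than the range allows can't be matched exactly.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(reference, current, top, left, size, search_range):
    """Exhaustively search the reference frame; return (cost, dy, dx) of the best match."""
    target = crop(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(target, crop(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def frame_with_dialog(top, left):
    """8x8 black frame with a 2x2 bright 'dialog box' at (top, left)."""
    frame = [[0] * 8 for _ in range(8)]
    for r, row in enumerate([[200, 210], [220, 230]]):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame


reference = frame_with_dialog(1, 1)  # dialog in the earlier reference frame
current = frame_with_dialog(3, 4)    # dialog has moved down 2 and right 3

wide = best_motion_vector(reference, current, top=3, left=4, size=2, search_range=3)
narrow = best_motion_vector(reference, current, top=3, left=4, size=2, search_range=2)
print(wide)           # (0, -2, -3)
print(narrow[0] > 0)  # True
```

With a range of 3 the search finds a perfect (zero-cost) match at the dialog's old position; with a range of 2 the best candidate still has a nonzero SAD, so the codec would have to spend bits on the difference.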

The Results

And here are the final files, embedded in Silverlight up at IIS.net. Remember to double-click on the video window to go full screen and enjoy their full glory. Beyond being a compression demo, they're pretty darn useful demos of common IIS7 activities. There will be a few more files uploaded in the next few weeks, and I'll update this post to include those.

Installing Necessary IIS7 Components on Windows Vista

Install only the components you need for your Web applications by leveraging IIS7’s modular architecture.  This tutorial will cover installing the modules necessary for serving ASP and ASP.NET pages from IIS7 in Windows Vista.

Serving New Content

More flexible deployment options let you decide exactly how you want your Web content served by IIS7.  This tutorial will cover creating your first Web site, Web application and Virtual Directory through the new IIS Manager graphical-user-interface.

Editing Configuration Files

Strongly typed schema written in clear-text XML makes IIS7 configuration files simple to read and edit.  This tutorial covers reading and setting configuration in ApplicationHost.config at the server level and Web.config files at the site and application level.

Troubleshooting Unexpected Issues

Prescriptive detailed errors, automatic failure tracing and more exposed runtime information make IIS7 the simplest and quickest Web server to troubleshoot.  This tutorial will cover debugging site and application failures with the advanced diagnostic features in IIS7.

Setting Up FastCGI for PHP

Improved performance and greater reliability for PHP applications are ensured by the new FastCGI component for IIS7 and previous versions.  This tutorial will cover installing PHP 5.2.1 and the new FastCGI component to IIS7 in Windows Vista.

Delegating Configuration to web.config Files

Distributed, file-based configuration is a powerful new feature of IIS7 that enables delegated management of Web application settings at a very granular level.  This tutorial will cover the structure of IIS and ASP.NET configuration, unlocking IIS configuration for delegation, creating and setting configuration in Web.config files and using location tags.

Using ASP.NET Forms Authentication

HTTP request processing is more integrated in IIS7 allowing ASP.NET features like Forms Authentication to process requests for non-ASP.NET content like ASP, PHP or media files.  This tutorial will cover configuring authentication settings in Web.config, adding users and roles to membership, and configuring authentication for all content types in Integrated Pipeline Mode.

Configuring SSL in IIS Manager

Enabling powerful SSL security to protect your Web applications is simpler to set up with IIS Manager and easier to deploy with self-signed certificates in IIS7.  This tutorial will cover adding self-signed certificates, creating certificates with a Certificate Authority and setting up HTTPS bindings.

Extending Web server Functionality in .NET

Building Web server add-ons and extensions is simpler and less time-consuming because IIS7 supports .NET extensibility through the IHttpModule and IHttpHandler interfaces that ASP.NET developers already know and use today.  This tutorial will cover building a .NET module starting with the Managed Module Kit, implementing the IHttpModule interface, attaching EventHandlers to pipeline events and configuring IIS7 to use the module in the request pipeline.

Improving Performance with Native Output Caching

Dramatically reduce Web application response time by leveraging native HttpCacheModule in IIS7 that stores all application outputs in Kernel mode cache.  This tutorial will cover enabling and configuring user-mode and kernel-mode caching by creating new output caching rules in config and through the IIS Manager GUI.
