Silverlight Media technologies overview in Expression newsletter

As Expression Encoder 2 approaches its imminent release, I've been using it for more and more real-world projects. This recent one was particularly chewy fun, and I thought it would make a good tutorial for a high-touch workflow.
As you may remember from a few weeks ago, I was one of the inaugural class of Streaming Media's Streaming Media All-Stars. There was a fun video montage of all of us on baseball cards being announced by ballpark-style narration. Good stuff, but the FLV compression wasn't quite up to my standards for this rare intersection of compression obsession and personal vanity. So I contacted Streaming Media and asked if I could take my own whack at it.
I'll have an expanded version of this post as an article in an upcoming issue of Streaming Media Magazine. If you don't get it, you can sign up for a free subscription.
One thing I noticed in the original is that the background graphics and a few of the animations were interlaced, as you can see in the last "before" image at the very bottom of the page.
While deinterlacing it may have been possible, the heavyweight motion-adaptive deinterlacers available for tools like AviSynth can be finicky to configure and extremely slow. And in the end, nothing beats getting the source fixed in the first place. Compression is the art of getting output that's as close to the original as possible with the bits you have available; often, getting access to higher quality sources can provide a much bigger improvement to final quality than all the codec tweaking in the world.
So, I contacted the post house, and they fixed the background interlacing (it was just a matter of properly flagging the source as interlaced in After Effects) and re-rendered it for me as a lossless RGB PNG codec QuickTime .mov file. However, two shots snuck through where one layer was still interlaced. I didn't want to wait for another disc, so I dived into After Effects (in the end, all difficult preprocessing jobs seem to wind up in After Effects). I used the "Reduce Interlace Filter" with a softness of 1 to blend the two fields together, since traditional deinterlace methods messed up the text on the cards too much. However, the softness increase from that filter wound up causing a slight visual discontinuity when it kicked in. So, I broke out the two shots with interlacing into layers, and then used a five-frame cross-dissolve transition from the original progressive frames to the start of the interlaced shot, which hid the slight loss of focus (masked in part by the motion). Both interlaced shots ended on a hard cut, so I was able to switch back to the original video without a transition.
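To make the blend-and-dissolve idea concrete, here is a minimal sketch in plain Python. It is not what After Effects does internally; the 1-2-1 vertical kernel, the linear fade, and the function names `blend_fields` and `cross_dissolve` are all assumptions chosen for illustration. A frame is just a list of rows of 8-bit sample values.

```python
def blend_fields(frame):
    """Soften combing by blending each row with its neighbours (1-2-1 kernel).

    Adjacent rows belong to opposite fields, so a vertical blend mixes the
    two fields together. Edge rows reuse themselves as the missing neighbour.
    """
    height = len(frame)
    blended = []
    for y in range(height):
        above = frame[max(y - 1, 0)]
        below = frame[min(y + 1, height - 1)]
        blended.append(
            [(a + 2 * c + b) // 4 for a, c, b in zip(above, frame[y], below)]
        )
    return blended


def cross_dissolve(sharp, soft, steps=5):
    """Return `steps` frames fading linearly from `sharp` towards `soft`.

    The last returned frame is identical to `soft`, so the softened shot
    can take over without a visible jump in sharpness.
    """
    frames = []
    for i in range(1, steps + 1):
        w = i / steps  # weight of the softened frame, 0.2 .. 1.0 for 5 steps
        frames.append(
            [
                [round((1 - w) * s + w * t) for s, t in zip(row_s, row_t)]
                for row_s, row_t in zip(sharp, soft)
            ]
        )
    return frames


# A tiny "combed" frame: alternating rows come from different fields.
combed = [[0, 0], [100, 100], [0, 0], [100, 100]]
blended = blend_fields(combed)
dissolve = cross_dissolve(combed, blended, steps=5)
```

In this toy frame the interior rows collapse to a uniform 50, which is exactly the softening that makes a gradual dissolve preferable to switching the filter on abruptly.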
I then rendered the new version out from After Effects in 32-bit float (to reduce the risk of introducing banding via an 8-bit to 8-bit conversion) into the Lagarith codec in YV12 mode, which uses the native 8-bit 4:2:0 colorspace of VC-1 and other codecs. This means that Expression Encoder doesn't need to do any color space conversion, making compression slightly faster.
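As a rough illustration of what 4:2:0 means for the data being handed to the encoder, the arithmetic below compares the size of one uncompressed 8-bit frame in RGB and in YV12. The 640x480 frame size is the one this clip is embedded at; the function names are hypothetical helpers, not part of any tool mentioned here.

```python
def rgb24_frame_bytes(width, height):
    """Three 8-bit samples (R, G, B) for every pixel."""
    return width * height * 3


def yv12_frame_bytes(width, height):
    """Full-resolution luma plane plus two chroma planes.

    In 4:2:0 each chroma plane is subsampled by two both horizontally and
    vertically (rounded up for odd dimensions).
    """
    luma = width * height
    chroma_plane = ((width + 1) // 2) * ((height + 1) // 2)
    return luma + 2 * chroma_plane


rgb_size = rgb24_frame_bytes(640, 480)   # 921600 bytes
yv12_size = yv12_frame_bytes(640, 480)   # 460800 bytes
```

So a YV12 intermediate carries half the samples of 8-bit RGB per frame, already laid out the way the codec wants them.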
The other notable issue with the original clip was "keyframe popping": an obvious "jump" in the video that recurs at the keyframe interval. Watch the original FLV, and you'll see it during any of the longer static shots. Since the whole section with the cards is one single long shot over 3 minutes long without any hard cuts, there wasn't a place for natural keyframes (automatically inserted at a hard cut) to go. Thus keyframe transitions would happen while the cards were otherwise static, making even a slight change visible.
I also wanted to show off the Expression Encoder templates a bit by doing thumbnail navigation. In EEv2, I'm able to graphically set markers on particular frames, and set them to be keyframes and/or thumbnails. A thumbnail becomes an image file which, with the supported templates, automatically gets included in the menus for navigation (think a chapter on a DVD). Normally you also want to make the chapter points keyframes, since keyframes support immediate random access, as no other frames need to be decoded before displaying a keyframe.
This was an opportunity to kill two birds with one stone; if I set the markers on the first static frame of every card, it'd be a nice high-quality image that all the later frames referencing that I-frame can be based on, propagating its quality forward. If I set my keyframe spacing long enough, there wouldn't be any other keyframes in that interval to cause keyframe popping, and so the static card would be very consistent.
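The interaction between forced keyframes and keyframe spacing can be sketched with a toy model. This is a simplification under stated assumptions, not Expression Encoder's actual logic: it assumes the encoder places a keyframe at frame 0, at every marker, and whenever the maximum interval has elapsed since the previous keyframe, and it ignores natural keyframes at hard cuts. The frame numbers are made up for illustration.

```python
def keyframe_positions(total_frames, max_interval, forced):
    """Return the frame numbers where this toy encoder places keyframes."""
    forced_set = set(forced)
    keyframes = []
    last = None
    for frame in range(total_frames):
        due = last is not None and frame - last >= max_interval
        if frame == 0 or frame in forced_set or due:
            keyframes.append(frame)
            last = frame
    return keyframes


# Hypothetical clip: 300 frames, with cards landing at frames 100 and 220.
card_markers = [100, 220]

short_spacing = keyframe_positions(300, max_interval=90, forced=card_markers)
long_spacing = keyframe_positions(300, max_interval=150, forced=card_markers)
```

With the short spacing, extra keyframes land at frames 90 and 190, in the middle of otherwise static material, which is where popping would show. With the longer spacing, the only keyframes after the first frame are the ones on the markers.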
So, I set a marker for each person, flagged to be both a thumbnail and a keyframe. The audio doesn't always sync up exactly so that the person's name begins after their card is down, so sometimes the first name is cut off. This would have been easy to fix by just delaying the audio a second.
You can also use non-thumbnail keyframe markers; these become keyframes without showing up in navigation. I stuck a few of those in as well in the intro/outro sections, on the first full frames after the logo gets built. Since the sponsor pays the bills (Ripcode in this case), I always want to make sure that logos remain nice and crisp.
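One way to picture the two kinds of markers is as a small data structure with two flags. The sketch below is only a mental model; the `Marker` class, its field names, and the times and labels are hypothetical and do not reflect how Expression Encoder stores its markers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    seconds: float           # position in the clip
    label: str               # text shown in the navigation menu, if any
    keyframe: bool = True    # force a keyframe on this frame
    thumbnail: bool = False  # also export an image for chapter navigation


def forced_keyframe_times(markers):
    """Every marker flagged as a keyframe, in time order."""
    return sorted(m.seconds for m in markers if m.keyframe)


def chapter_entries(markers):
    """Only thumbnail markers show up in the navigation menu."""
    ordered = sorted(markers, key=lambda m: m.seconds)
    return [(m.seconds, m.label) for m in ordered if m.thumbnail]


markers = [
    Marker(4.0, "Sponsor logo"),               # keyframe only, no thumbnail
    Marker(12.5, "Card 1", thumbnail=True),    # chapter point
    Marker(31.0, "Card 2", thumbnail=True),    # chapter point
]

keyframe_times = forced_keyframe_times(markers)
chapters = chapter_entries(markers)
```

The logo marker contributes a keyframe but stays out of the menu, while each card marker contributes both.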
Setting keyframes has been around in compression projects for ages now; I did a lot of this in Premiere 4.0 for Cinepak encodes in the pre-Media Cleaner days, since Cinepak was prone to keyframe popping issues. Modern codecs like VC-1 do a much better job of finding good natural keyframes and of reducing popping issues. The Silverlight version would have looked a lot better than the Flash even if I hadn't set them, but they did give a further boost in quality. But don't think this is something you should be doing in every case; this clip is unusual in having minutes without cuts with a mix of static and moving elements, at an extremely low bitrate.
Now, what encoding settings do we want to use?
The Output pane has some of my favorite usability features of Expression Encoder, letting us apply rich templates and automatic publishing.
First, the Template. I picked the "Clean" template, which has a nice subtle overlay control, and popup navigation via the thumbnails we made above when you mouse over the top of the window. It also supports going full screen with a double-click. One thing I like about Clean is that the video fills the frame exactly, without having to account for the control bar or other elements. So I can embed at exactly 640x480 for a 640x480 clip.
The publish mode (I've got the optional Silverlight Streaming publishing plugin installed) lets me automatically or manually upload the final project to our Silverlight Streaming service. This is a great way to test or deliver Silverlight projects. You can sign up for a free account with 10 GB of storage and 5 TB/month of bandwidth.
So, how much did all this help? Here are a couple of the more pronounced before/after shots. All of the images below are inserted as 100% scale PNG, so there's no scaling or further compression to complicate the comparison. Note that the FLV came out darker for some reason. I'm not sure what the cause of that was; the VC-1 brightness matches the source. Perhaps something to do with the Mac/Windows gamma difference on the platform the FLV was encoded on? This actually makes VC-1's job relatively harder, since the motion graphics are easier to see.
And you can see the actual clips in action here:
I grabbed a frame right after the transition that really shows the detail difference between VP6 and VC-1 here; it's especially striking in the texture of the shirt. The VP6 gets sharper after a keyframe pop, but this is how it starts. VC-1 quality in the card is maintained perfectly throughout.
In this frame (man, do I look like a stiff!), you can see the effect of my blend deinterlace to hide the fields. Notice the ringing artifacts in the original frame. Encoding fields as progressive is extremely challenging for codecs, since you have high-motion, 1-pixel-high horizontal lines, combining high frequency and high detail. I normally don't like doing a blend, since those double-images are also hard to encode, but it was only for a very short duration in this clip, and the deinterlacing filters I had handy had a lot of trouble preserving the text perfectly.