So, I get involved in a ton of conversations with various internal and external customers about building Silverlight media players. The great thing about Silverlight is that it's deep, and provides lots of ways to build complex applications that include media playback. The flip side is that, like all software engineering, there are plenty of sub-optimal ways to do things that can have a negative impact on media playback performance.
While it's tempting to assume everyone's got a hopping dual-core machine these days, that's not the case. When we reviewed the demographics for the NBC Olympics player we were surprised at what a big chunk of the home users had older, slower, < 2 GHz single-core PCs. We were also somewhat surprised by how fast average broadband has gotten. It's not unusual now for a consumer's highest playable bitrate to be limited by CPU power more than by bandwidth.
That said, don't go overboard – plenty of these techniques are just best practices, but some can limit the complexity of the player you can build. Unless you’re getting dropped frames on the target platforms, easier playback scenarios can support all kinds of effects. Just make sure that the payoff in improved user experience from using the more advanced techniques is worth the perf hit.
First up, let's talk about getting players on the Fast Path - this is when Silverlight doesn't have to do any scaling or compositing of the video rectangle, saving a good chunk of CPU power as well as memory bandwidth. Again, these are really about HD content – SD and below should have plenty of perf even on older machines.
The video's MediaElement should be exactly the size the video was encoded at. The simplest way to do that is to leave the Height and Width attributes off the MediaElement entirely, so it takes on the native size of the media. The perf differential is really just scaling versus no scaling; there's no significant advantage to using scaling tricks like exact 2x or 3x scaling, or scaling on only one axis.
Note that this applies to non-square pixel encoded video. For example, 720x480 encoded as 4:3 won't ever use the fast path; you'd have to encode as square pixel 640x480 instead.
And this applies to scaling down as much as scaling up. Playing 640x480 video at 320x240 will actually take more CPU than just leaving it at 640x480.
If you’re building a video browser that plays multiple video streams at once, it can be worth it to provide low resolution thumbnails at the display size; that’ll allow a lot more clips to be played at once.
The pixels also need to be exactly aligned with the grid, so no decimal coordinates. Pixel Snapping will do this for you automatically in Silverlight 2.
MediaElement, not VideoBrush
VideoBrush can enable some great effects like mirroring, but isn't compatible with Fast Path. Unless you’re doing something that requires VideoBrush, stick with MediaElement.
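The fast path conditions covered so far can be collected into a small checklist. The following is only a sketch in Python for reasoning about a player layout; `VideoPlacement` and `fast_path_blockers` are hypothetical names, not part of any Silverlight API, and the checks cover just the rules described above (no scaling, square pixels, whole-pixel coordinates, MediaElement rather than VideoBrush).

```python
from dataclasses import dataclass


@dataclass
class VideoPlacement:
    """Hypothetical description of how a video is placed in a player."""
    encoded_size: tuple            # (width, height) the video was encoded at
    display_size: tuple            # (width, height) it is shown at
    offset: tuple                  # (x, y) of the video rectangle
    square_pixels: bool = True     # False for anamorphic encodes
    uses_video_brush: bool = False


def fast_path_blockers(p):
    """Return the reasons this placement would miss the fast path."""
    reasons = []
    if p.display_size != p.encoded_size:
        reasons.append("scaled: display size differs from encoded size")
    if not p.square_pixels:
        reasons.append("non-square pixels need scaling on playback")
    if any(float(v) != int(v) for v in p.offset):
        reasons.append("fractional coordinates: not aligned to the pixel grid")
    if p.uses_video_brush:
        reasons.append("VideoBrush used instead of MediaElement")
    return reasons


if __name__ == "__main__":
    ok = VideoPlacement((640, 480), (640, 480), (10, 20))
    bad = VideoPlacement((640, 480), (320, 240), (10.5, 20), uses_video_brush=True)
    print(fast_path_blockers(ok))
    for reason in fast_path_blockers(bad):
        print("-", reason)
```

An empty list means nothing in this checklist rules out the fast path; it does not prove the runtime will actually take it, so measure on real hardware.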
General XAML tips
The default frame rate of a Silverlight application is 60 fps, while most media encoded for Silverlight is 30 fps or less. Set the frame rate of the application to that of the media. That’ll provide better performance, and make the video and GUI elements seem more in sync.
Don't use Windowless
Windowless mode has a slight perf hit on Windows (it doesn't have a significant impact on Mac). Windowless mode is mainly used to mix Silverlight with HTML or other web elements in the same part of the screen.
While the fast path can operate with overlays, they do take additional CPU to process, so minimize the use and size of overlays to what’s useful. Even an invisible object that overlaps the video, like a play control set to transparent, still gets composited. Instead, when the control is going to go invisible, have it move entirely outside of the video rectangle. As long as it's not overlapping the media, no problem.
Keep MediaElement opaque
Speaking of keeping rendering of transparent elements to a minimum, you definitely want the video itself to be opaque, particularly at bigger frame sizes. Even a hint of transparency will require additional processing of every pixel of every frame.
Constrain video peak bitrate
The main factors in CPU load for video decoding are how many pixels/second are being displayed (height * width * frames per second), and what the peak data rate of the video is.
- With a CBR (Constant Bitrate) encode, the peak and average bitrate are identical. The only variability is the buffer duration. Using a very long buffer (like 20 seconds) allows a data rate spike within the buffer that can be challenging to decode.
- With a VBR (Variable Bitrate) encode, the peak is higher than the average. There’s no rule about what the difference is. Typically the peak is at least 1.5x the average, but it can go a lot higher.
- A Buffer Window of 1-2x your keyframe interval is a good starting point. You don’t want a buffer shorter than the keyframe interval, as the keyframes can wind up being starved for bits, resulting in the “blur-in” effect when the quality drops after a scene change.
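To put numbers on the pixels/second factor and the buffer window rule of thumb, here is a minimal Python sketch. The function names and the 4-second keyframe interval are illustrative assumptions, not values from any encoder.

```python
def pixels_per_second(width, height, fps):
    """Decode load proxy from the text: height * width * frames per second."""
    return width * height * fps


def buffer_window_range(keyframe_interval_s):
    """Starting range for the buffer window: 1x to 2x the keyframe interval."""
    return (keyframe_interval_s, 2 * keyframe_interval_s)


if __name__ == "__main__":
    hd = pixels_per_second(1280, 720, 30)
    sd = pixels_per_second(640, 480, 30)
    print(f"720p30: {hd:,} pixels/s")
    print(f"480p30: {sd:,} pixels/s")
    print(f"720p30 is {hd / sd:.1f}x the pixel rate of 640x480 at 30 fps")
    # Assumed 4-second keyframe interval, purely for illustration.
    print("buffer window (s):", buffer_window_range(4))
```

Pixel rate is only half the picture; peak bitrate still has to be tested separately on the target machines.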
If you want to calibrate what VBR peak bitrate you can use on a particular system, it works to test with CBR. Just find out the bitrate + buffer duration you can use with CBR, and use that as the peak bitrate and buffer duration with your VBR encodes.
For high bitrate content where the perf ceiling of the peak buffer is a more important limitation than average bitrate, go ahead and use CBR encoding. That generally provides better results than VBR when the peak would be much less than 1.5x the average.
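The calibration procedure and the CBR-versus-VBR choice can be sketched as a small helper. This is an illustration only: `pick_rate_control` is a hypothetical name, treating 1.5x as a hard cutoff is my simplification of "much less than 1.5x", and capping the CBR bitrate at the tested ceiling is likewise an assumption.

```python
def pick_rate_control(tested_cbr_kbps, tested_buffer_s, target_average_kbps):
    """Turn a CBR calibration result into encode settings.

    tested_cbr_kbps / tested_buffer_s: the highest CBR bitrate and buffer
    duration that played cleanly on the target machine.
    target_average_kbps: the average bitrate you would like to deliver.
    """
    ratio = tested_cbr_kbps / target_average_kbps
    if ratio < 1.5:
        # Little headroom between average and peak: assume CBR is the
        # better choice, and never exceed the tested ceiling.
        return {
            "mode": "CBR",
            "bitrate_kbps": min(target_average_kbps, tested_cbr_kbps),
            "buffer_s": tested_buffer_s,
        }
    # Enough headroom: reuse the tested CBR numbers as the VBR peak settings.
    return {
        "mode": "VBR",
        "average_kbps": target_average_kbps,
        "peak_kbps": tested_cbr_kbps,
        "peak_buffer_s": tested_buffer_s,
    }


if __name__ == "__main__":
    print(pick_rate_control(6000, 4, 3000))
    print(pick_rate_control(6000, 4, 5000))
```

The bitrates in the usage lines are made-up inputs; the real ceiling comes from testing on your own minimum-spec machine.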
Encode audio at 44.1 kHz mono or stereo
Silverlight’s internal audio pipeline runs at 44.1 kHz, so even if the audio comes in at a higher rate, you should resample to 44.1 kHz on encode. If you have a lower sample rate source, it’s fine to leave it at that.
And while WMA 10 Professional supports 5.1 and 7.1 audio, Silverlight 2 always mixes down to stereo. So if you’re targeting Silverlight only, convert multichannel sources to stereo before encoding. This also enables the much more efficient WMA 10 Pro codec.
This shouldn’t be news, but don’t encode non-image parts of the video frame, like letterboxing. A 640x480 frame with standard 1.85:1 letterboxing can be cropped and encoded at 640x352 without losing any visual information, while making both encoding and decoding faster (about 27% fewer pixels need to be processed). Silverlight is more than capable of drawing the black rectangles for you client-side if you must have them.
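The crop arithmetic is easy to check for other frame sizes. Here is a small Python sketch; rounding the cropped height up to a multiple of 16 is my assumption (it reproduces the 640x352 figure above), and the function names are illustrative.

```python
import math


def cropped_height(width, aspect_ratio, multiple=16):
    """Height of the active image area, rounded up to a multiple of `multiple`."""
    exact = width / aspect_ratio
    return math.ceil(exact / multiple) * multiple


def pixel_savings(full, cropped):
    """Fraction of pixels saved by encoding `cropped` instead of `full`."""
    full_w, full_h = full
    crop_w, crop_h = cropped
    return 1 - (crop_w * crop_h) / (full_w * full_h)


if __name__ == "__main__":
    height = cropped_height(640, 1.85)
    saved = pixel_savings((640, 480), (640, height))
    print(f"crop to 640x{height}, saving {saved:.1%} of the pixels")
```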
Use Inverse Telecine with 3:2 pulldown
If the content originated as 24p film but was transferred to 29.97i video with 3:2 pulldown, the video file will show a repeating pattern of three progressive and two interlaced frames. Instead of deinterlacing that video (with the inevitable artifacts) and encoding at 29.97, inverse telecine can restore the original 24p, eliminating deinterlacing artifacts and the framerate judder from the 24 to 30 remapping, and providing more bits per frame. And, of course, 24 frames a second is 20% fewer frames to decode and display than 30. Make sure to turn the Silverlight application’s frame rate down to 24 fps as well.
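The three-progressive, two-interlaced pattern falls straight out of how the fields are distributed. The toy Python model below treats each film frame as a label and each video frame as a pair of fields; it ignores field parity and real cadence detection, so it is a sketch of the idea rather than how an encoder implements inverse telecine.

```python
def telecine(film_frames):
    """Apply 3:2 pulldown: film frames alternately contribute 2 and 3 fields,
    and consecutive fields are paired up into video frames."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields.extend([frame] * (2 if i % 2 == 0 else 3))
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def inverse_telecine(video_frames):
    """Recover the film frames by dropping repeated fields.
    Assumes neighbouring film frames carry distinct labels."""
    film = []
    for first, second in video_frames:
        for field in (first, second):
            if not film or film[-1] != field:
                film.append(field)
    return film


if __name__ == "__main__":
    video = telecine(list("ABCD"))
    for first, second in video:
        kind = "progressive" if first == second else "interlaced"
        print(first + second, kind)
    print("restored:", inverse_telecine(video))
```

Four film frames become five video frames, which is the 24 to 30 remapping; undoing it gets back the original four.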
Encode as square pixel for fast path
If you’re shooting for the fast path, note that the scaling required for playing back anamorphic video will turn it off. If you have a 16:9 720x480 source and want to play it back in an 848x480 window (16:9 480p), you’ll need to encode at 848x480 to get the fast path.
Depending on the design of your player and the performance of the system, it’ll vary as to whether you’re better off decoding fewer pixels (720x480 instead of 848x480), but losing the fast path. Testing both ways on your target platforms is, as always, the best thing to do.
Encode anamorphic sources as anamorphic for non-fast path
If you’re not going for the fast path, you might as well encode content authored as anamorphic, as anamorphic. For example, the DVCPROHD codec is internally 960x720 in 720p mode. That would normally be encoded as 1280x720 for 720p playback in square pixels. But since the source is only 960 wide, encoding at 960x720 (set to a 16:9 aspect ratio) means 25% fewer pixels to encode and decode (put the other way, 1280x720 is 33% more), which helps both bitrate efficiency and decode performance. Depending on the platform, that can be well worth the sacrifice of Fast Path.
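When weighing square-pixel against anamorphic encoding, it helps to see how many extra pixels the square-pixel version costs. A rough Python sketch follows; rounding the width down to a multiple of 16 is my assumption (it matches the 848x480 figure used earlier), and the helper names are illustrative.

```python
def square_pixel_width(height, aspect_w, aspect_h, multiple=16):
    """Square-pixel width for a display aspect ratio, rounded down to `multiple`."""
    exact = height * aspect_w // aspect_h
    return exact // multiple * multiple


def extra_pixels(anamorphic_width, square_width):
    """Fractional increase in pixels per frame when encoding square-pixel."""
    return square_width / anamorphic_width - 1


if __name__ == "__main__":
    for anamorphic_width, height in [(720, 480), (960, 720)]:
        square = square_pixel_width(height, 16, 9)
        extra = extra_pixels(anamorphic_width, square)
        print(f"{anamorphic_width}x{height} anamorphic vs {square}x{height} square: "
              f"{extra:.0%} more pixels for the square-pixel encode")
```

Whether those extra pixels cost more than losing the fast path is exactly the thing to test on the target machines.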
Silverlight 2 specific
Just use Silverlight 2
Silverlight 2 has a bunch of perf improvements over Silverlight 1.0, the most notable being a faster VC-1 decoder and better scaling performance (with better quality to boot). The latter means that not using the Fast Path has less of an impact in Silverlight 2 than before.
The good news is that Silverlight automatically updates, so you don’t need to do anything specific to force this. It can be worth it to retest Silverlight 1.0 and Silverlight 2 Beta applications in Silverlight 2 to see if suggested system requirements can be lowered.
Use Andre & Akshay’s Custom Slider
Our Silverlight teammates Andre Michaud and Akshay Johar have built a custom slider that offers better performance for video playback, particularly with streaming content. This one doesn’t continuously generate valueChanged events, which would then turn into new seeks in the media file. Instead, it waits to issue them until either mouse up on the slider thumb or slider tracker, or issues just the last seek of a bunch of them if they come at once. So, the user can wiggle the mouse willy-nilly, but it won’t turn into a seek until the movement slows down, or they let go.
The code was in my previous blog post.
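For readers who just want the gist of the behavior, here is a language-neutral sketch in Python. To be clear, this is not Andre and Akshay's code: the class name, the method names, and the 0.25-second quiet period are all my own assumptions, and timestamps are passed in by hand so the logic is easy to follow.

```python
class DeferredSeekSlider:
    """Coalesces rapid slider changes into a single seek (illustrative only)."""

    def __init__(self, seek, quiet_period=0.25):
        self.seek = seek                  # callback that performs the real seek
        self.quiet_period = quiet_period  # seconds without movement before seeking
        self.pending = None               # latest position not yet sought to
        self.last_change = None           # time of the latest change

    def on_value_changed(self, value, now):
        # Remember only the most recent position; do not seek yet.
        self.pending = value
        self.last_change = now

    def on_tick(self, now):
        # Called periodically: seek once the thumb has stopped moving for a while.
        if self.pending is not None and now - self.last_change >= self.quiet_period:
            self._flush()

    def on_mouse_up(self):
        # Letting go of the thumb always commits the pending position.
        if self.pending is not None:
            self._flush()

    def _flush(self):
        value, self.pending = self.pending, None
        self.seek(value)


if __name__ == "__main__":
    seeks = []
    slider = DeferredSeekSlider(seeks.append)
    slider.on_value_changed(10, now=0.0)
    slider.on_value_changed(20, now=0.1)   # still wiggling: nothing issued yet
    slider.on_tick(now=0.2)
    slider.on_tick(now=0.4)                # movement stopped: one seek, to 20
    slider.on_value_changed(30, now=0.5)
    slider.on_mouse_up()                   # mouse up commits immediately
    print(seeks)
```

The point of the design is that the media pipeline only ever sees the positions the user actually settles on, rather than every intermediate value.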
Use Expression Encoder 2 Service Pack 1 Templates
The new Expression Encoder 2 SP1 (which I really need to blog about, but then I should blog about Adaptive Streaming first, which is going to be another long post…) adds new Silverlight 2 templates. These implement the described best practices, including using the fast path if the Job Output’s Stretch Mode=None, and a less manic slider.
Test early, test often, test on your target platforms
This is one of my most-used pieces of wisdom, imparted to me by the legendary Charles Wiltgen of Kinoma. In any project, you want to define your minimum and recommended system specs. You need to have machines of those specs available to test on, to make sure they actually work. It’s obvious, but often missed. And for complex players with a lot of custom XAML, a Core 2 Extreme can get away with suboptimal design that’ll turn an older P4 into a filmstrip emulator (and you don’t even get that “beep”).