Compression, although an obsession with me since I was 19, didn't appear to be a career option until many years after that. My years at Hampshire College were spent essentially majoring in neuropsychology, minoring in computer science, and spending my evenings and weekends helping out my film student buddies. It all seemed hopelessly random to my parents and advisors, but turned out to be the perfect background for what I do now (after all, what's compression but extremely applied neuropsychology?).
After college, with a couple of science internships under my belt, I decided I didn't want to spend my life writing grant proposals or doing lab work, so I started a video production company with my friends, including my recent interviewer Halstead York. The plan was to use emerging technology to produce and post independent films from our own scripts. We thought we had a financing deal lined up back in 1994, so we purchased an NLE (a PowerMac 8100/80 with a Radius VideoVision card and a 4 GB SledgeHammer RAID) for video editing. The idea was that we could rent it out before and after post to cover some of the costs. Then there were two big problems:
- The infamous defective BART chip in those early PowerMacs meant it couldn't keep sync for more than a few minutes.
- Our financing fell through.
So, there we were, with a script, no money, a bunch of debt, and an NLE that couldn't edit video. However, we found a nice market for shorter clips with looser sync requirements: CD-ROM video! And so we were launched into the heady early days of multimedia. Journeyman Digital was a full-service production company for digital media, and we did all the screenwriting, production, and post that we had dreamed of, just not for our own projects. But we kept writing screenplays on the side. We got as far as a few meetings with Sony Pictures on one, but as with nearly all screenplays, nothing really happened in the end. And while I liked doing the work, when it came down to the fundamental gut check of moving to LA and rolling the dice, I didn't NEED to do it. Instead I got married, soon enough had three little kids, and rather ran out of time for side projects.
Halstead is only recently married and currently kidless, and had time. So he and many members of the old gang dusted off one of our old screenplays, Temporary Insanity, and darn if they didn't actually shoot the whole thing in HD! Halstead just finished up the trailer. It's quite an experience seeing jokes I wrote a decade ago up there on the screen. And it's amazing to see how it's finally possible to make movies on a hobbyist's budget, even with high-end techniques. Check out this post on color correction in the home office.
I didn't have time to work on the production itself (I was busy with the birth of that third child and with joining Microsoft), but I certainly wasn't going to let anyone else compress the trailers (now available for download)!
And so, after all that ramble, we're back to talking about hands-on compression.
Halstead had a pretty typical 2x2 matrix for encoding: two formats at two data rates each:
- MPEG-4 compatible with QuickTime/AppleTV/iPod
- Windows Media compatible with Windows Media Player/Flip4Mac/Xbox/Zune/Silverlight
- 3 Mbps for a 720p30 HD version compatible with Xbox 360/AppleTV
- 300 Kbps for a low data rate download, which would also be portable media player compatible (iPod for .mp4, Zune for .wmv)
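The 2x2 matrix can be written down as data, which is handy if the same four outputs get generated again for later trailers. This is a minimal sketch; the `Target` type and the profile labels are my own shorthand for the settings described in this post (WVC1 for HD WMV, WMV9 Main for Zune, H.264 Main for HD MPEG-4, H.264 Baseline for iPod), not any tool's configuration format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Target:
    container: str   # ".mp4" or ".wmv"
    tier: str        # "HD" or "mobile"
    total_bps: int   # combined video + audio budget, bits per second
    profile: str     # codec profile used for that format/tier


# Two formats x two data-rate tiers, as in the list above.
CONTAINERS = (".mp4", ".wmv")
TIERS = {"HD": 3_000_000, "mobile": 300_000}

# Profile per (container, tier), taken from the settings described in this post.
PROFILES = {
    (".mp4", "HD"): "H.264 Main",
    (".mp4", "mobile"): "H.264 Baseline",   # required for iPod
    (".wmv", "HD"): "VC-1 Advanced (WVC1)",
    (".wmv", "mobile"): "WMV9 Main",        # required for Zune
}


def encoding_matrix() -> list:
    """Cross every container with every tier: 2 x 2 = 4 outputs."""
    return [
        Target(container, tier, bps, PROFILES[(container, tier)])
        for container, (tier, bps) in product(CONTAINERS, TIERS.items())
    ]


if __name__ == "__main__":
    for target in encoding_matrix():
        print(target)
```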
The source was provided as a 720p30 .AVI file using the CineForm Aspect HD codec. It was video-only - audio was provided in a separate .wav file.
HD WMV encoding was easy - I was able to use the source as is. And the current WMCmd.vbs supports specifying a separate .wav file as source for the audio track.
HD .MOV was harder. I wanted to use QuickTime's own H.264 encoder for the output, since it uses a complexity-constrained mode that is well tuned for computer playback via QuickTime on both Intel and PPC (and there are a lot of G4 PowerBooks out there among indie film fans). While it won't offer the same compression efficiency as a highly tuned H.264 encoder from another vendor, its output will also play back well on more machines.
However, QuickTime, even QuickTime for Windows, can't use the standard DirectShow API to read AVI files! Now that we've added support for the QuickTime API in Expression Media Encoder, it's only fair for Apple to support DirectShow. So I used Rhozet Carbon to encode my .avi and .wav source files into a single Photo-JPEG compressed .MOV file that QuickTime could then read (believe it or not, there's no lossless Y'CbCr 4:2:0 encoder in QuickTime for Windows). I wound up doing that compression on my G5, so I could run it in parallel with the WMV encoding on my Windows box.
For the mobile versions, I used VirtualDub to make me a nice 320x180 version of the .AVI and Carbon again to make a 320x180 JPEG .mov.
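320x180 is an exact quarter-scale of 1280x720, so it keeps the 16:9 aspect ratio. A minimal sketch of that sizing arithmetic; `scaled_size` is a hypothetical helper (not part of VirtualDub), and it assumes you want the height rounded to an even number, the minimum for 4:2:0 chroma subsampling, or optionally to a multiple of 16 to align with whole macroblocks.

```python
def scaled_size(src_w: int, src_h: int, dst_w: int, multiple: int = 2) -> tuple:
    """Return (width, height) scaled to dst_w wide with the source aspect ratio.

    The height is rounded to the nearest multiple of `multiple`; 2 is the minimum
    for 4:2:0 chroma subsampling, 16 would align to whole macroblocks.
    """
    if dst_w % multiple:
        raise ValueError(f"target width {dst_w} is not a multiple of {multiple}")
    exact_h = dst_w * src_h / src_w
    # round() rounds halves to even; fine for a sizing heuristic.
    dst_h = round(exact_h / multiple) * multiple
    return dst_w, dst_h


if __name__ == "__main__":
    print(scaled_size(1280, 720, 320))        # (320, 180): quarter-scale 16:9
    print(scaled_size(1280, 720, 320, 16))    # (320, 176): macroblock-aligned
```

The 16-aligned result shows the trade-off: 176 lines avoids partial macroblocks but slightly distorts the aspect ratio, which is why 320x180 is the more natural choice here.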
An alternative (and what I would have done if this were going to be a high-volume process and not just a one-off) would be to use Carbon to encode all four outputs from the single source. Note, though, that using the "multipass" mode with Carbon or any tool other than QuickTime Player Pro itself results in very, very slow rendering, since it reruns preprocessing for the entire clip on each pass, even though only a small part of the file might be adjusted per pass. So in a high-volume workflow, I probably would have used only the 1-pass mode.
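To see why rerunning preprocessing hurts so much, here is a rough cost model. It's an illustrative sketch with hypothetical parameters, not a measurement of Carbon or QuickTime: the first pass encodes the whole clip, each later pass re-encodes only a fraction of it, and the only difference between the two cases is whether preprocessing (decode, scale, filter) is repeated on every pass.

```python
def multipass_time(preprocess_s, encode_s, passes, fraction_reencoded,
                   rerun_preprocessing):
    """Rough wall-clock estimate for a multipass encode (illustrative model only).

    preprocess_s: time to decode/scale/filter the whole clip once
    encode_s: time to encode the whole clip once
    fraction_reencoded: share of the clip touched by each pass after the first
    rerun_preprocessing: True if every pass preprocesses the entire clip again
    """
    if passes < 1:
        raise ValueError("need at least one pass")
    later_passes = passes - 1
    encode_total = encode_s + later_passes * fraction_reencoded * encode_s
    preprocess_total = preprocess_s * (passes if rerun_preprocessing else 1)
    return preprocess_total + encode_total


if __name__ == "__main__":
    # Hypothetical numbers: preprocessing is the expensive part.
    print(multipass_time(100, 50, 4, 0.25, rerun_preprocessing=True))
    print(multipass_time(100, 50, 4, 0.25, rerun_preprocessing=False))
```

In this model the penalty is simply (passes - 1) x preprocessing time, so it grows with every extra pass no matter how little of the file each pass actually changes.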
Windows Media Settings
WMV HD @ 3 Mbps:
cscript "C:\Program Files\Windows Media Components\Encoder\WMCmd.vbs" -input "G:\Temp Insanity\Trailer 1 timed v5 720.avi" -output "Trailer 1 720p 3M 192.wmv" -a_input "G:\Temp Insanity\Trailer 1.wav" -a_codec WMASTD -a_mode 4 -a_setting 128_48_2 -v_codec WVC1 -v_mode 4 -v_keydist 5 -v_bitrate 2870000 -v_peakbitrate 6000000 -v_peakbuffer 4000 -v_performance 80 -v_bframedist 1 -v_dquantoption 2 -v_loopfilter 1 -v_mmatch 0 -v_mslevel 4 -v_msrange 0 -v_percopt 2
Pretty standard stuff, with the same basic settings as my previous encodes. A few items of note:
- The clip is HD and doesn't have excessive vertical motion, so I didn't bother constraining the number of threads.
- Since the source was just stereo, I used WMA instead of WMA Pro, in order to preserve Silverlight 1.0 compatibility.
- Note the use of the -a_input flag to specify a different audio source.
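When the same long WMCmd.vbs lines get reused across encodes, it can help to generate them. A minimal Python sketch, assuming the default install path from the command line above; `wmcmd_args` is a hypothetical helper that just turns keyword arguments into `-flag value` pairs after `-input`, `-output`, and the optional `-a_input`, so it knows nothing about which flags WMCmd.vbs actually accepts.

```python
# Default install path used in the command line above; adjust as needed.
WMCMD = r"C:\Program Files\Windows Media Components\Encoder\WMCmd.vbs"


def wmcmd_args(video_in, output, audio_in=None, **flags):
    """Build the argument list for `cscript WMCmd.vbs`.

    Each keyword becomes a flag/value pair, e.g. v_bitrate=2870000 becomes
    "-v_bitrate", "2870000". Keywords keep their order (Python 3.7+ dicts).
    """
    args = ["cscript", WMCMD, "-input", str(video_in), "-output", str(output)]
    if audio_in is not None:
        # Separate .wav file for the audio track.
        args += ["-a_input", str(audio_in)]
    for name, value in flags.items():
        args += ["-" + name, str(value)]
    return args


if __name__ == "__main__":
    # The HD WMV encode from above, expressed as keyword arguments.
    hd = wmcmd_args(
        r"G:\Temp Insanity\Trailer 1 timed v5 720.avi",
        "Trailer 1 720p 3M 192.wmv",
        audio_in=r"G:\Temp Insanity\Trailer 1.wav",
        a_codec="WMASTD", a_mode=4, a_setting="128_48_2",
        v_codec="WVC1", v_mode=4, v_keydist=5, v_bitrate=2870000,
        v_peakbitrate=6000000, v_peakbuffer=4000, v_performance=80,
        v_bframedist=1, v_dquantoption=2, v_loopfilter=1, v_mmatch=0,
        v_mslevel=4, v_msrange=0, v_percopt=2,
    )
    print(hd)
```

On a Windows box with the encoder installed, `subprocess.run(hd, check=True)` would launch it; passing a list rather than one string means paths with spaces, like the ones above, need no hand-quoting.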
WMV mobile @ 300 Kbps:
cscript "C:\Program Files\Windows Media Components\Encoder\WMCmd.vbs" -input "Trailer 1 timed v5 320x180.avi" -output "Trailer 1 280 Zune.wmv" -v_codec WMV9 -v_mode 4 -v_keydist 10 -v_bitrate 235000 -v_peakbitrate 600000 -v_peakbuffer 4000 -v_performance 80 -v_bframedist 1 -v_loopfilter 1 -v_overlap 1 -v_mmatch 0 -v_mslevel 2 -v_msrange 0 -v_percopt 2 -v_numthreads 1 -a_codec WMASTD -a_mode 4 -a_setting 48_44_2 -a_peakbitrate 160000
Pretty much identical to the Zune encoding settings I posted last week, except with lower data rates to hit the 300 Kbps total.
- The audio was pretty simple, so 48 Kbps was enough when using VBR mode (again, VBR audio is a very underused and very useful feature for downloadable files).
- The data rate was so low that I went to the max and used -v_mslevel 2 (full floating point chroma search) and -v_numthreads 1 (single-threaded encode). Even with those, this encoded much more quickly than the HD version, since the frame size was so much smaller.
- Main Profile is required by Zune, and thus I can't use DQuant.
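Both WMV command lines leave a little headroom under their targets once the audio is added in. A quick check of that arithmetic, assuming -a_setting reads as kbps_kHz_channels (which matches the 48 Kbps audio mentioned above); the helper names are hypothetical, and all figures are average rates, not VBR peaks.

```python
def parse_a_setting(a_setting: str) -> tuple:
    """Split an -a_setting value like "128_48_2" into (kbps, khz, channels)."""
    kbps, khz, channels = (int(part) for part in a_setting.split("_"))
    return kbps, khz, channels


def total_bps(v_bitrate: int, a_setting: str) -> int:
    """Average video bit rate plus average audio bit rate, in bits per second."""
    audio_kbps, _, _ = parse_a_setting(a_setting)
    return v_bitrate + audio_kbps * 1000


def estimated_bytes(bps: int, duration_s: float) -> float:
    """Rough file size from average bit rate (ignores container overhead)."""
    return bps * duration_s / 8


if __name__ == "__main__":
    for name, v_bitrate, a_setting, budget in [
        ("HD WMV", 2_870_000, "128_48_2", 3_000_000),
        ("Zune WMV", 235_000, "48_44_2", 300_000),
    ]:
        bps = total_bps(v_bitrate, a_setting)
        print(name, bps, "fits" if bps <= budget else "over", budget)
```

So the HD encode averages 2,998,000 bps and the Zune encode 283,000 bps, each just under its 3 Mbps or 300 Kbps target.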
QuickTime's advanced settings aren't available from the command line, so I'll include screen shots of my MPEG-4 settings.
I matched the WMV settings as closely as appropriate.
MPEG-4 Main Profile @ 3 Mbps
- The "Current" mode passes through the source frame size and frame rate (Note it would have said 1280x720 (Current) above - I had a different source loaded when I took the screen shot).
- "Optimize for Download" is the equivalent of our 2-pass VBR modes. However, it lacks the ability to specify a peak bit rate or buffer duration.
- QuickTime specifies keyframe rate in terms of total frames between keyframes, not total seconds.
- The "Better" mode for audio encoding quality is optimal for 16-bit sources. The "Best" mode only improves sources with more than 16 bits.
- The Multi-pass mode improves quality, but can make encoding time very unpredictable. The WMV versions encoded quite a bit faster on a similar-era machine (a dual 3.4 GHz "NetBurst" Xeon versus a dual 2.0 GHz G5). My main compression box, a quad AMD, was busy doing some other work.
- QuickTime lacks a true 2-pass VBR audio mode. For MPEG-4 exports, I only get 1-pass CBR. With a QuickTime export, I could have gotten a 1-pass VBR encode, but only in an MP3-style "range" encode, where the final file size could vary substantially. For soundtracks in downloadable files, this makes WMA a more efficient codec.
- Main Profile is compatible with AppleTV, and uses B-frames. The "Extended" profile is theoretically for streaming, but it's been grayed out in QuickTime since H.264 support launched in QuickTime 7.0, and I've never seen an H.264 Extended Profile stream in the wild.
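What "Optimize for Download" can't express is the peak constraint that the -v_peakbitrate and -v_peakbuffer flags set on the WMV side. Conceptually that's a leaky bucket: each frame pours its bits in, the bucket drains at the peak rate, and its size is the peak rate times the buffer duration. Here is a minimal generic sketch of such a check, assuming -v_peakbuffer is in milliseconds; `within_peak` is hypothetical and is a simplified model (bucket starts empty, constant drain per frame), not the exact buffer model of either encoder.

```python
def within_peak(frame_bits, fps, peak_bps, buffer_ms):
    """Return True if a leaky bucket of size peak_bps * buffer never overflows.

    frame_bits: size of each coded frame in bits, in decode order
    fps: frames per second (the bucket drains peak_bps / fps bits per frame)
    """
    capacity = peak_bps * buffer_ms / 1000
    drain_per_frame = peak_bps / fps
    level = 0.0
    for bits in frame_bits:
        level += bits                 # the frame's bits enter the bucket
        if level > capacity:
            return False              # burst too big for the buffer
        level = max(0.0, level - drain_per_frame)  # drain at the peak rate
    return True


if __name__ == "__main__":
    # Hypothetical frame sizes at 30 fps against 6 Mbps / 4000 ms, as in the HD WMV line.
    steady = [100_000] * 300
    burst = [1_000_000] * 30
    print(within_peak(steady, 30, 6_000_000, 4000))
    print(within_peak(burst, 30, 6_000_000, 4000))
```

A steady stream under the peak rate never fills the bucket, while a sustained burst above it overflows after about a second, which is exactly the kind of spike a peak-constrained mode prevents and an unconstrained download mode allows.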
The mobile encode was the same, except for the lower video and audio data rates and its use of the Baseline profile, which is required for iPod compatibility.
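Because QuickTime counts keyframe spacing in frames while the -v_keydist flag in WMCmd.vbs is in seconds, matching the WMV settings takes a small conversion. A sketch with hypothetical helper names; whether "30p" here means exactly 30 or 29.97 (30000/1001) fps is an assumption, but for a 5-second interval both round to 150 frames.

```python
from fractions import Fraction

NTSC_30 = Fraction(30000, 1001)  # 29.97 fps


def seconds_to_keyframe_frames(seconds, fps):
    """Keyframe interval in frames, for QuickTime's "Key frame every N frames"."""
    return max(1, round(Fraction(seconds) * Fraction(fps)))


def frames_to_seconds(frames, fps):
    """Inverse: how many seconds a frame-count keyframe interval spans."""
    return Fraction(frames) / Fraction(fps)


if __name__ == "__main__":
    print(seconds_to_keyframe_frames(5, 30))         # HD WMV used -v_keydist 5
    print(seconds_to_keyframe_frames(5, NTSC_30))
    print(float(frames_to_seconds(150, NTSC_30)))
```

Using `Fraction` keeps 29.97 fps exact, so the rounding happens once at the end rather than accumulating from a float approximation.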
So, how did the two encodes come out?
For the most part, they both looked and sounded good (or at least accurate - the audio mix will be improved in a later version). The biggest difference was in flatter areas, especially shadows. That's where VC-1's Differential Quantization and Perceptual Optimization come in, plus its ability to use different transform block sizes (4x4, 4x8, 8x4, and 8x8) to better compress the edges and interiors of flat areas. The Baseline and Main Profiles of H.264 are limited to 4x4 blocks, and QuickTime's H.264 encoder doesn't offer an equivalent to DQuant for compressing flat areas of the image less.
Again, another H.264 encoder could have done a better job here by using features like CABAC and multiple reference frames, although at the cost of higher decode requirements. High Profile, and hence 8x8 blocks, isn't compatible with QuickTime's H.264 decoder, nor with those in the AppleTV or iPod. The iPod-required Baseline Profile doesn't support B-frames or CABAC.
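The idea behind DQuant, spending relatively more bits on flat areas where banding shows, can be illustrated with a generic variance-based adaptive quantization rule. This is a toy sketch of that general technique, not VC-1's actual DQuant decision logic; `adaptive_qp`, the variance threshold, the QP offset, and the 1 to 31 clamp are all hypothetical.

```python
from statistics import pvariance


def adaptive_qp(block, base_qp, flat_var=25.0, flat_delta=-2, qp_min=1, qp_max=31):
    """Pick a quantizer for one block of luma samples.

    Flat blocks (low variance) get a lower QP, i.e. finer quantization, because
    smooth gradients such as shadows are where coarse steps show up as banding.
    Busy blocks keep the base QP, since texture masks quantization error.
    """
    variance = pvariance(block)
    qp = base_qp + (flat_delta if variance < flat_var else 0)
    return min(qp_max, max(qp_min, qp))


if __name__ == "__main__":
    flat = [16] * 64              # an 8x8 block of near-black shadow
    busy = [0, 255] * 32          # a high-contrast striped 8x8 block
    print(adaptive_qp(flat, base_qp=10))
    print(adaptive_qp(busy, base_qp=10))
```

Real encoders make this decision per macroblock with far better activity measures and rate control feedback, but the direction is the same: finer quantization where the eye is most sensitive to steps.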
Here are some samples from the available clips that show different levels of banding. Sorry the luma levels don't quite match - it's surprisingly difficult to get exact-level screen grabs out of the QuickTime and DirectShow pipelines. If anything, these understate the banding seen in the clips when viewed in QuickTime on a Mac (a 2.2 to 1.8 gamma correction issue?).