Paint with Light – Real-Time Video Painting with DirectShow

This article is about a winter project I did to learn about video processing and the DirectShow framework. I decided to try making some painting-with-light movies, like those you often see in commercials. When I started, I could not find an existing project, so I bought a couple of webcams and wrote one.
You can see a quick demonstration of the program in action in the following video:
In the next section, we'll learn how to use the program to create a movie. Then we'll look at how the program works and dive into the software's implementation.
This section will give an overview of how to use the program to make a video. You will need three basic pieces:
You can create a movie in six simple steps:
Let's look at the application's controls:
Figure 1: A screenshot of the application
There are nine controls for the video camera, audio input and recording:
Some recommended settings for the camera:
Adjust these while watching the video preview area.
The next step is to move the pen light and adjust the threshold until the response to the light feels right. Anything brighter than the threshold is treated as the pen; everything else is treated as an ordinary pixel. If the setting is too low, everything in the video will appear to be a pen; if it is too high, the program won't pick up the pen at all. You may wish to go back and adjust the camera's parameters (see the previous section).
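To make the rule concrete, here is a simplified sketch of the per-pixel test. The real code appears later in the implementation section; the channel weights and threshold value shown here are only illustrative:
C#
// Simplified sketch of the per-pixel pen test. The channel weights
// and threshold value are illustrative; the slider adjusts Threshold.
class PenTest
{
    public int RedScale = 3, GreenScale = 4, BlueScale = 2;
    public int Threshold = 2000;

    public bool IsPenPixel(byte R, byte G, byte B)
    {
        // Pixels whose weighted brightness exceeds the threshold
        // are treated as the pen light.
        return R * RedScale + G * GreenScale + B * BlueScale >= Threshold;
    }
}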
The clear button erases the current pen drawing. You can use this periodically as you adjust the settings.
Figure 2: The pen light detector threshold
Making a movie takes some getting used to. Here are a few other tips I learned:
To understand how the software is implemented, we first need a quick overview of DirectShow. Software built with DirectShow uses component (object) graphs to do its work and create the overall behavior. Some components, such as the video camera source or the video encoder, are essential; some add features; and a few exist just to connect everything together. When the graph is built and run, DirectShow makes sure each node agrees on the exact media formats that will be exchanged. I enjoy object graphs as a design-structuring technique, but I found DirectShow a challenge to master.
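If you haven't seen DirectShow code before, the basic build-then-run pattern looks something like the sketch below. It uses the open-source DirectShowLib interop library for illustration (this project wraps the COM interfaces itself, as we'll see), and captureFilter is a placeholder for a source filter you've already created:
C#
using DirectShowLib;

// A minimal sketch of the DirectShow build-then-run pattern, using
// the DirectShowLib interop library. 'captureFilter' is a placeholder
// for a source filter created elsewhere.
static void BuildAndRun(IBaseFilter captureFilter)
{
    IGraphBuilder graph = (IGraphBuilder) new FilterGraph();
    graph.AddFilter(captureFilter, "source");

    // Render() asks DirectShow to pick and connect whatever
    // downstream filters are needed to display this output pin.
    IPin outPin = DsFindPin.ByDirection(captureFilter, PinDirection.Output, 0);
    graph.Render(outPin);

    // Run the graph; DirectShow negotiates the media formats and
    // starts pushing samples from the source through the graph.
    IMediaControl control = (IMediaControl) graph;
    control.Run();
}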
In a sense, you lay out the basic schematic of the system, or at least the DirectShow portion, and then write code to fill in the fine details and add functionality. You can prototype a graph (schematic) with GraphEdit or MONOGRAM GraphStudio. These tools have a large catalog of components, making it easy to try out different ideas. You can lay out the pieces to see whether they fit together (some pieces just don't) and test the result.
This project uses three different graphs:
Let's start with the simple camera preview:
Figure 3: The DirectShow graph for the camera preview and pen detection
Here are the components and what they do:
Let's look at how the program code does this. The following enumeration method is used by the video source pull-down to list all the video sources:
C#
static public IEnumerable<VideoSource> VideoDevices()
{
    // Enumerate the DirectShow video input device category
    IEnumMoniker em = DeviceEnum(ref DirectShowNode.CLSID_VideoInputDeviceCategory);
    if (null == em)
        yield break;
    foreach (IMoniker Moniker in COM.Enumerator(em))
    {
        VideoSource S = new VideoSource(Moniker), T;

        // Key each device by its device path, falling back to its display name
        string Key = S.DevicePath;
        if (null == Key)
            Key = S.DisplayName;

        // Reuse the existing VideoSource for a device we've already seen
        if (DevicePath2Source.TryGetValue(Key, out T))
        {
            S.Dispose();
            S = T;
        }
        else
            DevicePath2Source[Key] = S;
        yield return S;
    }
    Marshal.ReleaseComObject(em);
}
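As a hypothetical usage example, populating the pull-down could look like this (the ComboBox name is illustrative, not from the project):
C#
// Hypothetical usage sketch: fill the video-source pull-down.
// 'VideoSourceComboBox' is an illustrative control name.
VideoSourceComboBox.Items.Clear();
foreach (VideoSource source in VideoDevices())
    VideoSourceComboBox.Items.Add(source);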
Whenever the video source selection changes, the _VideoSource instance variable is updated accordingly.
The following code builds the video graph (see Figure 3):
C#
DirectShowGraph CamVideo = null;

public void BuildPreviewGraph(Control CamPreview)
{
    // Disable any face tracking
    _VideoSource.FaceTracking = PluralMode.None;

    // Add the camera source
    CamVideo = new DirectShowGraph();
    CamVideo.Add(_VideoSource, "source", null);

    // Add the flip video item, as a delegate of a sample grabber
    SampleGrabber CamFrameGrabber1 = new SampleGrabber();
    Flip = new FlipVideo();
    CamFrameGrabber1.Callback(Flip);
    Flip.FlipHorizontal = FlipHorizontal;
    AMMediaType Media = CamVideo.BestMediaType(RankMediaType);
    CamVideo.Add(CamFrameGrabber1, "flipgrabber", Media);
    CamFrameGrabber1.MediaType = Media;

    // Add the paint-with-light item, as a delegate of a sample grabber
    SampleGrabber CamFrameGrabber = new SampleGrabber();
    PaintedArea = new LightPaint();
    CamFrameGrabber.Callback(PaintedArea);
    Media = CamVideo.BestMediaType(RankMediaType);
    CamVideo.Add(CamFrameGrabber, "grabber", Media);
    CamFrameGrabber.MediaType = Media;

    DirectShowNode Preview = new DirectShowNode(DirectShowNode.CLSID_VideoRenderer);
    CamVideo.Add(Preview, "render1", null);
    Preview.RenderOnto(CamPreview);

    // Add a null renderer to consume any extra pins from the camera source
    DirectShowNode N = new DirectShowNode(DirectShowNode.CLSID_NULLRenderer);
    CamVideo.Add(N, "null", null);

    // The size isn't known until we've built a sample grabber graph
    CamFrameGrabber1.UpdateFrameSize();
    Flip.Size = CamFrameGrabber1.FrameSize;
    CamFrameGrabber.UpdateFrameSize();
    PaintedArea.Size = CamFrameGrabber.FrameSize;

    // Start the camera graph
    CamVideo.Start();
}
If the camera has the option, the first thing the code does is tell it to disable face tracking. Otherwise, the camera might move its view on us, causing all sorts of confusion.
Then it builds the graph using a helper class called DirectShowGraph to add each of the nodes. The helper automatically connects the pins between the passed node and the most recent node with available output pins.
Next, the frame sizes are updated and the graph is executed. DirectShow takes over, moving video from the camera through the graph and onto the display.
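The helper's pin-matching logic isn't shown in this article, but conceptually it does something like the following sketch (written against DirectShowLib for illustration; the project's DirectShowGraph class uses its own wrappers):
C#
using DirectShowLib;

// Conceptual sketch of what a helper like DirectShowGraph.Add does:
// add the filter to the graph, then connect the previous filter's
// first output pin to the new filter's first input pin.
// (Illustrative only; the project's helper has its own wrappers.)
static void Add(IGraphBuilder graph, IBaseFilter previous, IBaseFilter next, string name)
{
    graph.AddFilter(next, name);
    IPin outPin = DsFindPin.ByDirection(previous, PinDirection.Output, 0);
    IPin inPin  = DsFindPin.ByDirection(next, PinDirection.Input, 0);
    graph.Connect(outPin, inPin); // may insert conversion filters as needed
}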
The SampleGrabber class is a proxy for the DirectShow ISampleGrabber COM objects. It has a method called Callback() that registers a delegate implementing the ISampleGrabberCB interface.
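For reference, a minimal ISampleGrabberCB delegate looks something like the sketch below (written against the DirectShowLib declaration of the interface); the FlipVideo and LightPaint classes that follow both fit this pattern:
C#
using System;
using DirectShowLib;

// A minimal ISampleGrabberCB delegate. In buffer-callback mode,
// DirectShow calls BufferCB once per frame; return 0 (S_OK) to
// keep the graph running.
class FrameSink : ISampleGrabberCB
{
    public int SampleCB(double SampleTime, IMediaSample Sample)
    {
        return 0; // unused in buffer-callback mode
    }

    public int BufferCB(double SampleTime, IntPtr Buffer, int BufferLen)
    {
        // Inspect or modify the raw frame bytes here
        return 0;
    }
}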
This project includes a class called FlipVideo whose instances are used here. If enabled, each instance is responsible for flipping the video. Some cameras don't have a built-in setting to do this, so we provide a way to do it in code.
Here is a portion of the code that does this. Because it is byte manipulation, it looks a lot like C:
C#
public int BufferCB(double SampleTime, IntPtr Buffer, int BufferLen)
{
    unsafe
    {
        byte* Buf = (byte*) Buffer;
        byte* End = Buf + BufferLen - (BufferLen % 3);
        // The width of our buffer
        int Width = Size.Width;
        if (!FlipHorizontal)
            return 0;

        // This takes about 8 ms (640x480)
        int Width3 = Width * 3;
        byte* BufEnd = Buf + Width3 * Size.Height;
        for (byte* BPtr = Buf; BPtr != BufEnd; BPtr += Width3)
            for (byte* B = BPtr, BEnd = B + Width3 - 3; B < BEnd;)
            {
                // Swap the three (BGR) bytes of the left and right
                // pixels, working inward from both ends of the row
                byte Tmp = *BEnd;
                *BEnd++ = *B;
                *B++ = Tmp;
                Tmp = *BEnd;
                *BEnd++ = *B;
                *B++ = Tmp;
                Tmp = *BEnd;
                *BEnd = *B;
                *B++ = Tmp;
                // Step BEnd back to the start of the pixel to its left
                BEnd -= 5;
            }
    }
    return 0;
}
This project includes a class called LightPaint, whose instances are used here. Each LightPaint instance is responsible for three things:
Below is a portion of the code that does this. I've excluded a nearly identical chunk of code that skips the step of copying in the background image pixels. (We allow this duplication for performance reasons: checking whether to include a background image on every pixel gets expensive!)
C#
public int BufferCB(double SampleTime, IntPtr Buffer, int BufferLen)
{
    unsafe
    {
        // This scary construct speeds up the processing of the buffer
        // a lot, by 10 ms or more. This is critical in speeding up access.
        fixed (byte* _CurrentPoints = CurrentPoints)
        fixed (byte* _IsPenPoint = IsPoint)
        fixed (byte* Bknd = Bkgnd)
        {
            byte* Buf = (byte*) Buffer;
            int Width3 = _Size.Width * 3, Width = _Size.Width;
            BufferLen -= BufferLen % 3;
            byte* End = Buf + BufferLen;
            byte* CurrentPoint = _CurrentPoints;
            byte* IsPenPoint = _IsPenPoint;

            // Scan the image for the points brighter than threshold
            for (int PI = 0, I = 0; Buf != End; PI++, I += 3)
            {
                byte B1 = Buf[0];
                byte G1 = Buf[1];
                byte R1 = Buf[2];
                byte B2 = _CurrentPoints[I + 0];
                byte G2 = _CurrentPoints[I + 1];
                byte R2 = _CurrentPoints[I + 2];

                // This is the key spot that detects the pen light.
                // This must be very fast.
                // Tweak this in different ways to see what works.
                if (R1 * RedScale + G1 * GreenScale + B1 * BlueScale >= Threshold)
                {
                    // Remember the brightest color seen at this point
                    if (B1 > B2 || G1 > G2 || R1 > R2)
                    {
                        _IsPenPoint[PI] = 1;
                        _CurrentPoints[I + 0] = B1;
                        _CurrentPoints[I + 1] = G1;
                        _CurrentPoints[I + 2] = R1;
                    }
                    Buf += 3;
                    continue;
                }
                if (0 == _IsPenPoint[PI])
                {
                    // Not a pen stroke: use the background image pixels
                    B2 = Bknd[I + 0];
                    G2 = Bknd[I + 1];
                    R2 = Bknd[I + 2];
                }
                *Buf++ = B2;
                *Buf++ = G2;
                *Buf++ = R2;
            }
        }
    }
    return 0;
}
The most interesting part of this code is the “fixed” keyword. It turns an array into a pointer and keeps the garbage collector from moving the array while you use it. In short, “fixed” is everything your mother warned you about in C. But it gives a big performance improvement.
A frame needs to complete processing, be compressed, and be written to the file in less than 30 ms. (Usually it must do so in far less time, to provide an adequate safety margin.) If it doesn't, the processing buffers fill up, video quality can deteriorate, and the video preview lags so badly that the program becomes unusable.
I found that “fixed” saves 10ms (about 30%) of the processing time spent in the Sample Grabber delegates. That's big.
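If you haven't used it before, here is a small standalone illustration of the pattern (not project code): fixed pins the managed array for the duration of the block and hands you a raw pointer.
C#
// Standalone illustration of the "fixed" pattern used above.
// Must be compiled with /unsafe.
static unsafe void FillWhite(byte[] pixels)
{
    // While inside the fixed block, the garbage collector cannot
    // move 'pixels', so the raw pointer stays valid.
    fixed (byte* p = pixels)
    {
        for (byte* b = p, end = p + pixels.Length; b != end; b++)
            *b = 255; // raw pointer arithmetic, no bounds checks
    }
}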
Now that we've learned how the pieces go together for painting on a preview, let's look at how we can record the stream.
You can use the next graph to record yourself painting on a still image or the video stream. It's a lot more complicated than the previous one:
Figure 4: The DirectShow graph when painting on the webcam video or a still picture
This adds a lot more components. Here are the components and what they do:
The overlay delegate is a much simpler version of the LightPaint class. Like the LightPaint class, it flips the video (if appropriate) and puts the pen strokes onto the video preview so you can see where your hand and pen are while painting. It is synchronized with the LightPaint object so that it can grab the pen strokes from it.
C#
public int BufferCB(double SampleTime, IntPtr Buffer, int BufferLen)
{
    unsafe
    {
        // This scary construct speeds up the processing of the buffer
        // a lot, by 10 ms or more. This is critical in speeding up
        // access: each frame has to make it through the whole
        // DirectShow graph in far less than 30 ms.
        fixed (byte* CurrentPoint = SrcPoints.CurrentPoints)
        fixed (byte* IsPenPoint = SrcPoints.IsPoint)
        {
            byte* Buf = (byte*) Buffer;
            byte* End = Buf + BufferLen - (BufferLen % 3);
            // The width of the LightPaint delegate's frame
            int Width2 = SrcPoints.Size.Width;
            // The width of our buffer
            int Width = Size.Width;

            if (Size == SrcPoints.Size)
            {
                // This is for the common, but special, case where the
                // LightPaint delegate and we have the same frame size,
                // so we don't need to resize
                for (int I = 0; Buf != End; Buf += 3, I++)
                    if (0 != IsPenPoint[I])
                    {
                        int J = I * 3;
                        Buf[0] = CurrentPoint[J++];
                        Buf[1] = CurrentPoint[J++];
                        Buf[2] = CurrentPoint[J++];
                    }
            }
            else
            {
                // The loop used to scan over the points.
                // Note: This is designed to allow different sizes for
                // the LightPaint delegate and the buffer we are
                // painting on
                for (int Y = 0, Y2 = 0; Buf != End; Y2 += dY2, Y = (Y2 >> 10))
                    for (int X = 0, J = Y * Width2, I = J * 3, I2 = Y * Width2 * 1024;
                         Buf != End && X < Width;
                         X++, Buf += 3, I2 += dX2, J = (I2 >> 10), I = 3 * J)
                    {
                        if (0 != IsPenPoint[J])
                        {
                            Buf[0] = CurrentPoint[I];
                            Buf[1] = CurrentPoint[I + 1];
                            Buf[2] = CurrentPoint[I + 2];
                        }
                    }
            }
        }
    }
    return 0;
}
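The odd-looking *1024 factors and >>10 shifts in the resizing loop are 10-bit fixed-point arithmetic: source coordinates are kept scaled by 1024 (1 << 10) so the loop can advance by fractional steps using only integer math. The step sizes dX2 and dY2 aren't initialized in the excerpt, but they would presumably be derived along these lines:
C#
// Sketch (assumption): the 10-bit fixed-point step sizes, i.e. how
// far the source coordinate advances per destination pixel, scaled
// by 1024 so the loop stays in integer arithmetic.
int dX2 = (SrcPoints.Size.Width  << 10) / Size.Width;
int dY2 = (SrcPoints.Size.Height << 10) / Size.Height;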
We can take this a step further by painting onto a movie. This means we need two video sources: one for the camera (which can see the pen light), and one for the movie.
Figure 5: The DirectShow graphs when painting on a movie, using your own microphone or the movie's audio
There are two separate filter graphs here. The top graph captures the pen strokes from the camera. The lower graph captures video from a video file, overlays the pen strokes, and previews and encodes the video. The code selects one of the dashed lines at run-time. If the user chooses to use the movie's original audio track, the WM Asf Writer's audio input is connected to the movie's output. Otherwise, the input is connected to the microphone.
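In code, that run-time choice comes down to which output pin gets connected to the writer's audio input pin. A hedged sketch in DirectShowLib terms follows (the project uses its own DirectShowGraph helper, and the pin indices here are illustrative guesses):
C#
// Illustrative sketch of the run-time audio routing (DirectShowLib
// terms; pin indices are guesses, and the project's own helper does
// the real work).
IPin audioOut = useMovieAudio
    ? DsFindPin.ByDirection(movieSource, PinDirection.Output, 1)  // movie's audio pin
    : DsFindPin.ByDirection(microphone, PinDirection.Output, 0);  // microphone capture pin
IPin writerAudioIn = DsFindPin.ByDirection(asfWriter, PinDirection.Input, 1);
graph.Connect(audioOut, writerAudioIn);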
We've also introduced two more components:
In this article, I described how to create a paint-with-light effect on pictures and movies using a webcam. I reviewed the key concepts of DirectShow and showed how they can be used to create the video. Watching a movie of a sketch being drawn is what makes this different from a simple doodle on a picture.
If you want to try this out, check the download link for the source code at the top of the article!
Randall Maas writes firmware for medical devices, and consults in embedded firmware. Before that, he did a lot of other things… like everyone else in the software industry. You can contact him at randym@acm.org.