Paint With Light

This article is about a winter project I did to learn about video processing and the DirectShow framework. I decided to try making some painting-with-light movies, like those you often see in commercials. When I started, I could not find an existing project—so I bought a couple of webcams and wrote one.

You can see a quick demonstration of the program in action in the following video:

In the next section, we'll learn how to use the program to create a movie. Then we'll look at how the program works and dive into the software's implementation.

How to Use the Program

This section will give an overview of how to use the program to make a video. You will need three basic pieces:

  • A pen light or laser pointer to draw sketches with. You can buy cheap laser pointers and LED penlights at almost any convenience or office supply store.
  • A web camera to capture the light pen.
  • Software to get the video from the web camera and encode the video file. We'll discuss this program below.

You can create a movie in six simple steps:

  1. Select a video camera to use (plug one in, if necessary)
  2. Configure the camera
  3. Set the threshold for the pen
  4. Select the background image or video, if you'll be using one
  5. Start recording
  6. Draw!

Let's look at the application's controls:

Figure 1: A screenshot of the application

There are nine controls for the video camera, audio input and recording:

  • A combo box for selecting the camera
  • An option to flip the camera's video
  • An option to tweak the camera's video control settings
  • A slider to control the discrimination between the pen light and normal scene
  • A combo box for selecting the microphone to use when recording
  • An option to choose the background
  • An option to choose the desired size of the output video
  • Buttons to start, stop or pause recording
  • A button to clear the current drawing

Some recommended settings for the camera:

  • Reduce the exposure setting until there is no glare
  • Lower the brightness until there is no glare
  • Disable the white balance “auto” setting and manually adjust it
  • Disable the auto-exposure setting (if applicable)

Adjust these while watching the video preview area.

The Pen Threshold Slider

The next step is to move the pen light and adjust the threshold until the response to the light feels right. Anything brighter than the threshold will be considered a pen; everything else is just regular pixels. If the setting is too low, everything in the video will appear to be a pen; too high and it won't pick up on the pen. You may wish to go back and adjust the camera's parameters (see the previous section).

  • If you are using a laser pointer, I recommend putting Scotch tape over the pointer. This will diffuse the light and keep it from overwhelming the CCD in the webcam and ruining the image.
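At its core, the threshold test is a weighted sum of a pixel's color channels compared against the slider's value (we'll see it again later in the LightPaint code). Here is a minimal sketch of that test; the weights and threshold are illustrative values, not the program's actual defaults:

```csharp
using System;

class ThresholdDemo
{
    // Illustrative channel weights and threshold, not the program's defaults.
    const int RedScale = 3, GreenScale = 4, BlueScale = 1;
    const int Threshold = 1800;

    // Returns true when a BGR pixel is bright enough to count as the pen.
    public static bool IsPenPixel(byte b, byte g, byte r)
    {
        return r * RedScale + g * GreenScale + b * BlueScale >= Threshold;
    }

    static void Main()
    {
        Console.WriteLine(IsPenPixel(255, 255, 255)); // saturated white: True
        Console.WriteLine(IsPenPixel(40, 40, 40));    // dim background: False
    }
}
```

With these weights, a saturated white pixel scores 2040 and passes, while typical room pixels score far below the threshold; the slider effectively moves the cutoff between those two populations.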

The clear button erases the current pen drawing. You can use this periodically as you adjust the settings.

Figure 2: The pen light detector threshold

Other Tips for Making a Video

Making a movie takes some getting used to. Here are a few other tips I learned:

  • Try to use a dark background behind you.
  • Keep your light low and pointed toward the front of your face. This will make you look better and keep the light from being too bright on your skin.
  • Try not to wear anything shiny, such as a glossy shirt or metal buttons. Depending on your lighting, the shiny part can look like a pen.
  • Don't have any glass or other items behind you. Light might reflect off glass or a forehead, and look like a pen.
  • Don't vary the lighting. Some webcams change their parameters automatically based on light level; others change their frame rate when it's darker (to allow longer exposure).

DirectShow

To understand how the software is implemented, we first need a quick overview of DirectShow. Software built on DirectShow employs component (object) graphs to do its work and create the overall behavior. Some components, such as the video camera or the video encoder, are necessary. Some add features, and a few are needed to connect it all. When the graph is built and run, DirectShow makes sure each node agrees on the exact media formats that will be exchanged. I enjoy object graphs as a design-structuring technique, but I found DirectShow a challenge to master.

In a sense, you lay out the basic schematic of the system (or at least the DirectShow portion) and then write code to fill in the fine details and add functionality. You can prototype a graph (schematic) with GraphEdit or MONOGRAM GraphStudio. These tools have a large catalog of pieces, making it easy to try out different ideas: lay out the pieces, see whether they fit together (some pieces just don't), and test the result.

This project uses three different graphs:

  1. A graph to show the camera preview, which is used when not recording or when sketching on a movie
  2. A graph to record from the web camera, with or without a still image in the background
  3. A graph to paint on a video from a file

The Preview Graph

Let's start with the simple camera preview:

Figure 3: The DirectShow graph for the camera preview

Here are the components and what they do:

  • The Camera: your webcam (or other video input device)
  • A Sample Grabber gets an image frame and passes it to a delegate. This graph uses two: the first, Flip Video, mirrors the video to make the preview behave like a mirror. The second, Light Paint, scans for the pen light and paints in the previously detected pen points. I'll go into a bit more detail below.
  • The Video Renderer is a helper object that displays the camera image on the window. It requires a window region to display on.

Finding the Camera

Let's look at how the program code does this. The following enumeration is used by the camera pull-down to list all the video sources:

C#

static public IEnumerable<VideoSource> VideoDevices()
{
  IEnumMoniker em = DeviceEnum(ref DirectShowNode.CLSID_VideoInputDeviceCategory);
  if (null == em)
    yield break;

  foreach (IMoniker Moniker in COM.Enumerator(em))
  {
    VideoSource S = new VideoSource(Moniker), T;
    string Key = S.DevicePath;
    if (null == Key)
      Key = S.DisplayName;
    if (DevicePath2Source.TryGetValue(Key, out T))
    {
      S.Dispose();
      S = T;
    }
    else
      DevicePath2Source[Key] = S;

    yield return S;
  }

  Marshal.ReleaseComObject(em);
}

Whenever the video source selection changes, the _VideoSource instance variable is updated accordingly.
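Note that the enumerator keeps one VideoSource per device, keyed by DevicePath when available and by DisplayName otherwise, so repeated enumerations hand back the same instance. The caching pattern in isolation looks like the sketch below; Device here is a hypothetical stand-in for VideoSource, not a type from the project:

```csharp
using System;
using System.Collections.Generic;

class DeviceCacheDemo
{
    // Hypothetical stand-in for VideoSource: identified by path, else name.
    public sealed record Device(string Path, string Name);

    static readonly Dictionary<string, Device> Cache = new();

    // Return one canonical instance per device, keyed by DevicePath when
    // available and DisplayName otherwise, as VideoDevices() does.
    public static Device Canonical(Device d)
    {
        string key = d.Path ?? d.Name;
        if (Cache.TryGetValue(key, out var existing))
            return existing;     // reuse the cached instance
        Cache[key] = d;
        return d;
    }

    static void Main()
    {
        var a = Canonical(new Device("usb#cam1", "HD Webcam"));
        var b = Canonical(new Device("usb#cam1", "HD Webcam (again)"));
        Console.WriteLine(ReferenceEquals(a, b));   // → True
    }
}
```

Reusing the same instance matters because the rest of the program compares the selected source by identity when the combo box changes.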

Creating the Preview Graph

The following code builds the video graph (see Figure 3):

C#

DirectShowGraph CamVideo=null;
public void BuildPreviewGraph(Control CamPreview)
{
  // Disable any face tracking
  _VideoSource.FaceTracking  = PluralMode.None;

  // Add the camera source
  CamVideo = new DirectShowGraph();
  CamVideo.Add(_VideoSource, "source", null);

  // Add the flip video item, as a delegate of a sample grabber
  SampleGrabber CamFrameGrabber1 = new SampleGrabber();
  Flip = new FlipVideo();
  CamFrameGrabber1.Callback(Flip);
  Flip.FlipHorizontal = FlipHorizontal;
  AMMediaType  Media = CamVideo.BestMediaType(RankMediaType);
  CamVideo.Add(CamFrameGrabber1, "flipgrabber",  Media);
  CamFrameGrabber1.MediaType = Media;

  // Add the paint-with-light item, as a delegate of a sample grabber
  SampleGrabber CamFrameGrabber = new SampleGrabber();
  PaintedArea = new LightPaint();
  CamFrameGrabber.Callback(PaintedArea);
  Media = CamVideo.BestMediaType(RankMediaType);
  CamVideo.Add(CamFrameGrabber, "grabber",  Media);
  CamFrameGrabber.MediaType = Media;

  DirectShowNode Preview = new DirectShowNode(DirectShowNode.CLSID_VideoRenderer);
  CamVideo.Add(Preview, "render1", null);

  Preview.RenderOnto(CamPreview);

  // Add a null renderer to consume any extra pins from the camera source
  DirectShowNode N = new DirectShowNode(DirectShowNode.CLSID_NULLRenderer);
  CamVideo.Add(N, "null",null);

  // The size isn't known until we've built a sample grabber graph
  CamFrameGrabber1.UpdateFrameSize();
  Flip.Size = CamFrameGrabber1.FrameSize;
  CamFrameGrabber.UpdateFrameSize();
  PaintedArea.Size = CamFrameGrabber.FrameSize;

  // Start the camera graph
  CamVideo.Start();
}

If the camera has the option, the first thing the code does is disable face tracking. Otherwise, the camera might move its view on us, causing all sorts of confusion.

Then it builds the graph using a helper class called DirectShowGraph to add each of the nodes. The helper automatically connects the pins between the passed node and the most recent node with available output pins.

Next, the frame sizes are updated and the graph is executed. DirectShow takes over, moving video from the camera through the graph and onto the display.

The FlipVideo Sample Grabber Delegate

The SampleGrabber class is a proxy to the DirectShow ISampleGrabber COM objects. Its Callback() method registers a delegate implementing the ISampleGrabberCB interface.

This project includes a class called FlipVideo whose instances are used here. If enabled, each instance is responsible for flipping the video. Some cameras don't have a built-in setting to do this, so we provide a way to do it in code.

Here is a portion of the code that does this. Because it is byte manipulation, it looks a lot like C:

C#

public int BufferCB(double SampleTime, IntPtr Buffer, int BufferLen)
{
    unsafe
    {
        if (!FlipHorizontal)
            return 0;

        byte* Buf = (byte*) Buffer;

        // The width of our buffer
        int Width = Size.Width;

        // This takes about 8 ms (640x480)
        int Width3 = Width*3;
        byte* BufEnd = Buf + Width3 * Size.Height;
        for (byte* BPtr = Buf; BPtr != BufEnd; BPtr += Width3)
            for (byte* B = BPtr, BEnd = B + Width3 - 3; B < BEnd;)
            {
                // Swap the B, G and R bytes of the mirrored pixel pair
                byte Tmp = *BEnd;
                *BEnd++  = *B;
                *B++     = Tmp;
                Tmp      = *BEnd;
                *BEnd++  = *B;
                *B++     = Tmp;
                Tmp      = *BEnd;
                *BEnd    = *B;
                *B++     = Tmp;
                BEnd    -= 5;
            }
    }
    return 0;
}
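For readers who prefer to stay out of unsafe code, the same mirror operation can be written against a managed byte array. This is a sketch, not the project's code, but it performs the identical channel-by-channel pixel swap:

```csharp
using System;

class FlipDemo
{
    // Mirror a 24-bit BGR frame in place, one scan line at a time.
    // Same effect as the unsafe pointer loop, on a managed array.
    public static void FlipHorizontal(byte[] frame, int width, int height)
    {
        int stride = width * 3;
        for (int y = 0; y < height; y++)
        {
            int row = y * stride;
            for (int left = 0, right = width - 1; left < right; left++, right--)
                for (int c = 0; c < 3; c++)   // swap the B, G and R bytes
                {
                    int i = row + left * 3 + c, j = row + right * 3 + c;
                    byte t = frame[i]; frame[i] = frame[j]; frame[j] = t;
                }
        }
    }

    static void Main()
    {
        var frame = new byte[] { 1, 2, 3, 4, 5, 6 };   // two BGR pixels
        FlipHorizontal(frame, 2, 1);
        Console.WriteLine(string.Join(",", frame));    // → 4,5,6,1,2,3
    }
}
```

The pointer version exists purely for speed; the logic is the same swap of mirrored pixels within each scan line.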

The LightPaint Sample Grabber Delegate

This project includes a class called LightPaint, whose instances are used here. Each LightPaint instance is responsible for three things:

  1. Finding the pen points
  2. Replacing the video with the background image (optional). Although this isn't necessary for the simple preview case, it is necessary in the more complex configurations. We'll look at those later.
  3. Painting all the pen strokes

Below is a portion of the code that does this. I've excluded a nearly identical chunk that skips the step of copying in the background image pixels. (We allow the duplication for performance reasons: checking whether to include a background image at every pixel gets expensive!)

C#

public int BufferCB(double SampleTime, IntPtr Buffer, int BufferLen)
{
    unsafe
    {
        // This scary construct speeds up the processing of the buffer a lot
        // (by 10ms or more).  This is critical in speeding up access
        fixed (byte* _CurrentPoints = CurrentPoints)
        fixed (byte* _IsPenPoint   = IsPoint)
        fixed (byte* Bknd         = Bkgnd)
        {
            byte* Buf = (byte*) Buffer;
            int Width3 = _Size.Width*3, Width=_Size.Width;

            BufferLen -= BufferLen % 3;
            byte* End = Buf + BufferLen;
            byte* CurrentPoint = _CurrentPoints;
            byte* IsPenPoint = _IsPenPoint;

            // Scan the image for the points brighter than threshold
            for (int PI=0,I=0; Buf != End;  PI++, I += 3)
            {
                byte B1= Buf[0];
                byte G1= Buf[1];
                byte R1= Buf[2];
                byte B2 = _CurrentPoints[I+0];
                byte G2 = _CurrentPoints[I+1];
                byte R2 = _CurrentPoints[I+2];

                // This is the key spot that detects the pen light.  
                // This must be very fast
                // Tweak this in different ways to see what works
                if (R1 * RedScale + G1 * GreenScale + 
                    B1*BlueScale >= Threshold)
                {
                    if (B1>B2 || G1>G2 || R1>R2)
                    {
                        _IsPenPoint[PI] = 1;
                        _CurrentPoints[I+0] = B1;
                        _CurrentPoints[I+1] = G1;
                        _CurrentPoints[I+2] = R1;
                    }
                    Buf+=3;
                    continue;
                }

                if (0 == _IsPenPoint[PI])
                {
                    // Not a pen point: use the background pixels
                    B2 = Bknd[I+0];
                    G2 = Bknd[I+1];
                    R2 = Bknd[I+2];
                }

                *Buf++ = B2;
                *Buf++ = G2;
                *Buf++ = R2;
            }
        }
    }

    return 0;
}
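Stripped of the pointer machinery, the per-pixel logic above boils down to: latch any pixel that beats the threshold and is brighter than what we've recorded, pass pen pixels through, and replace everything else with either the background or the latched stroke. Here is a managed sketch of that logic (the weights and threshold are illustrative, not the program's defaults):

```csharp
using System;

class PenTrackDemo
{
    // Illustrative weights and threshold, mirroring the BufferCB logic above.
    const int RedScale = 3, GreenScale = 4, BlueScale = 1, Threshold = 1800;

    // Process one BGR frame against the persistent pen/background state.
    // current: brightest pen color seen per pixel; isPen: 1 where latched.
    public static void Process(byte[] frame, byte[] current,
                               byte[] isPen, byte[] bkgnd)
    {
        for (int pi = 0, i = 0; i < frame.Length; pi++, i += 3)
        {
            byte b = frame[i], g = frame[i + 1], r = frame[i + 2];
            if (r * RedScale + g * GreenScale + b * BlueScale >= Threshold)
            {
                // Latch the pixel if it beats what we stored before.
                if (b > current[i] || g > current[i + 1] || r > current[i + 2])
                {
                    isPen[pi] = 1;
                    current[i] = b; current[i + 1] = g; current[i + 2] = r;
                }
                continue;   // pen pixels pass through to the output unchanged
            }
            if (isPen[pi] == 0)
            {
                // Not a pen point: show the background instead of the camera.
                frame[i] = bkgnd[i];
                frame[i + 1] = bkgnd[i + 1];
                frame[i + 2] = bkgnd[i + 2];
            }
            else
            {
                // Previously latched pen point: keep painting the stroke.
                frame[i] = current[i];
                frame[i + 1] = current[i + 1];
                frame[i + 2] = current[i + 2];
            }
        }
    }
}
```

Because current only ever gets brighter, a stroke stays on screen once drawn; that is what makes the drawing accumulate frame after frame.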

The Scariest Keyword in .NET…

… is the “fixed” keyword. It pins a managed array and gives you a raw pointer to it, keeping the garbage collector from moving the array while you use it. In short, “fixed” is everything your mother warned you about in C. But it gives a big performance improvement.

A frame needs to complete processing, compression and be written to the file in less than 30ms. (Usually it must do so in a lot less time, to provide adequate safety margin.) If not, the processing buffers will fill, the video quality could deteriorate, and the video preview will lag so much it will make the program unusable.

I found that “fixed” saves 10ms (about 30%) of the processing time spent in the Sample Grabber delegates. That's big.

The Graph to Paint on the Video Stream or Still Image

Now that we've learned how the pieces go together for painting on a preview, let's look at how we can record the stream.

You can use the next graph to record yourself painting on a still image or the video stream. It's a lot more complicated than the previous one:

Figure 4: DirectShow when painting on the webcam video or a still picture

This adds a lot more components. Here are the components and what they do:

  • A WM Asf Writer that encodes the movie and writes it to a file. It needs both an audio and a video input, as well as a special profile to support encoding at different sizes.
  • Three Sample Grabbers. The first two delegates are the FlipVideo and LightPaint objects discussed earlier. The third has an Overlay delegate that puts the pen stroke onto the video preview so you can see where your hand and pen are while you paint.
  • A Smart Tee splits a video stream into two copies. Two tees are used. The first splits the camera video into streams with and without the still background image. The second splits the video into two streams: one that is recorded, and another that is shown on the preview. As a bonus, the Smart Tee gives priority to the movie encoder and can drop frames from the preview.
  • Microphone. The WM Asf Writer requires an audio source. The microphone gives you a chance to narrate something fun onto the movie.
  • Two Video Renderers. One previews the pen strokes over the still background image; the other shows the webcam video so the user can see where they are painting. I found that I needed to see where the pen is in the camera's view, and where it is on the resulting movie, in order to paint effectively.

The Overlay Delegate

The overlay delegate is a much simpler version of the LightPaint class. Like the LightPaint class, it flips the video (if appropriate) and puts the pen stroke onto the video preview so you can see where your hand and pen are while painting. It's synchronized with the LightPaint object in order to grab the pen strokes from it.

C#

public int BufferCB(double SampleTime, 
        IntPtr Buffer, int BufferLen)
{
    unsafe
    {
        // This scary construct speeds up the 
        // processing of the buffer a lot by 10ms or more.  
        // This is critical in speeding up access: each frame
        // has to make it through the whole DS graph 
        // in far less than 30ms.
        fixed (byte* CurrentPoint = SrcPoints.CurrentPoints)
        fixed (byte* IsPenPoint   = SrcPoints.IsPoint)
        {
            byte* Buf = (byte*) Buffer;
            byte* End = Buf + BufferLen-(BufferLen%3);

            // The width of the LightPaint delegate's frame
            int Width2 = SrcPoints.Size.Width;

            // The width of our buffer
            int Width  = Size.Width;

            if (Size == SrcPoints.Size)
            {
                // This is for the common, but special,
                // case where the LightPaint delegate
                // and we have the same frame size, so
                // we don't need to resize
                for (int I=0; Buf != End; Buf+=3, I++)
                    if (0 != IsPenPoint[I])
                    {
                       int J = I*3;
                       Buf[0] = CurrentPoint[J++];
                       Buf[1] = CurrentPoint[J++];
                       Buf[2] = CurrentPoint[J++];
                    }
            }
            else
            {
                // The loop used to scan over the points
                // Note: This is designed to allow different 
                // sizes for the LightPaint delegate and 
                // the buffer we are painting on
                for (int Y = 0, Y2=0; Buf != End; Y2+= dY2, Y=(Y2>>10))
                    for (int X=0,J=Y*Width2,I=J*3,I2=Y*Width2*1024;
                        Buf != End && X < Width;
                        X++, Buf+=3, I2+= dX2, J=(I2>>10),I=3*J)
                    {
                        if (0 != IsPenPoint[J])
                        {
                            Buf[0] = CurrentPoint[I];
                            Buf[1] = CurrentPoint[I+1];
                            Buf[2] = CurrentPoint[I+2];
                        }
                    }
            }
        }
    }
    return 0;
}
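The else branch avoids floating point by mapping destination coordinates to source coordinates in 10-bit fixed point: the step dX2 is the ratio of the two widths scaled by 1024, and shifting right by 10 recovers the integer source index. Here is a small sketch of that mapping; MapRow is a hypothetical helper written for illustration, not a function from the project:

```csharp
using System;

class FixedPointScaleDemo
{
    // Map destination x coordinates to source x coordinates using
    // 10-bit fixed point, the same trick as the >>10 shifts above.
    public static int[] MapRow(int srcWidth, int dstWidth)
    {
        // Step per destination pixel, scaled by 1024 (<< 10).
        int dX2 = (srcWidth << 10) / dstWidth;
        var map = new int[dstWidth];
        for (int x = 0, i2 = 0; x < dstWidth; x++, i2 += dX2)
            map[x] = i2 >> 10;   // shift back to an integer source index
        return map;
    }

    static void Main()
    {
        // Scale a 4-pixel source row onto 8 destination pixels.
        Console.WriteLine(string.Join(",", MapRow(4, 8))); // → 0,0,1,1,2,2,3,3
    }
}
```

This is nearest-neighbor scaling: each destination pixel simply reads the closest source pixel, which is cheap enough to run inside the per-frame callback.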

The Graph to Paint on a Movie

We can take this a step further by painting onto a movie. This means we need two video sources: one for the camera (which can see the pen light), and one for the movie.

Figure 5: DirectShow when painting on a movie, using your own microphone or the movie's audio

There are two separate filter graphs here. The top graph captures the pen strokes from the camera. The lower graph captures video from a video file, overlays the pen strokes, and previews and encodes the video. The code selects one of the dashed lines at run-time. If the user chooses to use the movie's original audio track, the WM Asf Writer's audio input is connected to the movie's output. Otherwise, the input is connected to the microphone.

We've also introduced two more components:

  • The WM Asf Reader, which reads from a video file
  • The Color Space Converter, which converts the video to RGB format. Video cameras offer a variety of output formats, and the delegate selects RGB. Movie files, however, tend to have only one output encoding, and it's seldom RGB.

Conclusion

In this article, I described how to create a paint-with-light effect on pictures, movies, etc. using a webcam. I reviewed the key concepts of DirectShow and how they can be used to create the video. Watching a movie of a sketch being drawn is what makes this different from a simple doodle on a picture.

If you want to try this out, check the download link for the source code at the top of the article!

About The Author

Randall Maas writes firmware for medical devices, and consults in embedded firmware. Before that, he did a lot of other things… like everyone else in the software industry. You can contact him at randym@acm.org.
