A Simple Gesture Processing Framework for the Kinect for Windows

Gesture handling is one of those things that we are all re-inventing with the current Kinect for Windows SDK. Until something is baked into the SDK, this is a ripe area for innovation and experimentation. Today's project is a good example of that...

Simple Gesture Processing using the Kinect for Windows

This sample shows the basic components of a gesture processing system for the Microsoft Kinect for Windows. Gesture processing is one of the most powerful abilities of an augmented reality application, and the Kinect makes it straightforward. The sample is rudimentary, but it demonstrates the basic gesture processing pipeline that you would have to implement in a real-world application. In this case, the application is a simple Kinect controller for PowerPoint that drives the presentation by sending keyboard shortcuts. Along the way, it also demonstrates some other useful concepts, such as how to draw a skeleton onto the Kinect depth image. I hope you find this sample useful. Please leave any feedback or questions that you may have.


This sample builds on the Beginning Kinect for Windows Programming sample. If you have not looked at that sample and you are not familiar with Kinect for Windows programming, I recommend that you start there before tackling this one.

This sample assumes that you already have a Kinect test environment set up with all of the tools and technologies installed. If you do not, the previous sample will help you with that as well.

When you are ready to proceed with this sample, open up the source code file that is attached and take a look at the gesture.cs file within the GestureFramework project.

This simple gesture framework is based upon the relationships of skeletal joints to each other over some period of time.  The enumeration JointRelationship describes the possible relationships of joints that are supported by the framework.  
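To make the idea concrete, here is a minimal sketch of how a single joint relationship might be checked for one frame. It is not the code from gesture.cs: the enumeration members shown, the Point3 struct, the RelationshipCheck class and the tolerance parameter are all assumptions made for illustration, and the real JointRelationship values may differ.

```csharp
using System;

// Hypothetical relationship values; the real enumeration lives in gesture.cs.
public enum JointRelationship
{
    None,
    Above,
    Below
}

// Stand-in for a joint position so the sketch compiles without the Kinect SDK.
// Y is treated as the vertical axis, with larger values being higher.
public struct Point3
{
    public readonly float X, Y, Z;
    public Point3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class RelationshipCheck
{
    // Returns true when joint a currently stands in the given relationship to joint b.
    // tolerance is an assumed dead zone (same units as the positions) that filters out jitter.
    public static bool Holds(JointRelationship relationship, Point3 a, Point3 b, float tolerance = 0.05f)
    {
        switch (relationship)
        {
            case JointRelationship.Above: return a.Y > b.Y + tolerance;
            case JointRelationship.Below: return a.Y < b.Y - tolerance;
            default: return false;
        }
    }
}
```

With this sketch, RelationshipCheck.Holds(JointRelationship.Above, rightHand, head) would report whether the right hand is currently raised above the head. Horizontal relationships could be added the same way by comparing X values.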

The framework has two major class hierarchies. The first is a static description of the gestures. The second is a very simple state engine that examines incoming skeletal information from the Kinect and, using the static model, tries to discern the relationships between the joints it is interested in. The figure below shows the relationships between the major classes in the framework.


Here are the descriptions of the major classes in the hierarchy:

GestureComponent - Describes a single relationship between two joints. A component involves exactly two joints and has two temporal parts: a beginning relationship and an ending relationship.

Gesture - Essentially a list of GestureComponent objects with a unique identifier and a description.

GestureMap - Contains all of the utility functions to load a set of gestures from an XML file and manage their lifecycle.

GestureComponentState - Tracks the state of the relationships for a GestureComponent.

GestureState - Manages the state model for a set of GestureComponents and reports on the state. In our application, this class also maps each gesture to the keycode that is sent to PowerPoint, although this is probably a sub-optimal design.

GestureMapState - Rolls up the current state of all gestures and is the primary interface of the application into the gesture state model.
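The state side of the hierarchy can be pictured with a small sketch like the one below. It is only an illustration of the "beginning relationship, then ending relationship" idea described above, not the classes from gesture.cs: the type names ending in Sketch, their members and the rule that a gesture fires once every component has completed are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Per-component tracking: a component completes once its beginning relationship
// has been seen in some frame and its ending relationship is seen in a later frame.
public sealed class ComponentStateSketch
{
    public bool BeginningSeen { get; private set; }
    public bool Completed { get; private set; }

    // beginHolds / endHolds: whether each relationship is true in the current frame.
    public void Update(bool beginHolds, bool endHolds)
    {
        if (Completed) return;
        if (!BeginningSeen)
        {
            BeginningSeen = beginHolds;
        }
        else if (endHolds)
        {
            Completed = true;
        }
    }

    public void Reset()
    {
        BeginningSeen = false;
        Completed = false;
    }
}

// Per-gesture tracking: the gesture fires when every component has completed.
public sealed class GestureStateSketch
{
    private readonly List<ComponentStateSketch> _components;

    public GestureStateSketch(int componentCount)
    {
        _components = Enumerable.Range(0, componentCount)
                                .Select(_ => new ComponentStateSketch())
                                .ToList();
    }

    // frame[i] holds the (begin, end) observations for component i in this frame.
    // Returns true on the frame that completes the gesture, then resets all components.
    public bool Update(IList<Tuple<bool, bool>> frame)
    {
        for (int i = 0; i < _components.Count; i++)
        {
            _components[i].Update(frame[i].Item1, frame[i].Item2);
        }
        if (!_components.All(c => c.Completed)) return false;
        _components.ForEach(c => c.Reset());
        return true;
    }
}
```

Feeding a one-component gesture the frames (begin, not end) and then (not begin, end) makes the second Update call return true. An ending relationship that shows up before the beginning has been seen is ignored. A real implementation would also want a time limit between the two parts, which this sketch leaves out.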

If you look through the above classes, you will see that there is absolutely no magic or rocket science inside of them.  The gesture recognition that they provide is very primitive, but they provide a platform onto which you could build a much more sophisticated model if you desire.
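As an example of how little machinery the static side needs, loading gesture descriptions from XML could look roughly like this. The XML element and attribute names, the type names ending in Sketch and the Load method are hypothetical and chosen for illustration. The actual file format and loader are defined by GestureMap in the attached source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical static description types; the real ones are in gesture.cs.
public sealed class GestureComponentSketch
{
    public string Joint1;
    public string Joint2;
    public string BeginningRelationship;
    public string EndingRelationship;
}

public sealed class GestureSketch
{
    public int Id;
    public string Description;
    public List<GestureComponentSketch> Components = new List<GestureComponentSketch>();
}

public static class GestureXmlSketch
{
    // Parses gestures from XML text using an assumed schema such as:
    //   <Gestures>
    //     <Gesture Id="1" Description="Next slide">
    //       <Component Joint1="HandRight" Joint2="Head" Begin="Below" End="Above" />
    //     </Gesture>
    //   </Gestures>
    public static List<GestureSketch> Load(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("Gesture")
            .Select(g => new GestureSketch
            {
                Id = (int)g.Attribute("Id"),
                Description = (string)g.Attribute("Description"),
                Components = g.Elements("Component")
                    .Select(c => new GestureComponentSketch
                    {
                        Joint1 = (string)c.Attribute("Joint1"),
                        Joint2 = (string)c.Attribute("Joint2"),
                        BeginningRelationship = (string)c.Attribute("Begin"),
                        EndingRelationship = (string)c.Attribute("End")
                    })
                    .ToList()
            })
            .ToList();
    }
}
```

Keeping the descriptions in data rather than code means a new gesture can be tried out by editing the XML file, without recompiling the application.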

Project Information URL: http://code.msdn.microsoft.com/Simple-Gesture-Processing-097c5527

Project Download URL: http://code.msdn.microsoft.com/Simple-Gesture-Processing-097c5527

Project Source URL: http://code.msdn.microsoft.com/Simple-Gesture-Processing-097c5527

void SensorSkeletonFrameReady(AllFramesReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrameData = e.OpenSkeletonFrame())
    {
        // OpenSkeletonFrame returns null when no frame is available
        if (skeletonFrameData == null)
        {
            return;
        }

        // Copy the skeleton data out of the frame before inspecting it
        var allSkeletons = new Skeleton[skeletonFrameData.SkeletonArrayLength];
        skeletonFrameData.CopySkeletonDataTo(allSkeletons);

        foreach (Skeleton sd in allSkeletons)
        {
            // If this skeleton is no longer being tracked, skip it
            if (sd.TrackingState != SkeletonTrackingState.Tracked)
            {
                continue;
            }

            // If there is not already a gesture state map for this skeleton, then create one
            if (!_gestureMaps.ContainsKey(sd.TrackingId))
            {
                var mapstate = new GestureMapState(_gestureMap);
                _gestureMaps.Add(sd.TrackingId, mapstate);
            }

            var keycode = _gestureMaps[sd.TrackingId].Evaluate(sd, false, _bitmap.Width, _bitmap.Height);

            if (keycode != VirtualKeyCode.NONAME)
            {
                rtbMessages.AppendText("Gesture accepted from player " + sd.TrackingId + "\r");
                rtbMessages.AppendText("Command passed to System: " + keycode + "\r");
            }

            PlayerId = sd.TrackingId;

            if (_bitmap != null)
            {
                _bitmap = AddSkeletonToDepthBitmap(sd, _bitmap, false);
            }
        }
    }
}



