Saluting the Visual Gesture Builder - Details and example code


Peter Daukintis, Microsoft Technical Evangelist, whose post "Kinect for Windows v2 Face Tracking Managed and Native" we recently highlighted, continues his Kinect for Windows v2 exploration and sharing, this time with two posts about the very cool Visual Gesture Builder.

Visual Gesture Builder – Kinect 4 Windows v2

When I created a previous project, a multi-user drum kit, and it came time to code the gesture for a user hitting a drum, I used a heuristic detector, because it was only meant as a simple demo and it was the only quick option. By ‘heuristic detector’ I simply mean that, as I tracked the position of each hand of a tracked skeleton, I used conditional code to detect whether the hand passed through the virtual drum; effectively, collision detection code in 3D space. It worked okay for my scenario but suffers from some issues:

  • The height of the hit point was fixed in 3d space

  • Everyone hits the drums differently 

    The latter could be fixed by extending the heuristic to adapt to the user's skeleton height, arm reach, and so on, but as more and more details are considered it is not hard to imagine the tests becoming complicated and generating many lines of code. Imagine, for example, that you wanted to detect a military-style salute in your game or app: which criteria would you watch for in your heuristic? The angle between wrist and shoulder? The proximity of hand to head? It takes some thought to work out which criteria would give the best results, and this is a fairly simple gesture.
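To make the idea of a heuristic detector concrete, here is a minimal, language-neutral sketch in Python. It is not the drum kit project's code and does not use the Kinect SDK: the `Joint` type, the function names, and every threshold (drum radius, 20 cm hand-to-head distance, 10 cm elbow tolerance) are assumptions chosen purely for illustration.

```python
import math
from typing import NamedTuple


class Joint(NamedTuple):
    """Hypothetical joint position in metres, with y pointing up."""
    x: float
    y: float
    z: float


def drum_hit(prev_hand: Joint, hand: Joint, drum_centre: Joint, radius: float) -> bool:
    """True when the hand moves downward through the drum's horizontal surface."""
    # The hand was above the drum surface last frame and is at or below it now.
    crossed = prev_hand.y > drum_centre.y >= hand.y
    # The hand must also be within the drum's radius, measured horizontally.
    horizontal = math.hypot(hand.x - drum_centre.x, hand.z - drum_centre.z)
    return crossed and horizontal <= radius


def looks_like_salute(hand: Joint, head: Joint, elbow: Joint, shoulder: Joint,
                      max_hand_to_head: float = 0.20) -> bool:
    """Hand close to the head, with the elbow raised to roughly shoulder height."""
    hand_near_head = math.dist(hand, head) <= max_hand_to_head
    elbow_raised = elbow.y >= shoulder.y - 0.10  # assumed 10 cm tolerance
    return hand_near_head and elbow_raised
```

Both checks rely on fixed thresholds, which is exactly the weakness described above: each body shape or style of movement the detector has to tolerate tends to add another hand-tuned condition.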

    Kinect skeletal tracking is powered by machine learning techniques, as outlined in the paper Real-Time Human Pose Recognition in Parts from a Single Depth Image and also explained in an accompanying video. If you are completely new to machine learning, there are some full introductory courses available online. These techniques enable the Kinect system to identify parts of the human body, and subsequently joint positions, from the Kinect depth data in real time.

    Machine learning techniques can also be used to detect gestures. The Kinect for Windows team have exposed these techniques to allow you to create your own gesture detection.

    Visual Gesture Builder

    Enter the Visual Gesture Builder (VGB), which lets you apply machine learning techniques to your own gestures. The ML techniques use recorded and tagged data: the more data showing positive and negative behaviours in relation to your required gesture, the better. One clip of recorded data can be enough to see a result, but a detector trained that way won’t work in real-world scenarios. In addition, it is common practice to split the data into a training set and a separate set used to verify that the trained system is working as expected. With those things in mind, let’s take a look at how to record and tag data for use in the VGB.
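The training/verification split mentioned above can be sketched in a few lines. This is a generic illustration in Python, not part of VGB or Kinect Studio: the function name `split_clips`, the 25% hold-back, and the clip file names in the usage example are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_clips(clips: Sequence[str], test_fraction: float = 0.25,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle clip names reproducibly and hold some back for verification.

    Returns (training_clips, verification_clips).
    """
    if len(clips) < 2:
        raise ValueError("need at least two clips to split")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(clips)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    # Hold back at least one clip, but always leave at least one for training.
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical clip names, for illustration only.
clips = [f"salute_{i:02d}.xef" for i in range(8)]
training, verification = split_clips(clips)
```

Keeping the verification clips out of training matters because a detector can look accurate on the clips it was built from while still failing on a person or movement it has not seen.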

    If you haven’t played with Kinect Studio, please refer to Kinect for Windows V2 SDK: Kinect Studio–the “Swiss Army Knife” over at Mike Taulty’s blog, as this tool enables you to record the clips required as data inputs to VGB. VGB uses skeleton data, so you need to ensure that you record that stream into your clips. Once you have recorded some clips that include your gestures, you can import them into a VGB solution. Open VGB and create a new solution; to the solution you can add projects, one for each gesture you want to detect. This shows the solution structure:

    image ...

Project Information URL:

Visual Gesture Builder – Kinect 4 Windows v2 (code)

My previous post looked into my experimentation with using Visual Gesture Builder to create a gesture builder database (gbd), with a view to incorporating code into an application for detecting those trained gestures. I used the example of a military salute and will now describe the code for using this in a C#/XAML Windows Store app. I started with some boilerplate code from a previous post, which registers events to receive skeleton data (required for gesture detection) and colour data (to display the colour image from the RGB camera).

This snippet shows the initialisation code for importing the gesture database and registering to receive events when new data arrives. Note that I hold references to Gesture objects so that I can identify them when the data arrives. The two gestures I am interested in are the “salute” gesture, which is a discrete gesture, and “saluteProgress”, which is a continuous gesture and will report progress for the salute. (Note: details of the what, how and why of those different gesture types are in my previous post.) This code follows the familiar Kinect SDK pattern of creating a frame source and opening a reader on that source, which can be used to register for data events. Since we need a tracking id for gesture detection, I have paused the reader until we have a valid one (this will be retrieved from the skeleton data, see code snippet below).
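The author's C# snippets are not reproduced in this excerpt. As a rough stand-in, the sketch below models the control flow just described in plain Python: a reader that stays paused until a tracking id is known, and a frame handler that treats the discrete “salute” result and the continuous “saluteProgress” result differently. `GestureFrame`, `GestureReader`, and `on_frame` are hypothetical names, not Kinect SDK types, and using `None` for "no tracked body" is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GestureFrame:
    """Stand-in for one frame of gesture results, keyed by gesture name."""
    discrete: Dict[str, bool] = field(default_factory=dict)     # detected or not
    continuous: Dict[str, float] = field(default_factory=dict)  # progress value


class GestureReader:
    """Stand-in reader: delivers frames to handlers only while not paused."""

    def __init__(self) -> None:
        self.is_paused = True  # paused until a body is being tracked
        self.tracking_id: Optional[int] = None
        self.handlers: List[Callable[[GestureFrame], None]] = []

    def on_body(self, tracking_id: Optional[int]) -> None:
        """Call with the id taken from skeleton data, or None if the body is lost."""
        self.tracking_id = tracking_id
        self.is_paused = tracking_id is None

    def deliver(self, frame: GestureFrame) -> bool:
        """Pass a frame to the handlers; returns False if it was dropped."""
        if self.is_paused:
            return False
        for handler in self.handlers:
            handler(frame)
        return True


log: List[str] = []


def on_frame(frame: GestureFrame) -> None:
    # Discrete gesture: a yes/no detection.
    if frame.discrete.get("salute"):
        log.append("salute detected")
    # Continuous gesture: a progress value for the same movement.
    progress = frame.continuous.get("saluteProgress")
    if progress is not None:
        log.append(f"progress {progress:.2f}")
```

In use, frames delivered before `on_body` supplies a tracking id are dropped; once an id is set, each frame reaches `on_frame`, which is where an app would react to the salute.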


