Project Lily and Context-Aware Dialogue with Kinect


Today's project focuses on a part of Kinect development that might not be as awe-inspiring as augmented reality, games, 3D modeling, NUIs, etc., but in the end is one of the killer features of the Kinect: speech recognition, and using it to add communication capabilities to your projects...

Context-Aware Dialogue with Kinect

Meet Lily, my office assistant. We converse often, and at my direction Lily performs common business tasks such as looking up information and working with Microsoft Office documents. But more important, Lily is a virtual office assistant, a Microsoft Kinect-enabled Windows Presentation Foundation (WPF) application that’s part of a project to advance the means of context-aware dialogue and multimodal communication.

Before I get into the nuts-and-bolts code of my app—which I developed as part of my graduate work at George Mason University—I’ll explain what I mean by context-aware dialogue and multimodal communication.

Context-Aware Dialogue and Multimodal Communication

As human beings, we have rich and complex means of communicating. Consider the following scenario: A baby begins crying. When the infant notices his mother is looking, he points at a cookie lying on the floor. The mother smiles in that sympathetic way mothers have, bends over, picks up the cookie and returns it to the baby. Delighted at the return of the treasure, the baby squeals and gives a quick clap of his hands before greedily grabbing the cookie.

This scene describes a simple sequence of events. But take a closer look. Examine the modes of communication that took place. Consider implementing a software system where either the baby or the mother is removed and the communication is facilitated by the system. You quickly realize just how complex the communication methods employed by the two actors really are. There’s audio processing in understanding the baby’s cry, the squeal of joy and the sound of clapping hands. There’s the visual analysis required to comprehend the gesture of the baby pointing at the cookie, as well as to infer the mother’s mild reproach from her sympathetic smile. As is often the case with actions as ubiquitous as these, we take for granted the level of sophistication employed until we have to implement that same level of experience through a machine.

Let’s add a little complexity to the methods of communication. Consider the following scenario. You walk into a room where several people are in the middle of a conversation. You hear a single word: “cool.” The others in the room look to you to contribute. What could you offer? Cool can mean a great many things. For example, the person might have been discussing the temperature of the room. The speaker might have been exhibiting approval of something (“that car is cool”). The person could have been discussing the relations between countries (“negotiations are beginning to cool”). Without the benefit of the context surrounding that single word, one stands little chance of understanding the meaning of the word at the point that it’s uttered. There has to be some level of semantic understanding in order to comprehend the intended meaning. This concept is at the core of this article.
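To make the point about context concrete, here is a minimal sketch in Python. It is not code from Project Lily. The sense labels and cue words are assumptions made up for illustration, and simple keyword overlap stands in for real semantic understanding. It shows only that the same word resolves to different meanings depending on the words around it, and that with no context it cannot be resolved at all.

```python
# Hypothetical sense inventory for the word "cool".
# The labels and cue words are illustrative assumptions, not a real lexicon.
SENSES = {
    "temperature": {"room", "temperature", "air", "thermostat", "weather"},
    "approval": {"car", "awesome", "love", "great", "design"},
    "relations": {"negotiations", "countries", "talks", "diplomatic", "relations"},
}


def disambiguate(context_words, senses=SENSES):
    """Return the sense whose cue words overlap most with the context.

    Returns None when no cue word appears, i.e. when there is no usable context.
    """
    context = {word.lower() for word in context_words}
    # Score each sense by how many of its cue words occur in the context.
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("that car is cool".split()))
    print(disambiguate("negotiations are beginning to cool".split()))
    print(disambiguate(["cool"]))
```

Hearing only “cool” gives the function nothing to work with, which mirrors the situation of walking into the room mid-conversation. A real system would need a far richer model of context than a handful of cue words.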

Project Lily

I created Project Lily as the final project for CS895: Software for Context-Aware Multiuser Systems at George Mason University, taught by Dr. João Pedro Sousa. As mentioned, Lily is a virtual assistant placed in a typical business office setting. I used the Kinect device and the Kinect for Windows SDK beta 2. Kinect provides a color camera, a depth-sensing camera, an array of four microphones and a convenient API that can be used to create natural UIs. Also, the Microsoft Kinect for Windows site and Channel 9 provide a plethora of useful, related examples. Kinect has brought incredible capabilities to developers in a (relatively) inexpensive package. This is demonstrated by Kinect breaking the Guinness World Records “fastest selling consumer device” record. The Kinect technical specifications include:

Project Information URL:

Project Source URL: