Using the Kinect to Read Sign Language
I wonder if the original Kinect project owners ever thought it would be used like this...
Sign language is the primary language for many deaf and hard-of-hearing people, yet it is currently not possible for them to interact with computers in their native language.
Because not everyone understands sign language, and human sign-language interpreters are not always available, researchers in recent years have spent a great deal of time studying the challenges of sign-language recognition. They have examined the potential of input sensors such as data gloves and special cameras. Data gloves provide good recognition results but are inconvenient to wear and have proven too expensive for mass use. Web cameras, for their part, track hands accurately only under controlled conditions; outside them, they struggle with cluttered real-world backgrounds and changing illumination.
Then along came a device called the Kinect. Researchers from Microsoft Research Asia have collaborated with colleagues from the Institute of Computing Technology at the Chinese Academy of Sciences (CAS) to explore how Kinect’s body-tracking abilities can be applied to sign-language recognition. The results are encouraging: the approach could let people whose primary language is sign language interact with their computers more naturally, much as speech recognition does for people who speak.
“From our point of view,” says CAS Professor Xilin Chen, “the most significant contribution is that the project demonstrates the possibility of sign-language recognition with readily available, low-cost 3-D and 2-D sensors.”
The work, facilitated and supported by Microsoft Research Connections, is summarized in the paper Sign Language Recognition and Translation with Kinect, co-authored by CAS researchers Xiujuan Chai, Guang Li, Yushun Lin, Zhihao Xu, Yili Tang, and Chen, along with Ming Zhou, principal researcher at Microsoft Research Asia.
Because Kinect provides depth information and color data simultaneously, it makes tracking hand and body movements both more accurate and faster.
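To illustrate why depth data helps, a hand position found in a depth frame can be turned into a true 3-D point with the standard pinhole camera model, and a sequence of such points forms the hand's motion trajectory. The sketch below assumes the hand has already been located in each frame; the function names are hypothetical, and the intrinsic values are uncalibrated placeholders roughly in the range often quoted for the original Kinect's 640×480 depth camera, not figures from the project.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole-camera parameters of the depth sensor, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


# ASSUMPTION: approximate, uncalibrated placeholder values for a 640x480
# depth image; a real system would use the device's own calibration.
KINECT_V1_APPROX = Intrinsics(fx=525.0, fy=525.0, cx=319.5, cy=239.5)


def pixel_to_point(u: float, v: float, depth_mm: float,
                   k: Intrinsics = KINECT_V1_APPROX) -> Point:
    """Back-project pixel (u, v) with a depth reading in millimetres to a
    3-D point (x, y, z) in metres, in the camera's coordinate frame."""
    if depth_mm <= 0:
        raise ValueError("no valid depth reading at this pixel")
    z = depth_mm / 1000.0
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return (x, y, z)


def trajectory_from_pixels(samples: Iterable[Tuple[float, float, float]],
                           k: Intrinsics = KINECT_V1_APPROX) -> List[Point]:
    """Convert per-frame (u, v, depth_mm) hand positions into a 3-D
    trajectory, skipping frames where the sensor reported no depth (0)."""
    return [pixel_to_point(u, v, d, k) for u, v, d in samples if d > 0]
```

For example, `pixel_to_point(319.5, 239.5, 1500)` returns `(0.0, 0.0, 1.5)`: a hand at the image center, 1.5 m from the sensor. A plain webcam gives only (u, v), so the distance along the viewing axis, and with it the scale of the motion, would have to be guessed.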
In this project, the tracked hand movements are aligned and matched as 3-D motion trajectories for individual signed words. This 3-D trajectory-matching algorithm, in turn, has enabled the construction of a system for sign-language recognition and translation with two modes. The first, Translation Mode, translates sign language into text or speech. The technology currently supports American Sign Language but has potential for all varieties of sign language.
The second, Communications Mode, enables communication between a hearing person and a deaf or hard-of-hearing person through an avatar. The hearing person types text on a keyboard, and the avatar signs the corresponding sentence. The deaf or hard-of-hearing person replies in sign language, and the system converts that answer into text.
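The article does not spell out how the 3-D trajectory matching behind both modes works. As a hedged illustration only, the sketch below recognizes an isolated sign by comparing its 3-D hand trajectory with one stored template per word using dynamic time warping (DTW), a standard way to align sequences performed at different speeds. DTW is a swapped-in stand-in, not necessarily the method in the paper, and the vocabulary words and template trajectories are hypothetical toy data.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def dtw_distance(a: List[Point], b: List[Point]) -> float:
    """Dynamic-time-warping distance between two 3-D trajectories,
    using the Euclidean distance between points as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b waits
                                 cost[i][j - 1],      # b advances, a waits
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def center(traj: List[Point]) -> List[Point]:
    """Subtract the mean position so matching ignores where the signer stands."""
    if not traj:
        raise ValueError("empty trajectory")
    n = len(traj)
    mx = sum(p[0] for p in traj) / n
    my = sum(p[1] for p in traj) / n
    mz = sum(p[2] for p in traj) / n
    return [(x - mx, y - my, z - mz) for x, y, z in traj]


def recognize(traj: List[Point], templates: Dict[str, List[Point]]) -> str:
    """Return the vocabulary word whose template trajectory is closest under DTW."""
    query = center(traj)
    return min(templates,
               key=lambda word: dtw_distance(query, center(templates[word])))


if __name__ == "__main__":
    # Hypothetical two-word vocabulary: a rightward sweep and an upward sweep.
    templates = {
        "sweep_right": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0),
                        (0.2, 0.0, 1.0), (0.3, 0.0, 1.0)],
        "sweep_up": [(0.0, 0.0, 1.0), (0.0, 0.1, 1.0),
                     (0.0, 0.2, 1.0), (0.0, 0.3, 1.0)],
    }
    # The same rightward sweep, signed more slowly (7 frames) and elsewhere.
    query = [(0.5 + 0.05 * i, 0.5, 1.2) for i in range(7)]
    print(recognize(query, templates))  # → sweep_right
```

Mean-centering makes the match independent of where the hand is in front of the sensor, and DTW absorbs differences in signing speed, which is why the slower, offset query still matches its template. A practical recognizer would presumably also need scale normalization and several templates per word, which this sketch leaves out.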
Does it work? Surprisingly well.
“One unique contribution of this project is that it is a joint effort between software researchers and the deaf and hard of hearing,” Zhou says. “A group of teachers and students from Beijing Union University joined this project, and this enabled our algorithms to be conducted on real-world data.”
Indeed, the collaboration between Microsoft Research and academia was central to the project.