Kinect can be the bridge between folks who don’t speak the same language, and even between those who can hear and those who can’t. A collaboration between Microsoft Research Asia, the Chinese Academy of Sciences and Beijing Union University has created a prototype that translates sign language into spoken language, and spoken language into sign language, in real time.
“There are more than 20 million people in China who are hard of hearing, and an estimated 360 million such people around the world, so this project has immense potential to generate positive social impact worldwide,” wrote Guobin Wu, program manager of the Kinect Sign Language Translator project for Microsoft Research Asia, on the Microsoft Research Connections Blog.
The system captures a conversation from both sides: the person who is deaf signs, and the system renders a written and spoken translation in real time, while it takes a speaking person’s words and turns them into accurate, understandable signs. As you can see from the video, Dandan Yin, a 22-year-old Chinese computer science student who was born deaf, shows how this works by gesturing to a Kinect device connected to a sign language prototype, and words appear on the screen that translate what she just signed.
To see it is to be wowed by it. And for Yin, it’s the beginning of a childhood dream come true.
And thanks to this collaboration, her dream is coming true. As Tansley says, “When Dandan is in front of it and it recognizes her gestures, you can immediately see how it can work. As we develop the research further, it could be a viable solution.”
But for now, this system is a research prototype. As Wu writes on the Microsoft Research Connections Blog, “We are diligently working to overcome the technology hurdles so that the system can reliably understand and interpret in communication mode.”
Those hurdles aren’t insurmountable, but they are daunting. For instance, it takes five people to establish the recognition patterns for just one word. And so far, they’ve added 300 Chinese sign language words out of 4,000. They’ve done it in a very compressed amount of time: just over a year, starting in spring 2012, when the project became one of three finalists after the call went out to Microsoft Research labs around the world to submit their best Kinect collaborations with the academic world. The other two finalists have projects that focus on assistive technology for the blind and on advancing Kinect for Windows 3D scanning through Kinect Fusion.
I consider myself incredibly lucky to be the program manager of the Kinect Sign Language Translator project. There are more than 20 million people in China who are hard of hearing, and an estimated 360 million such people around the world, so this project has immense potential to generate positive social impact worldwide.
During the first six months, we focused mainly on Chinese sign language data collection and labeling. Prof. Chen’s team worked closely with Prof. Hanjing Li of the special education school at Beijing Union University. The first step was to recruit two or three of Prof. Li’s students who are deaf to be part of the project. One candidate in particular stood out: Dandan Yin. We were moved when, during the interview, she told us, “When I was a child, my dream was to create a machine to help people who can’t hear.”
The next milestone was to build a sign language recognition system. The team has published many papers that explain the technical details, but what I want to stress here is the collaborative nature of the project. Every month, we had a team meeting to review the progress and plan our next steps. Experts from a host of disciplines—language modeling, translation, computer vision, speech recognition, 3D modeling, and special education—contributed to the system design.
Our system is still a research prototype. It is progressing from recognizing isolated words signed by a specific person (translator mode) to understanding continuous communication from any competent signer (communication mode). Our current prototype can successfully produce good results for translator mode, and we are diligently working to overcome the technology hurdles so that the system can reliably understand and interpret in communication mode. And while we’re solving those challenges, we are also starting to build up the system’s vocabulary of American Sign Language gestures, which are different from those of Chinese Sign Language.
We’ve had the good fortune to demo the system at both the Microsoft Research Faculty Summit and the Microsoft company meeting this year. Dandan attended both events and displayed her professionalism as a signer. After the Faculty Summit in July, she emotionally thanked Microsoft for turning her dream into reality. I was nearly moved to tears by our reception during the company meeting, the first one that I’d ever attended in person. And I was thrilled to hear thundering applause when Dandan communicated with a hearing employee by using our system.
Since these demos, the project has received much attention from researchers and the deaf community, especially in the United States. We expect that more and more researchers from different disciplines and different countries will collaboratively build on the prototype, so that the Kinect Sign Language Translator system will ultimately benefit the global community of those who are deaf or hard of hearing. The sign language project is a great example of selecting the right technical project with the right innovative partners, and applying effort and perseverance over the years. It has been a wonderful, multidisciplinary, collaborative effort, and I’m honored and proud to be involved.
When Microsoft Research shipped the first official Kinect for Windows software development kit (SDK) beta in June 2011, it was both an ending and a beginning for me. The thrilling accomplishment of rapidly and successfully designing and engineering the SDK was behind us, but now the development and supporting teams had returned to their normal research work, and I was left to consider how best to showcase the research potential of Kinect technology beyond gaming.
Since Kinect’s launch in November 2010, investigators from all quarters had been experimenting with the system in imaginative and diverse applications. There was very little chance of devising some stand-out new application that no one had thought of—since so many ideas were already in play. So I decided to find the best of the current projects and “double down” on them.
But rather than issuing a public global call, which seemed unnecessary when so many people were already proactively experimenting with Kinect technology, we turned to the Microsoft Research labs around the world and asked them to submit their best Kinect collaborations with the academic world, thus bringing together professors and our best researchers, as we normally do in Microsoft Research Connections.
Nine months later, in July 2013, we were excited to host Dandan at the annual Microsoft Research Faculty Summit in Redmond, her first trip outside China. We were thrilled with the response from people both attending and watching the Summit. The sign language translator and Dandan made the front page of the Seattle Times and were widely covered by Internet news sites.
We knew we had to make a full video of the system to share it with others and take the work further. Over a couple of sweltering days in late July (yes, Seattle does get hot sunny days!), we showed the system to Microsoft employees. It continued to capture the imagination, including that of Microsoft employees who are deaf.
We got the chance to demonstrate the system at the Microsoft annual company meeting in September 2013—center stage, with 18,000 in-person attendees and more than 60,000 watching online worldwide. This allowed us to bring Dandan and the Chinese research team back to Seattle, and it gave us the opportunity to complete our video.
That week, we all went back into the studio and, over a long, hard day, shot the remaining pieces of the story, explaining how the system could one day transform the lives of millions of people around the world who are deaf or hard of hearing, and the lives of all of us.