Microsoft Research has unveiled new details of its work on advanced gestural user interfaces, and while the technology remains in the lab, Redmond seems to have made serious progress. One of Microsoft's next-gen gestural UIs is called Handpose, as Fast Company reports:

Almost 25 years ago, researcher Xuedong Huang founded the speech recognition program at Microsoft Research (MSR). His groundbreaking work ended up in Microsoft’s products like Cortana and Kinect. Today, voice recognition is pretty much figured out. But while computers hear us well, they still don’t see us very well. Gestural interfaces are still rudimentary. We may have virtual reality at home—and yet, those systems can’t even make out our own hands.

That may change soon, as Huang says a "paradigm shift" is happening within Microsoft Research. In a newly released demo of its gesture platform, Handpose, the company is revealing an unprecedentedly accurate hand tracking system that requires so little processing power that it could scale from computers to tablets to VR headsets.

Voice recognition systems have become fairly advanced and reliable in recent years. That's because the technology evolved from matching your utterance against templates for entire words to recognizing phonemes, the building blocks of words, the Fast Company report notes. Handpose is following the same idea:

Most gesture systems, including Microsoft Kinect, still use this simple style of template matching. But Handpose, MSR's new gesture recognition system, abandons those templates completely. Instead, it incorporates what it's calling a "gesture vocabulary." The system looks at your hand and, instead of seeing it as a whole blob that needs to match to something preprogrammed in the system, it breaks up your hand into independent pieces—so it can reason how chunks of your fingers and knuckles curl into a fist. "Those core elements are almost like a phoneme for a pronunciation of a word," says Huang.

Suddenly, a vision system like Kinect, which can currently only recognize large sweeps of your hand using a broad image-matching technique, could use these finger phonemes to identify fine motor controls like grasping tiny objects or touch-typing on a holographic QWERTY keyboard floating in midair.
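To make the "finger phoneme" idea concrete, here is a minimal Python sketch of the contrast the report describes: matching a whole-hand pose against a preprogrammed template versus reasoning over independent finger states. It is illustrative only; Handpose's actual model is not detailed in the article, so the per-finger representation, curl thresholds and pose labels below are assumptions.

```python
# Illustrative sketch only: Handpose's real model is not public in this
# article. The contrast shown is whole-hand template matching vs.
# part-based reasoning over per-finger states ("finger phonemes").

from dataclasses import dataclass


@dataclass
class FingerState:
    name: str    # e.g. "index"
    curl: float  # 0.0 = fully extended, 1.0 = fully curled


def match_whole_template(observed_curls, template_curls, tolerance=0.15):
    """Old-style matching: the whole hand must fit one preprogrammed pose."""
    return all(abs(o - t) <= tolerance
               for o, t in zip(observed_curls, template_curls))


def recognize_from_parts(fingers):
    """Part-based reasoning: combine per-finger states into a pose label."""
    curled = [f for f in fingers if f.curl > 0.7]
    extended = [f for f in fingers if f.curl < 0.3]
    if len(curled) == 5:
        return "fist"
    if len(extended) == 5:
        return "open_hand"
    if len(extended) == 1 and extended[0].name == "index":
        return "pointing"
    return "unknown"


hand = [FingerState("thumb", 0.9), FingerState("index", 0.1),
        FingerState("middle", 0.85), FingerState("ring", 0.9),
        FingerState("pinky", 0.8)]
print(recognize_from_parts(hand))  # -> "pointing"
```

Because the part-based version never needs a stored template for every possible pose, it can label hand shapes it has not seen as a whole before, which is the scalability point the report is making.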

Microsoft Research is presenting papers on the new technology at a pair of academic conferences this year. While it's not clear when Handpose will become part of Microsoft's commercial offerings, a demonstration video the company released suggests it's coming along nicely. 

An in-depth breakdown of Handpose's components and goals is available on Microsoft's Next blog. That post also discusses another gestural UI effort underway at Microsoft Research, which is dubbed Project Prague:

Let’s say you’re talking to a colleague over Skype and you’re ready to end the call. What if, instead of using your mouse or keyboard to click a button, you could simply make the movement of hanging up the phone?

Need to lock your computer screen quickly? What if, instead of scrambling to close windows and hit keyboard shortcuts, you simply reach out and mimic the gesture of turning a key in a lock?

Adi Diamant, who directs the Advanced Technologies Lab, said that when people think about hand and gesture recognition, they often think about ways it can be used for gaming or entertainment. But he also sees great potential for using gesture for everyday work tasks, like designing and giving presentations, flipping through spreadsheets, editing e-mails and browsing the web.
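The scenarios Diamant describes amount to routing recognized gestures to everyday desktop actions. The Python sketch below shows one way such a mapping could look; Project Prague's real API is not shown in the source, so the gesture names and handler functions here are hypothetical.

```python
# Hypothetical sketch of gesture-to-action dispatch. Project Prague's actual
# API is not described in the article; names below are illustrative only.

def end_call():
    print("Ending the Skype call...")


def lock_screen():
    print("Locking the workstation...")


GESTURE_ACTIONS = {
    "hang_up_phone": end_call,   # mimic putting a handset down
    "turn_key": lock_screen,     # mimic turning a key in a lock
}


def on_gesture_detected(gesture_name: str) -> None:
    """Dispatch a recognized gesture to its registered action, if any."""
    action = GESTURE_ACTIONS.get(gesture_name)
    if action is not None:
        action()


on_gesture_detected("turn_key")  # -> "Locking the workstation..."
```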

It's not difficult to see how Handpose and Project Prague's capabilities could augment not only Microsoft's consumer products, but also the enterprise, through its Office and Dynamics application suites. Many existing enterprise applications have by now gained touchscreen UIs, but touch has been an imperfect advance, says Constellation Research VP and principal analyst Alan Lepofsky. Gestural UI, on the other hand, has massive potential.

"People often talk about touch gestures on smartphones and tablets tablets as being intuitive, but really they are not," Lepofsky says. "Have you ever actually pinched a photo to make it smaller? Natural gestures in three dimensions have the potential to improve the way people create and share information, and meet and connect with colleagues and customers. Imagine actually handing a file over to someone, or shaking their hand when a virtual meeting begins."
