
Help Me Think Through Multi-touch Events

We're lucky enough to have a big-ass table in the office for a client project. Not an MS one, but an Ideum one that speaks the TUIO/OSC protocol.

I have a GestureWorks license available to me if I need it, but I haven't evaluated it yet because I have a little time to experiment and I'm really just curious about how it all works. I'm using Flash, and I'm successfully receiving TUIO messages using udp-flashlc-bridge and tuio-as3 with a LocalConnection. It's working nicely so far: I've already adapted Modest Maps to deal with rotations and got up and running quite quickly.
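
For context, the receiving side boils down to tracking cursor lifecycles. This is only the shape of it - the real tuio-as3 listener interface passes cursor objects rather than bare coordinates, so the callback names and signatures here are illustrative, not the library's actual API:

    package {
        import flash.geom.Point;

        // Tracks live TUIO cursors by session ID so that a gesture can be
        // classified when a cursor ends. The add/update/remove callbacks
        // mirror the TUIO lifecycle but are simplified for illustration.
        public class CursorTracker {
            private var cursors:Object = {};

            public function cursorAdded(id:int, x:Number, y:Number):void {
                cursors[id] = { start: new Point(x, y),
                                current: new Point(x, y),
                                born: new Date().time };
            }

            public function cursorUpdated(id:int, x:Number, y:Number):void {
                if (cursors[id]) cursors[id].current = new Point(x, y);
            }

            public function cursorRemoved(id:int):void {
                // hand the finished stroke to a classifier here, then forget it
                delete cursors[id];
            }
        }
    }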

The reason I'm reaching for my blog for the first time in a while is to ask if anyone reading has experience to share about developing software for this kind of platform. Specifically, I'm wondering if anyone has hints about how to take raw cursor information (potentially from several fingers on several hands) and turn it into meaningful motion. What I can do so far works really well with two fingers, but I'm wondering where to go from here.

Some of the steps seem simple. For example, if a cursor appears and disappears in a short space of time and doesn't move very much, that's a tap event. If a cursor appears and disappears twice in the same place within a short space of time, that's a double-tap event. If a single finger cursor appears and moves, that's a drag. A fast drag can be interpreted as a swipe, and you can code some momentum/physics in there to make it feel right. So far so good. This much is intuitive.
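
Here's a minimal version of that classification. Every threshold is invented and would need tuning on the real table:

    package {
        import flash.geom.Point;

        // Classifies a single finished cursor as tap / double-tap / drag /
        // swipe, given its start point, end point and lifetime in ms.
        public class StrokeClassifier {
            private static const TAP_TIME_MS:Number = 250;  // invented
            private static const TAP_MOVE_PX:Number = 10;   // invented
            private static const SWIPE_SPEED:Number = 1.0;  // px/ms, invented

            private var lastTapTime:Number = -1;
            private var lastTapPos:Point = null;

            public function classify(start:Point, end:Point,
                                     bornMs:Number, diedMs:Number):String {
                var dt:Number = diedMs - bornMs;
                var dist:Number = Point.distance(start, end);
                if (dt < TAP_TIME_MS && dist < TAP_MOVE_PX) {
                    // a second tap close in space and time is a double-tap
                    if (lastTapPos != null && diedMs - lastTapTime < 2 * TAP_TIME_MS
                            && Point.distance(end, lastTapPos) < TAP_MOVE_PX) {
                        lastTapPos = null;
                        return "double-tap";
                    }
                    lastTapTime = diedMs;
                    lastTapPos = end;
                    return "tap";
                }
                return (dist / dt > SWIPE_SPEED) ? "swipe" : "drag";
            }
        }
    }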

So then... there's the whole task of doing hit-tests on each interactive object on screen so that you don't combine cursors into gestures unless they're acting on the same thing. This is familiar territory for me. Where intuition breaks down is when two or more fingers appear and move in roughly the same direction - is that a drag gesture? How do you define "roughly"? Should you also interpret it as scaling or rotation, and when do you decide to ignore the scaling?
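
One way to make "roughly" concrete is to decompose the pair's motion into translate/rotate/scale and apply dead zones, ignoring any component that falls below a threshold. A sketch, with the dead-zone numbers plucked from the air:

    package {
        import flash.geom.Point;

        // Decomposes the motion of two fingers from their previous positions
        // (a0, b0) to their current ones (a1, b1) into scale, rotation and
        // translation, zeroing out components that are too small to matter.
        public class TwoFingerGesture {
            private static const SCALE_DEADZONE:Number = 0.05;   // invented
            private static const ROTATE_DEADZONE:Number = 0.03;  // radians, invented

            public static function interpret(a0:Point, b0:Point,
                                             a1:Point, b1:Point):Object {
                var scale:Number = Point.distance(a1, b1) / Point.distance(a0, b0);
                var rotation:Number = Math.atan2(b1.y - a1.y, b1.x - a1.x)
                                    - Math.atan2(b0.y - a0.y, b0.x - a0.x);
                // translation of the midpoint between the two fingers
                var dx:Number = (a1.x + b1.x - a0.x - b0.x) / 2;
                var dy:Number = (a1.y + b1.y - a0.y - b0.y) / 2;
                // treat near-1 scale and near-0 rotation as a plain drag
                if (Math.abs(scale - 1) < SCALE_DEADZONE) scale = 1;
                if (Math.abs(rotation) < ROTATE_DEADZONE) rotation = 0;
                return { scale: scale, rotation: rotation, dx: dx, dy: dy };
            }
        }
    }

From there it's a few lines with flash.geom.Matrix to apply the result: translate the fingers' midpoint to the origin, rotate, scale, translate back, then add dx/dy.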

And then, if several fingers are doing this, potentially in opposite directions, how do you boil that down into a gesture? Or if several people are all mashing the table at once, how do you filter out the noise? Can you? There are some juicy interaction design problems in here, and I know I'm not the first person to think about them by a long shot (hat tip, hat tip, hat tip, hat tip, hat tip, hat tip etc). But I don't see much discussion of how to interpret raw touch information as gestures. If you know of good projects that have solved these problems before (in an open way) I'd love to hear about them!

One idea I've had is to use a clustering algorithm like k-means to get two groups of points, filter out the outliers, and then continue to apply slightly naive two-finger algorithms to build rotation/scale/translate matrices (there's a toy sketch of this below). Are you an old-school multi-touch/reacTIVision hacker? Have you built touch tables for a living? Does all of this sound familiar to you? Let me know your thoughts!
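
For what it's worth, here's the k-means idea as a toy sketch: two centroids, a few refinement passes, then a crude outlier filter that drops points too far from their centroid. The pass count and distance cutoff are guesses:

    package {
        import flash.geom.Point;

        public class CursorClusters {
            // Split the current cursor positions (an Array of at least two
            // flash.geom.Points) into two groups and drop outliers.
            public static function split(points:Array, passes:int = 10,
                                         maxDist:Number = 150):Array {
                // seed the centroids from the first and last points
                var c:Array = [points[0].clone(), points[points.length - 1].clone()];
                var groups:Array = [[], []];
                for (var pass:int = 0; pass < passes; pass++) {
                    groups = [[], []];
                    // assign each point to its nearest centroid
                    for each (var p:Point in points) {
                        groups[Point.distance(p, c[0]) < Point.distance(p, c[1]) ? 0 : 1].push(p);
                    }
                    // move each centroid to the mean of its group
                    for (var i:int = 0; i < 2; i++) {
                        var n:int = groups[i].length;
                        if (n == 0) continue;
                        var sx:Number = 0, sy:Number = 0;
                        for each (var q:Point in groups[i]) { sx += q.x; sy += q.y; }
                        c[i] = new Point(sx / n, sy / n);
                    }
                }
                // crude outlier filter: ignore strays far from their centroid
                for (i = 0; i < 2; i++) {
                    var kept:Array = [];
                    for each (var r:Point in groups[i]) {
                        if (Point.distance(r, c[i]) < maxDist) kept.push(r);
                    }
                    groups[i] = kept;
                }
                return groups; // [Array of Point, Array of Point]
            }
        }
    }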


2 Comments

There have been some other GestureWorks / Ideum projects with kids in museum settings. Can’t provide specifics off-hand, but the GestureWorks documentation suggests they’ve handled the brunt of the gesture work, though the gestures have to be learned and performed. Persistent, uniquely attributed interactions in an N-user setting is no joke, and I don’t think anyone has ’solved it’ yet.

Some ideas for eliminating noise when tracking finger-point clusters:

• Maintain unique perceived users tied to an event/interaction chain, as well as their interaction areas/blobs on the surface.

• Require an initiating gesture: somewhat of a mini-calibration hidden behind a ‘swipe here in this way’ kind of message. Perhaps all fingers down, forearms ‘pointed’ together in front of the torso, simultaneous swipe out and away. This of course means it takes two hands to use (discriminatory, but I guess multi-touch inherently is).

• Perhaps visually indicate the software’s perceived interaction area associated with each user: a slightly lightened halo/cloud around where they are engaging. This could work or suck depending on the experience.

• Using the orientation of the thumb to tell left from right is obvious, but you could also determine a cluster’s distance and rotation relative to the surface’s nearest edge (the further from the edge, the more likely the forearm is perpendicular to it). This could be extended to guess at a torso location along the edge; see the sketch after this list.

• Hand pairing: apply the previous idea to find a left hand and a right hand with similar torso locations, and assume that one of the hands may disappear for a while and come back.

• A clear/reset gesture to forcibly end the UI session, perhaps the initiation gesture in reverse; failing that, a timeout prompt. Anything that keeps the UX from being a wholly ambiguous experience with no start and no end will make your life easier.
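
For illustration, the torso-location guess might look something like this (table dimensions assumed known, and the hand cluster simply projected straight onto its nearest edge):

    package {
        import flash.geom.Point;

        public class TorsoGuess {
            // Guess where a user's torso sits along the table edge nearest
            // to their hand cluster. Purely illustrative geometry.
            public static function guess(cluster:Point,
                                         tableW:Number, tableH:Number):Point {
                var d:Array = [cluster.y,            // top edge
                               tableH - cluster.y,   // bottom edge
                               cluster.x,            // left edge
                               tableW - cluster.x];  // right edge
                var nearest:int = 0;
                for (var i:int = 1; i < 4; i++) {
                    if (d[i] < d[nearest]) nearest = i;
                }
                switch (nearest) {
                    case 0:  return new Point(cluster.x, 0);
                    case 1:  return new Point(cluster.x, tableH);
                    case 2:  return new Point(0, cluster.y);
                    default: return new Point(tableW, cluster.y);
                }
            }
        }
    }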

Posted by Chris Delbuck on 9 February 2010 @ 10pm

First question: are you expecting one person with ten fingers, a few people, or lots of people? The use cases are vastly different.

When making this: http://pixelverse.org/portfolio/multitouchtable/ we expected (and got) lots of untrained users all using the table at once, so we only used single-finger gestures. With 10 kids leaning on the table it was hopeless to try to pair fingers to users, or even to the same hand.

In the physics toy we made, you can draw shapes, strike a line through things to delete them, or touch a finger to an existing object, which creates a spring that pulls the object toward your finger. The springs automatically give you multi-finger rotations: just pull on different points of an object at the same time. This physical analogy was easily discoverable and worked well without lots of gesture heuristics.
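
The spring amounts to something like this each frame (constants invented; a physics engine like Box2D can do the equivalent with a mouse joint, and the rotation falls out of the rigid-body dynamics when two springs pull on different points):

    package {
        import flash.geom.Point;

        // One finger-to-object spring, stepped once per frame: a damped
        // pull of the grabbed point toward the finger.
        public class FingerSpring {
            private static const STIFFNESS:Number = 0.1;  // invented
            private static const DAMPING:Number = 0.8;    // invented

            private var velocity:Point = new Point(0, 0);

            public function step(anchor:Point, finger:Point):void {
                velocity.x = (velocity.x + (finger.x - anchor.x) * STIFFNESS) * DAMPING;
                velocity.y = (velocity.y + (finger.y - anchor.y) * STIFFNESS) * DAMPING;
                anchor.x += velocity.x;
                anchor.y += velocity.y;
            }
        }
    }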

I also experimented with shape-based gestures and found that giving the user some feedback about which gesture they were in the middle of via simple colors, glows or symbols made it much easier for people to learn. Unfortunately I didn’t have time to fully explore this.

Hope that helps!

Posted by Josh on 10 February 2010 @ 10am