Here's a nice overview of how touch events are modelled in mobile Safari on the iPhone. Here's a student paper detailing some multi-touch code architectures, with actual code too.
And here's a gesture recognition discussion thread featuring some of the above links and many more.
I have a GestureWorks license available to me if I need it but I haven't evaluated that yet because I have a little time to experiment and I'm really just curious about how it all works. I'm using Flash, and I'm successfully receiving TUIO messages using udp-flashlc-bridge and tuio-as3 with a LocalConnection. It's working nicely so far, and I already adapted Modest Maps to deal with rotations and got up and running quite quickly.
I'm reaching for my blog for the first time in a while to ask whether anyone reading has experience to share about developing software for this kind of platform. Specifically, I'm wondering if anyone has hints about how to take raw cursor information (potentially from several fingers on several hands) and turn it into meaningful motion. What I have so far works really well with two fingers, but I'm wondering where to go from here.
Some of the steps seem simple. For example, if a cursor appears and disappears in a short space of time and doesn't move very much, that's a tap event. If a cursor appears and disappears twice in the same place in a short space of time, that's a double-tap event. If a single finger cursor appears and moves, that's a drag. A fast drag could be interpreted as a swipe. You can code some momentum/physics in there to make it feel right. So far so good. This much is intuitive.
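To make those rules concrete, here is a rough sketch in Python rather than ActionScript. The `Trace` record, the function names and every threshold are assumptions made up for illustration. None of them come from TUIO or tuio-as3, and real values would need tuning on the actual table.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Trace:
    """One cursor's lifetime: times in seconds, positions in pixels (hypothetical record)."""
    t0: float
    x0: float
    y0: float
    t1: float
    x1: float
    y1: float


# Illustrative guesses only, not measured values.
TAP_MAX_SECONDS = 0.3
TAP_MAX_PIXELS = 10.0
DOUBLE_TAP_MAX_GAP = 0.4   # seconds between first release and second press
SWIPE_MIN_SPEED = 800.0    # pixels per second


def classify(trace):
    """Label a finished single-cursor trace as 'tap', 'drag' or 'swipe'."""
    duration = trace.t1 - trace.t0
    distance = hypot(trace.x1 - trace.x0, trace.y1 - trace.y0)
    # Short and nearly stationary: a tap.
    if duration <= TAP_MAX_SECONDS and distance <= TAP_MAX_PIXELS:
        return "tap"
    # Otherwise it moved: a fast drag counts as a swipe.
    speed = distance / duration if duration > 0 else 0.0
    return "swipe" if speed >= SWIPE_MIN_SPEED else "drag"


def is_double_tap(first, second):
    """True if two consecutive traces are taps close together in time and space."""
    if classify(first) != "tap" or classify(second) != "tap":
        return False
    gap = second.t0 - first.t1
    apart = hypot(second.x0 - first.x0, second.y0 - first.y0)
    return 0 <= gap <= DOUBLE_TAP_MAX_GAP and apart <= TAP_MAX_PIXELS
```

This only looks at where a cursor started and ended, so a finger that wanders off and comes back would still read as a tap. A fuller version would probably track the maximum distance travelled as well.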
So then... there's the whole task of doing hit-tests on each interactive object on screen so that you don't combine cursors into gestures unless they're acting on the same thing. This is familiar territory for me. But where intuition breaks down for me is if two or more fingers appear and move in roughly the same direction - is that a drag gesture? But how do you define roughly, or should you also interpret it as scaling or rotation? And when do you decide to ignore the scaling?
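One possible way to pin down "roughly" is to always compute translation, scale and rotation from the pair of fingers, and then snap small scale and rotation changes back to nothing with a dead zone. The sketch below (Python, with made-up function names and thresholds) shows that idea. It is one way to frame the question, not an answer to it.

```python
from math import atan2, hypot, pi

# Illustrative dead zones: changes smaller than these are treated as noise.
SCALE_DEADZONE = 0.05       # 5% change in finger spacing
ROTATION_DEADZONE = 0.05    # radians, about 3 degrees


def two_finger_delta(a0, b0, a1, b1):
    """Return (dx, dy, scale, rotation) for fingers a and b moving from *0 to *1.

    Points are (x, y) tuples. Rotation is in radians, wrapped to [-pi, pi).
    """
    # Translation: how far the midpoint between the fingers moved.
    mx0, my0 = (a0[0] + b0[0]) / 2, (a0[1] + b0[1]) / 2
    mx1, my1 = (a1[0] + b1[0]) / 2, (a1[1] + b1[1]) / 2
    # Vector from finger a to finger b, before and after.
    vx0, vy0 = b0[0] - a0[0], b0[1] - a0[1]
    vx1, vy1 = b1[0] - a1[0], b1[1] - a1[1]
    d0, d1 = hypot(vx0, vy0), hypot(vx1, vy1)
    # Scale: ratio of finger spacing. Guard against coincident fingers.
    scale = d1 / d0 if d0 > 0 else 1.0
    # Rotation: change in the angle of that vector.
    rotation = atan2(vy1, vx1) - atan2(vy0, vx0)
    rotation = (rotation + pi) % (2 * pi) - pi
    return mx1 - mx0, my1 - my0, scale, rotation


def interpret(a0, b0, a1, b1):
    """Same as two_finger_delta, but ignores scale/rotation inside the dead zones."""
    dx, dy, scale, rotation = two_finger_delta(a0, b0, a1, b1)
    if abs(scale - 1.0) < SCALE_DEADZONE:
        scale = 1.0
    if abs(rotation) < ROTATION_DEADZONE:
        rotation = 0.0
    return dx, dy, scale, rotation
```

With this framing, two fingers moving roughly together come out as a pure drag, because their spacing and angle barely change. Comparing the start of the gesture with the current frame, rather than frame to frame, keeps slow deliberate pinches from being swallowed by the dead zone. In screen coordinates with y pointing down, a positive rotation here is clockwise on screen.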
And then, if several fingers are doing this, potentially in opposite directions, how do you boil that down into a gesture? Or if several people are all mashing the table at once, how do you filter out the noise? Can you? There are some juicy interaction design problems in here that I know I'm not the first person to think about by a long shot (hat tip, hat tip, hat tip, hat tip, hat tip, hat tip etc). But I don't see much discussion of the interpretation of raw touch information into gestures. If you know of good projects that have solved these problems before (in an open way) I'd love to hear about them!
One idea I've had is to use a clustering algorithm like k-means to split the points into two groups, filter out the outliers, and then keep applying slightly naive two-finger algorithms to build rotation/scale/translate matrices. Are you an old school multi-touch/reacTIVision hacker? Have you built touch tables for a living? Does all of this sound familiar to you? Let me know your thoughts!
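For what it's worth, here is a minimal sketch of that clustering idea in plain Python: k-means with k fixed at 2, followed by a crude distance-based outlier filter. The function names, the seeding strategy and the `max_radius` parameter are all assumptions for illustration. I haven't tried this on a real table.

```python
from math import hypot


def mean(points):
    """Centroid of a non-empty list of (x, y) tuples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])


def two_means(points, iterations=10):
    """Split points into two clusters with Lloyd's algorithm (k-means, k=2).

    Returns ((centroid0, group0), (centroid1, group1)).
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Deterministic seeding: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist(p, c0))
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        g0 = [p for p in points if dist(p, c0) <= dist(p, c1)]
        g1 = [p for p in points if dist(p, c0) > dist(p, c1)]
        if not g0 or not g1:
            break
        # Move each centroid to the mean of its group; stop once nothing changes.
        n0, n1 = mean(g0), mean(g1)
        if n0 == c0 and n1 == c1:
            break
        c0, c1 = n0, n1
    return (c0, g0), (c1, g1)


def drop_outliers(centroid, group, max_radius):
    """Discard points farther than max_radius from the centroid, then re-centre."""
    kept = [p for p in group if dist(p, centroid) <= max_radius]
    return (mean(kept), kept) if kept else (centroid, [])
```

The two centroids can then stand in as two "virtual fingers" for the existing two-finger code. One likely catch: clustering each frame independently means the two groups can swap labels or reshuffle between frames, so the centroids would probably need matching against the previous frame to avoid sudden jumps.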