Random Etc. Notes to self. Work, play, and the rest.

Archive for February 2005

Journal for Patterns Recognised

I'm (re)writing a literature review at the moment (ostensibly for the first chapter of my thesis), and supposedly I'm writing a book chapter in the next two weeks too.

So, in the spirit of structured procrastination, I've spent the last half an hour thinking about socialfiction.org's Journal for Patterns Recognised. Herewith some notes for an article which shouldn't get written, but about which I welcome criticism and/or encouragement.

On What It Means To Spot A Pattern

Teleological implications aside, are patterns things which want to be found? If it isn't found, is it a pattern? If it can't be found, is it a pattern?

In finding a pattern, we become familiar with it and its medium (carrier?). Is pattern-ness defined by the process of becoming familiar? Can we become familiar with a pattern-less medium? (And would that familiarity be due to anything other than repetition - another manifestation of pattern?). Is a pattern a collection of similar landmarks?

In "The Pattern On The Stone", Daniel Hillis talks about randomness, information content and entropy (I don't recall if he uses these terms). Does a random image contain more "information" than an image of a face? (Why does it take more bits to store it? Should we think about how to generate it? Is one random thing the same as another, supposing no patterns have been identified which render it non-random?)

Are patterns correspondences? Similarities? Matches? Anything we recognise? Must patterns be regular (in space or time)?

Does recognition mean implication, or causation? (cf. Gladwell's Blink - does correlation imply causation whether we want it to or not?)

Do Christopher Alexander's design patterns or the Gang of Four's analogous object-oriented design patterns work in the same way as knitting patterns? Is a pattern a framework from which we can hang information?

So we have patterns in time - repetitions, echoes and (I suppose) resonance.

So we have patterns as best practice (design/formula), a way of working which we've done before, a record of success or failure (anti-patterns?).

Generative grammars, such as languages. Do they encapsulate, generate, define or represent patterns?

Are we hard-wired for pattern recognition? Are creatures in general? (Zebra Patterns vs Long Grass and mono vision... Fly eyes... Sawipnpg lteters in the mddlie of wdros... turning mouths upside-down on upside-down images... Scott Kim's typographic inversions... Tom Coates' We See Faces In Audio Equipment... moths with eyes on wings... Eddie Izzard's evil pilot fish headlights prank... what does gestalt psychology have to say about all of this?)

Simulation for the Social Scientist

I'm reading Simulation for the Social Scientist. Notes to follow.

Half-baked thoughts on Social Network Visualisation for Flickr

Sites like Flickr (Friendster, Orkut, Tribe, MySpace, etc.) are collecting masses of data about people. Person A knows Person B, B knows C and D, D knows E, who knows A, and so on. If you're into your discrete maths (or your systems thinking), you'll know that the formalisation of these kinds of connections is called Graph Theory. If not, then bear in mind that in graph theory a Graph is composed of Nodes (or points) and Edges (or connections). If A,B,C,D,E are nodes, then "A knows B" is an edge between A and B. (Or if Flickr is a pair of shoes, A,B,C,D,E are the eyelets, and "A knows B" is a shoelace.)

My first example can be drawn in 2D like so:

Plotting these relationships in 2D or 3D space is known as graph drawing (or, more loosely, graph embedding), and finding useful ways to interrogate them is an intriguing problem to think about. Indeed, it's a research field in its own right. Graphs are well-understood data structures, and many tools are available to manipulate and analyse them (e.g. GraphViz). This is partly because they are the cornerstone of many computer science problems (and solutions), and partly because they can be used across many different disciplines. The ubiquitous application of graph theoretical methods across many scientific disciplines is the main subject of recent popular science books such as Linked and Ubiquity.
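To make the nodes-and-edges idea concrete, here is a minimal Python sketch of the A-to-E example held as adjacency sets. It treats "knows" as a two-way relationship, which is my simplification (the real contact lists are directional), and the names build_graph and to_dot are made up for illustration. The to_dot helper just writes the edges out in the plain DOT text format that GraphViz reads.

```python
from collections import defaultdict

# "A knows B, B knows C and D, D knows E, who knows A" from the text,
# treated here as undirected edges (an assumption for illustration).
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"), ("E", "A")]


def build_graph(edge_list):
    """Return an adjacency mapping: node -> set of neighbouring nodes."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def to_dot(edge_list, name="contacts"):
    """Render the edges as an undirected graph in DOT syntax."""
    lines = [f"graph {name} {{"]
    lines += [f"    {a} -- {b};" for a, b in edge_list]
    lines.append("}")
    return "\n".join(lines)


graph = build_graph(EDGES)
for node in sorted(graph):
    print(node, "knows", sorted(graph[node]))
print(to_dot(EDGES))
```

Adjacency sets make "does A know B?" a cheap lookup, which is about all the later examples need.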

Social Network Visualisation tends to borrow heavily, if not totally, from graph theory. Having looked at several examples of this over the past year, I've spotted some common pitfalls which I'll try and articulate here.

Graph Visualisation Pitfalls

On Flickr, we can easily find out if A knows B by looking to see if A has listed B as a contact. But listing of contacts is tied up with all sorts of other practical considerations. The main reason A adds B as a contact isn't so that we can use that data for social network analysis, unfortunately. On Flickr, it says A knows B, or A likes B's photos, or A chatted to B in the forums and might want to find them again, or B added A (for any of these reasons) so A was being polite and reciprocated, and so on. Not all connections are made equal.

In reality A might have anywhere from 0 to 500 contacts. As the average number of contacts creeps upwards, the naive attempt to draw the graph falls over. My ASCII art attempt would be screwed as soon as there was a group of five mutual acquaintances. Even with the assistance of mature software like GraphViz, the network of X knows Y is too dense to draw clearly. The not-a-tree problem is faced by many network visualisations. There are probably too many connections to graph.
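A quick back-of-envelope sketch of why density bites so soon: a group of n people who all know each other needs n(n-1)/2 edges. Five mutual acquaintances is also the smallest such group that can't be drawn on a flat page without two lines crossing (the complete graph on five nodes is non-planar), which is why the ASCII art gives up there. The function name is mine, and the 500 figure is just the upper bound mentioned above.

```python
def clique_edges(n):
    """Number of edges needed when n people all know each other."""
    return n * (n - 1) // 2


for size in (3, 5, 10, 50, 500):
    print(f"{size} mutual acquaintances -> {clique_edges(size)} edges")
```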

For many of us, it's intuitive to visualise these relationships as a tree. This cluster is connected to that cluster, and people are either in one cluster or another. Clusters probably have sub-clusters. We can handle these kinds of relationships easily, but unfortunately they fall down straight away when presented with real data. For example, grouping people by handedness is a trivial example of something which generates overlapping sets. Imagine A and B are left-handed, D and E are right-handed, but C is ambidextrous. We aren't dealing with hierarchical clusters, we're dealing with overlapping sets.
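The handedness example can be checked mechanically. In this sketch (all names hypothetical) a set of groups only fits a tree if every pair of groups is either disjoint or nested one inside the other. I've assumed the ambidextrous C counts as a member of both groups, which is exactly what breaks the tree.

```python
# Hypothetical membership for the handedness example in the text;
# the ambidextrous C is counted in both groups (an assumption).
groups = {
    "left-handed": {"A", "B", "C"},
    "right-handed": {"C", "D", "E"},
}


def overlaps(groups):
    """Return the members shared by each pair of groups that overlap."""
    names = sorted(groups)
    found = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = groups[first] & groups[second]
            if shared:
                found[(first, second)] = shared
    return found


def is_hierarchy(groups):
    """True if every pair of groups is disjoint or one contains the other."""
    sets = list(groups.values())
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if (a & b) and not (a <= b or b <= a):
                return False
    return True


print(overlaps(groups))
print("fits a tree:", is_hierarchy(groups))
```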

A knows B, B knows C, C knows D. What's the connection between A and D? If we're analysing a terrorist network, then we might have found a potential link which is worth investigating further. But if we're trying to recommend photography or music or web-links to A, should we include D's tastes? If D is C's drug dealer, and B is C's little sister, and A is C's elderly next door neighbour? Probably not. Connections aren't transitive.
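As a small sketch of the difference between "reachable" and "knows": a breadth-first search over the A-B-C-D chain happily finds a path from A to D, but that path is three hops long and there is no edge between them. The chain data and the hops function are illustrative only, and the search says nothing about what kind of tie each hop represents.

```python
from collections import deque

# The chain from the text: A knows B, B knows C, C knows D (two-way here).
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}


def hops(graph, start, goal):
    """Breadth-first search: fewest edges between start and goal, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            queue.append((neighbour, dist + 1))
    return None


print("A knows D directly:", "D" in chain["A"])
print("hops from A to D:", hops(chain, "A", "D"))
```

A recommender that treats any path as a connection would cheerfully pass the drug dealer's tastes on to the elderly neighbour.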

Show Me The Eye Candy Anyway

So assume we're aware of the above caveats, and we have a densely but ambiguously connected graph of contacts. The main task with this kind of visualisation, once you have the data, is how to display it in a readable format. It's almost certainly too much data to just throw out there (but it's always worth a try), so how do we prune it down to show only the meaningful stuff?

On Flickr, GustavoG has been busy producing graphs of the mutual contacts and testimonials networks. You can see them all in his FlickrLand set. These are interesting to examine for many reasons, not least in how he has pruned the network in order to get manageable visualisations from it. The whole social network on Flickr would be too big to show in detail, so Gustavo doesn't show it all. He rightly spots that testimonials should indicate fairly strong ties, and the network is much sparser than the contacts network. He has also attempted to trim the contacts network down by setting the requirement that contacts must be mutual for a graph edge to exist, and he's tried different thresholds for how many mutual contacts a person must have before they are added to the graph.
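Here is a rough sketch of that kind of pruning as I understand it from the description, not Gustavo's actual code. It assumes the raw data is a mapping from each user to the set of people they have listed; mutual_edges, prune and min_mutual are names I've invented. An edge exists only when both people list each other, and a person is kept only if they have at least min_mutual such edges.

```python
def mutual_edges(contacts):
    """Edges exist only where both users list each other as contacts."""
    edges = set()
    for user, listed in contacts.items():
        for other in listed:
            if other != user and user in contacts.get(other, set()):
                edges.add(frozenset((user, other)))
    return edges


def prune(contacts, min_mutual):
    """Keep users with at least min_mutual mutual contacts, and the edges between them."""
    edges = mutual_edges(contacts)
    degree = {}
    for edge in edges:
        for user in edge:
            degree[user] = degree.get(user, 0) + 1
    keep = {user for user, count in degree.items() if count >= min_mutual}
    return keep, {edge for edge in edges if edge <= keep}


# Made-up contact lists: A lists D, but D does not list A back.
contacts = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"E"},
    "E": {"D"},
}

nodes, edges = prune(contacts, min_mutual=2)
print(sorted(nodes))
print(sorted(sorted(edge) for edge in edges))
```

With the threshold at 2 the D-E pair drops out and only the A-B-C triangle survives, which is the same effect as raising the threshold in the real graphs, just on a toy scale.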

Gustavo has used yEd, a Java graph layout program, to produce graphs using the "organic layout", and it works pretty well. In particular, in certain graphs there are undeniably meaningful clusters to be found, ones that expert users can spot straight away. At 50 mutual contacts and 100 mutual contacts in particular, the clusters are pronounced. At 10 mutual contacts the network is too dense to be very meaningful - certainly as it's presented at the moment it doesn't say much at all. At 200 mutual contacts, it tells us what many Flickr regulars already know - there are only a few very well connected folks, and they mostly know each other. Because testimonials indicate stronger ties, the testimonials network is very fragmented, leaving many loose mini-clusters. Nevertheless, the overall picture is surprisingly well knitted together.


Gustavo's method for making the data manageable is to remove nodes which aren't significant in the overall picture. For instance, in mapping out clusters of users on Flickr, it's probably not a big deal to lose people with fewer than 10 mutual contacts. Marcos Weskamp recently used a different tactic to cut down the same data - remove edges, and only show a subset (window) of nodes at any one time.

Marcos's FlickrGraph is fantastic to play with - the interactivity and clean design help there - but unfortunately it is less meaningful than Gustavo's static graphs.

The FlickrGraph suffers from the data source because the current Flickr API will always return contacts sorted by user-id (an essentially arbitrary number). Because of this, and the limit of ten contacts shown, a lot of popular people won't appear to be connected to anyone despite the fact that they are.
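A toy simulation of the effect, with made-up data: twelve users who all list each other plus one popular user whose id happens to sort last. I've assumed plain integer ids sorted in ascending order and a window of the first ten contacts; the function names are hypothetical and nothing here calls the real Flickr API.

```python
def visible_edges(contacts, limit=10):
    """Edges that survive if only the first `limit` contacts (by id) are shown."""
    edges = set()
    for user, listed in contacts.items():
        for other in sorted(listed)[:limit]:
            edges.add((user, other))
    return edges


def in_degree(edges, target):
    """How many users are shown as listing `target`."""
    return sum(1 for _, other in edges if other == target)


POPULAR = 999  # hypothetical id that sorts after everyone else's
contacts = {
    uid: (set(range(1, 13)) - {uid}) | {POPULAR}
    for uid in range(1, 13)
}

all_edges = visible_edges(contacts, limit=None)  # no truncation
shown = visible_edges(contacts, limit=10)

print("really listed by:", in_degree(all_edges, POPULAR))
print("shown as listed by:", in_degree(shown, POPULAR))
```

Everyone lists the popular user, yet the window never reaches them, so in the truncated picture they look like a loner.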

Ways To Improve These Visualisations

Wanted: Richer Data.

We already know how to get meaningful visualisations from our data. We have to start with meaningful data. Marcos Weskamp has demonstrated a neat way of graphing mailing list interactions with his Social Circles project. In my opinion, the FlickrGraph lacks some of the insight that Social Circles provides. This is partly due to technical implementation, but mainly because Social Circles is sourced from real interactions and implicit connections, not watered-down explicit declarations of interest. Flickr users like HyperBob are very active in the forums and comments, but don't keep contacts on Flickr. Social Circles-style data would capture that.
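As a sketch of what sourcing from real interactions might look like (this is my guess at the general idea, not how Social Circles is implemented), each observed interaction, such as a comment or a forum reply, can add weight to a tie between two people. The events list, the user names and the interaction_weights function are all invented for illustration.

```python
from collections import Counter


def interaction_weights(events):
    """Count interactions per pair of people; each event is (actor, target)."""
    weights = Counter()
    for actor, target in events:
        if actor != target:  # ignore people replying to themselves
            weights[frozenset((actor, target))] += 1
    return weights


# Invented events: (who commented, whose photo or post it was on).
events = [
    ("bob", "alice"),
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "alice"),
    ("carol", "carol"),
]

weights = interaction_weights(events)
for pair, count in weights.most_common():
    print(sorted(pair), count)
```

Someone who never adds a contact still shows up here, with heavier ties to the people they actually talk to.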

On Flickr, there are several implicit contacts networks we could use:

The idea here is that we should be visualising the actual social things people are doing, rather than the social acts they say they are doing. I also touched upon these ideas in a post about Architects, Social Networks and Hypertext. But enough of my ranting, I'm off to try one of these methods.

Many Well-Meaning Workers Making Web Machines Work Marvellously Well

Martin Wattenberg and Marek Walczak, Marius Watz, Marcos Weskamp, Matt Webb, Matt Ward and Matt Wade.

STUPID new sticker at the station…

STUPID new stickers at the station...
Originally uploaded by Just_Tom.

Don Norman would be turning in his grave if he were dead.

PLAN | ICA, Day 2

My notes from day 2 at PLAN are now online.

Some thoughts arising from PLAN, and elsewhere

Play as a topic is in vogue, for sure.

At some point I was wondering about the role of iterative development cycles, frequent releases, rapid prototyping and so on as allowing for more play in design processes. Taking software development as an example, the waterfall model of development (analyse, specify, design, code, test) only really has one opportunity for play, whereas recent movements like Agile Programming afford a little bit more experimentation.

I was also wondering about the relationship between play and flow. Is one necessary, or sufficient, for the other?

Performance was a big part of a lot of projects that were talked about at PLAN. I'm stuck on this one because I can't help thinking of performance as attention-seeking, or performance as display. This one requires more reading, I think.

There was a strong distinction made between inter-disciplinary teams and trans-disciplinary teams (with cross-disciplinarity - if that is a word - being somewhere between the two). It was agreed that the former inevitably results in siloed development teams, and that the latter requires a shared vocabulary to facilitate understanding and common goals. As someone who instinctively falls into roles bridging the art-science divide (as I'm sure many people do) I didn't realise it would be as big an issue in this kind of forum. (In school, and college, and university, I never really did one discipline, and I don't consider myself wholly a part of either the computer science or architecture & design communities in my post-graduate studies either).

There was a call for raising awareness of the potential applications of locative media amongst those disciplines which weren't represented at PLAN. One such discipline which came to my mind was oral history. A large number of the projects discussed involved using locative media to enhance the delivery of place-specific story-telling and to restore the context in historical re-enactments (psycho-histories?). In the bar, Giles Lane mentioned to me some of the issues around public collections of oral history recordings - namely that as personal first-hand accounts they aren't always cleared for public consumption. Still, I think it would be good to get the people making these recordings to be aware of the possibilities for presenting them in context.

My final thought was about the role of locative media practitioners as tool makers, in the same sense that most computer scientists would consider themselves tool makers. (This notion gained some ground in the UK in recent times, I think, and perhaps contributed to the increased emphasis on inter- (and trans-) disciplinary research now going on in UK CS departments).

PLAN | ICA, Day 1

I've just finished a rough transcript of my notes from day 1 of this week's PLAN workshop.

Molly has better notes at social beasts, and Nicolas had already collected some of the take-home messages at Pasta and Vinegar.

Think Pink

(a 3D revamp of earlier 2D particle/attractor code)