Random Etc. Notes to self. Work, play, and the rest.

Posts Tagged ‘Thoughts’

On Oculo- and Osteo-validation Methods

Eyeball it, and feel it in your bones.

An Observation

Many Londoners, including me, find the fact that there are plain-clothed armed police on the tube scarier than the fact that there are suicide bombers on the tube.

There is a nervous optimism that the police had a good reason to shoot a man five times, after they had restrained him. They must have had a good reason, mustn't they?

Note to the Standard, and others: Two days of inconvenience does not a Blitz make. And stop calling it 7/7, please.

Journal for Patterns Recognised

I'm (re)writing a literature review at the moment (ostensibly for the first chapter of my thesis), and supposedly I'm writing a book chapter in the next two weeks too.

So, in the spirit of structured procrastination, I've spent the last half an hour thinking about socialfiction.org's Journal for Patterns Recognised. Herewith some notes for an article which shouldn't get written, but about which I welcome criticism and/or encouragement.

On What It Means To Spot A Pattern

Teleological implications aside, are patterns things which want to be found? If it isn't found, is it a pattern? If it can't be found, is it a pattern?

In finding a pattern, we become familiar with it and its medium (carrier?). Is pattern-ness defined by the process of becoming familiar? Can we become familiar with a pattern-less medium? (And would that familiarity be due to anything other than repetition - another manifestation of pattern?). Is a pattern a collection of similar landmarks?

In "The Pattern On The Stone", Daniel Hillis talks about randomness, information content and entropy (I don't recall if he uses these terms). Does a random image contain more "information" than an image of a face? (Why does it take more bits to store it? Should we think about how to generate it? Is one random thing the same as another, supposing no patterns have been identified which render it non-random?)

Are patterns correspondences? Similarities? Matches? Anything we recognise? Must patterns be regular (in space or time?)

Does recognition mean implication, or causation? (cf. Gladwell's Blink - does correlation imply causation whether we want it to or not?)

Do Christopher Alexander's design patterns or the Gang of Four's analogous object-oriented design patterns work in the same way as knitting patterns? Is a pattern a framework from which we can hang information?

So we have patterns in time - repetitions, echoes and (I suppose) resonance.

So we have patterns as best practice (design/formula), a way of working which we've done before, a record of success or failure (anti-patterns?).

Generative grammars, such as languages. Do they encapsulate, generate, define or represent patterns?

Are we hard-wired for pattern recognition? Are creatures in general? (Zebra Patterns vs Long Grass and mono vision... Fly eyes... Sawipnpg lteters in the mddlie of wdros... turning mouths upside-down on upside-down images... Scott Kim's typographic inversions... Tom Coates' We See Faces In Audio Equipment... moths with eyes on wings... Eddie Izzard's evil pilot fish headlights prank... what does gestalt psychology have to say about all of this?)

Half-baked thoughts on Social Network Visualisation for Flickr

Sites like Flickr (Friendster, Orkut, Tribe, MySpace, etc.) are collecting masses of data about people. Person A knows Person B, B knows C and D, D knows E, who knows A, and so on. If you're into your discrete maths (or your systems thinking), you'll know that the formalisation of these kinds of connections is called Graph Theory. If not, then bear in mind that in graph theory a Graph is composed of Nodes (or points) and Edges (or connections). If A,B,C,D,E are nodes, then "A knows B" is an edge between A and B. (Or if Flickr is a pair of shoes, A,B,C,D,E are the eyelets, and "A knows B" is a shoelace.)

My first example can be drawn in 2D like so:

Plotting these relationships in 2D, or 3D space is known as graph embedding, and finding useful ways to interrogate them is an intriguing problem to think about. Indeed, it's a research field in its own right. Graphs are well understood data structures and many tools are available to manipulate and analyse them (e.g. GraphViz). One reason for this is that they are the cornerstone of many computer science problems (and solutions), but also because they can be used across many different disciplines. The ubiquitous application of graph theoretical methods across many scientific disciplines is the main subject of recent popular science books such as Linked and Ubiquity.
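As a rough sketch, the example above (A knows B, B knows C and D, D knows E, E knows A) can be held as a plain list of edges and written out in GraphViz's DOT language. The function names, the graph name "contacts" and the decision to treat "knows" as undirected are all my assumptions for illustration; none of this comes from Flickr.

```python
# Edges from the example: A knows B, B knows C and D, D knows E, E knows A.
# Treating "knows" as undirected is an assumption made for illustration.
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"), ("E", "A")]


def to_dot(edges, name="contacts"):
    """Return an undirected graph in GraphViz DOT syntax."""
    lines = [f"graph {name} {{"]
    for a, b in edges:
        lines.append(f"    {a} -- {b};")
    lines.append("}")
    return "\n".join(lines)


def adjacency(edges):
    """Build a node -> set-of-neighbours mapping from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


if __name__ == "__main__":
    print(to_dot(EDGES))
```

Saved to a file, the output can be handed to one of the GraphViz layout programs (for example `neato -Tpng contacts.dot -o contacts.png`) to get a 2D embedding.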

Social Network Visualisation tends to borrow heavily, if not totally, from graph theory. Having looked at several examples of this over the past year, I've spotted some common pitfalls which I'll try and articulate here.

Graph Visualisation Pitfalls

On Flickr, we can easily find out if A knows B by looking to see if A has listed B as a contact. But listing of contacts is tied up with all sorts of other practical considerations. The main reason A adds B as a contact isn't so that we can use that data for social network analysis, unfortunately. On Flickr, it says A knows B, or A likes B's photos, or A chatted to B in the forums and might want to find them again, or B added A (for any of these reasons) so A was being polite and reciprocated, and so on. Not all connections are made equal.

In reality A might have anywhere from 0 to 500 contacts. As the average number of contacts creeps upwards, the naive attempt to draw the graph falls over. My ASCII art attempt would be screwed as soon as there was a group of five mutual acquaintances. Even with the assistance of mature software like GraphViz, the network of X knows Y is too dense to draw clearly. The not-a-tree problem is faced by many network visualisations. There are probably too many connections to graph.

For many of us, it's intuitive to visualise these relationships as a tree. This cluster is connected to that cluster, and people are either in one cluster or another. Clusters probably have sub-clusters. We can handle these kinds of relationships easily, but unfortunately they fall down straight away when presented with real data. Grouping people by handedness, for instance, is a trivial way to generate overlapping sets. Imagine A and B are left-handed, D and E are right-handed, but C is ambidextrous. We aren't dealing with hierarchical clusters, we're dealing with overlapping sets.
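The handedness example can be sketched with ordinary sets. A tree of clusters needs every person to sit in exactly one group, and C breaks that. The helper name `is_partition` is mine, purely for illustration.

```python
# Handedness groups from the example; C is ambidextrous, so C is in both.
left_handed = {"A", "B", "C"}
right_handed = {"C", "D", "E"}


def is_partition(groups):
    """True if nobody belongs to more than one group (tree-like clusters)."""
    seen = set()
    for group in groups:
        if seen & group:
            return False
        seen |= group
    return True


# People who would need two parents in a tree of clusters.
overlap = left_handed & right_handed
```

Here `overlap` contains only C, and `is_partition([left_handed, right_handed])` is false, which is all it takes to rule out a strict hierarchy.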

A knows B, B knows C, C knows D. What's the connection between A and D? If we're analysing a terrorist network, then we might have found a potential link which is worth investigating further. But if we're trying to recommend photography or music or web-links to A, should we include D's tastes? If D is C's drug dealer, and B is C's little sister, and A is C's elderly next door neighbour? Probably not. Connections aren't transitive.
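A minimal sketch of the chain above, assuming "knows" is symmetric: a breadth-first search can tell us that A and D are three hops apart, but it says nothing about what those hops mean. The names `KNOWS` and `hops` are hypothetical.

```python
from collections import deque

# The chain from the example: A knows B, B knows C, C knows D.
# Storing each edge in both directions is an assumption.
KNOWS = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}


def hops(graph, start, goal):
    """Breadth-first search. Returns the number of edges on a shortest path,
    or None if the two people are not connected at all."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

`hops(KNOWS, "A", "D")` finds a path of length three, yet D is not in `KNOWS["A"]`. Reachability is cheap to compute; whether A should care about D's tastes is a question the graph alone can't answer.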

Show Me The Eye Candy Anyway

So assume we're aware of the above caveats, and we have a densely but ambiguously connected graph of contacts. The main task with this kind of visualisation, once you have the data, is how to display it in a readable format. It's almost certainly too much data to just throw out there (but it's always worth a try), so how do we prune it down to show only the meaningful stuff?

On Flickr, GustavoG has been busy producing graphs of the mutual contacts and testimonials networks. You can see them all in his FlickrLand set. These are interesting to examine for many reasons, not least in how he has pruned the network in order to get manageable visualisations from it. The whole social network on Flickr would be too big to show in detail, so Gustavo doesn't show it all. He rightly spots that testimonials should indicate fairly strong ties, and that the testimonials network is much sparser than the contacts network. He has also attempted to trim the contacts network down by requiring that contacts be mutual for a graph edge to exist, and he's tried different thresholds for how many mutual contacts a person must have before they are added to the graph.

Gustavo has used yEd, a Java graph layout program, to produce graphs using the "organic layout", and it works pretty well. In particular, in certain graphs there are undeniably meaningful clusters to be found, ones that expert users can spot straight away. At 50 mutual contacts and 100 mutual contacts in particular, the clusters are pronounced. At 10 mutual contacts the network is too dense to be very meaningful - certainly as it's presented at the moment it doesn't say much at all. At 200 mutual contacts, it tells us what many regulars to Flickr already know - there are only a few very well connected folks, and they mostly know each other. Because testimonials indicate stronger ties, the overall network is very fragmented, leaving many loose mini-clusters. Nevertheless, the overall picture is surprisingly well knitted together.


Gustavo's method for making the data manageable is to remove nodes which aren't significant in the overall picture. For instance, in mapping out clusters of users on Flickr, it's probably not a big deal to lose people with fewer than 10 mutual contacts. Marcos Weskamp recently used a different tactic to cut down the same data - remove edges, and only show a subset (window) of nodes at any one time.
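Here is a small sketch of the node-pruning idea as I read it: keep an edge only when both people list each other, then drop anyone below a threshold of mutual contacts. I don't know exactly how Gustavo computed his graphs, so the function names, the toy data and the single-pass thresholding (degrees are counted once, before anyone is removed) are my assumptions.

```python
def mutual_edges(contacts):
    """Keep an edge only when each person lists the other as a contact."""
    edges = set()
    for person, listed in contacts.items():
        for other in listed:
            if person in contacts.get(other, set()):
                edges.add(frozenset((person, other)))
    return edges


def prune(contacts, min_mutual):
    """Drop anyone with fewer than min_mutual mutual contacts, together with
    every edge touching them. Degrees are counted once, up front."""
    edges = mutual_edges(contacts)
    degree = {}
    for edge in edges:
        for person in edge:
            degree[person] = degree.get(person, 0) + 1
    keep = {p for p, d in degree.items() if d >= min_mutual}
    return keep, {e for e in edges if e <= keep}


# Made-up contact lists: D lists A, and C lists D, but neither is returned.
contacts = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A"},
}
```

With a threshold of 2 the triangle A, B, C survives and D disappears; with a threshold of 3 nothing is left. Real thresholds of 10, 50, 100 or 200 are the same operation on much bigger lists.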

Marcos's FlickrGraph is fantastic to play with - the interactivity and clean design help there - but unfortunately it is less meaningful than Gustavo's static graphs.

The FlickrGraph suffers from the data source because the current Flickr API will always return contacts sorted by user-id (an essentially arbitrary number). Because of this, and the limit of ten contacts shown, a lot of popular people won't appear to be connected to anyone despite the fact that they are.
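The effect is easy to reproduce on toy data. The sketch below makes no API calls; it just takes hypothetical contact lists keyed by made-up numeric user ids, sorts each list by id, and keeps the first few, which is my guess at what happens when only a sorted window is displayed. I've used a limit of 2 rather than ten to keep the data small.

```python
def visible_edges(contacts, limit):
    """Edges that survive when only the first `limit` contacts, sorted by
    user id, are shown for each person."""
    shown = set()
    for user, listed in contacts.items():
        for other in sorted(listed)[:limit]:
            shown.add((user, other))
    return shown


def times_listed(edges, user):
    """How many people are shown as having `user` as a contact."""
    return sum(1 for _, other in edges if other == user)


# Made-up ids: everyone lists user 900, who happens to have the highest id.
contacts = {
    1: {2, 3, 900},
    2: {1, 3, 900},
    3: {1, 2, 900},
    900: set(),
}

full = visible_edges(contacts, limit=10)
cut = visible_edges(contacts, limit=2)
```

In `full`, user 900 is listed by all three of the others. In `cut`, nobody appears to list them at all, simply because their id sorts last.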

Ways To Improve These Visualisations

Wanted: Richer Data.

We already know how to get meaningful visualisations from our data. We have to start with meaningful data. Marcos Weskamp has demonstrated a neat way of graphing mailing list interactions with his Social Circles project. In my opinion, the FlickrGraph lacks some of the insight that Social Circles provides. This is partly due to technical implementation, but mainly because Social Circles is sourced from real interactions and implicit connections, not watered-down explicit declarations of interest. Flickr users like HyperBob are very active in the forums and comments, but don't keep contacts on Flickr. Social Circles-style data would capture that.

On Flickr, there are several implicit contacts networks we could use:

The idea here is that we should be visualising the actual social things people are doing, rather than the social acts they say they are doing. I also touched upon these ideas in a post about Architects, Social Networks and Hypertext. But enough of my ranting, I'm off to try one of these methods.

Some thoughts arising from PLAN, and elsewhere

Play as a topic is in vogue, for sure.

At some point I was wondering about the role of iterative development cycles, frequent releases, rapid prototyping and so on as allowing for more play in design processes. Taking software development as an example, the waterfall model of development (analyse, specify, design, code, test) only really has one opportunity for play, whereas recent movements like Agile Programming afford a little bit more experimentation.

I was also wondering about the relationship between play and flow. Is one necessary, or sufficient, for the other?

Performance was a big part of a lot of projects that were talked about at PLAN. I'm stuck on this one because I can't help thinking of performance as attention-seeking, or performance as display. This one requires more reading, I think.

There was a strong distinction made between inter-disciplinary teams and trans-disciplinary teams (with cross-disciplinarity - if that is a word - being somewhere between the two). It was agreed that the former inevitably results in siloed development teams, and that the latter requires a shared vocabulary to facilitate understanding and common goals. As someone who instinctively falls into roles bridging the art-science divide (as I'm sure many people do) I didn't realise it would be as big an issue in this kind of forum. (In school, and college, and university, I never really did one discipline, and I don't consider myself wholly a part of either the computer science or architecture & design communities in my post-graduate studies either).

There was a call for raising awareness of the potential applications of locative media amongst those disciplines which weren't represented at PLAN. One such discipline which came to my mind was oral history. A large number of the projects discussed involved using locative media to enhance the delivery of place-specific story-telling and to restore the context in historical re-enactments (psycho-histories?). In the bar, Giles Lane mentioned to me some of the issues around public collections of oral history recordings - namely that as personal first-hand accounts they aren't always cleared for public consumption. Still, I think it would be good to make the people producing these recordings aware of the possibilities for presenting them in context.

My final thought was about the role of locative media practitioners as tool makers, in the same sense that most computer scientists would consider themselves tool makers. (This notion gained some ground in the UK in recent times, I think, and perhaps contributed to the increased emphasis on inter- (and trans-) disciplinary research now going on in UK CS departments).

Architects, Social Networks and Hypertext

I've recently re-read Christopher Alexander's essay A City is not a Tree (found via City Of Sound) as research for a lecture I'm giving which will tentatively be titled Turning Architects into Programmers (or perhaps less aggressively, Programming for Architects). The first time I read it, in relation to a social network visualisation project, it triggered a brief exchange with Alasdair Turner, a lecturer in architectural computing at UCL, who was lecturing me in Methods of Synthetic Construction 2 from the Bartlett's MSc Virtual Environments at the time.

As part of the course, students were asked to visualise the social networks existing between past and present course staff and students, with an emphasis on the alumni database which was becoming increasingly difficult for one person to carry around in their head. The alumni visualisation was difficult due to incomplete information, but because the course has always been online, sites like the Wayback Machine at archive.org helped a little. Questioning the staff from the course also helped, though it became apparent that not everyone agreed how long they had been involved - some people even got their own involvement wrong! It's worth pointing out that effective and general purpose social network visualisation is incredibly rare (please correct me if I'm wrong).

A lot of the social network visualisations started out by assuming that the social network of the course was a tree, i.e. that each person was a 'child' of a year group. I also did this, but I did make an attempt to bridge the gaps between years by developing a way to show that faculty members and part-time students are members of more than one year group. This was designed to emphasise the continuity provided by the course team, and also to illustrate how weak the ties were between year groups.

Christopher Alexander's essay is not just relevant to architecture. He is mainly talking about hierarchical trees, which Dan Hill notes are "the technical and experiential structure of most sites on the web." For me, the key point we can draw from Alexander is that despite the intuitive manner with which we can arrange and consider data in tree-form, these forms don't occur when things (in this case towns and cities) develop organically. It could be for this reason that the original pioneer of 'space syntax' methods Bill Hillier says, "I wouldn't design a city … I'd grow one."

As Alasdair pointed out to me when I first read the article, it is unfortunate that the web has come to be dominated by hierarchical trees, when the original concept of hypertext and HTTP was about navigating through complex networks. (Note to self: Alasdair also mentioned the post-structuralists and the notion of Finnegans Wake as the first 'hypertext' book.)

Alasdair was right, of course, that the original hypertext aim was not to have hierarchies of documents, but to cross-reference and interlink to your heart's content. Hence "world wide web", and not "world wide tree". This distinction is explicit in Tim Berners-Lee's initial hypertext proposal for CERN (for the uninitiated, this marks the birth of the world wide web). The brilliant thing here is that Berners-Lee actually begins by describing the web of social contact and collaboration which transcended CERN's organisational hierarchy in the late eighties.

Quote from Tim Berners-Lee's "Information Management: A Proposal" follows, apologies for length but it all seems relevant.

"CERN is a wonderful organisation. It involves several thousand people, many of them very creative, all working toward common goals. Although they are nominally organised into a hierarchical management structure, this does not constrain the way people will communicate, and share information, equipment and software across groups.

The actual observed working structure of the organisation is a multiply connected "web" whose interconnections evolve with time. In this environment, a new person arriving, or someone taking on a new task, is normally given a few hints as to who would be useful people to talk to. Information about what facilities exist and how to find out about them travels in the corridor gossip and occasional newsletters, and the details about what is required to be done spread in a similar way. All things considered, the result is remarkably successful, despite occasional misunderstandings and duplicated effort.

A problem, however, is the high turnover of people. When two years is a typical length of stay, information is constantly being lost. The introduction of the new people demands a fair amount of their time and that of others before they have any idea of what goes on. The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded, it just cannot be found."

The sad thing of course, apart from the increasingly hierarchical structuring of large sites, is that the web as we know it suffers from a high turnover of documents, much as Berners-Lee described a high turnover of people at CERN. As I pointed out in our crit session after the project, this problem afflicts the MSc too, since by design there is a yearly turnover of 90% of the people involved.

Back to alumni databases and social networks then, and to the defense of the tree, for a moment. I actually think that from an egocentric point of view a social network is most usefully considered a tree. That is, if I know two people already, it is of little consequence that they know each other. The only connections that matter to me are the ones which form the shortest paths between people I already know, and the people I want to know next. This is social networking in order to get ahead in business, or to make new friends, I admit.
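The egocentric view can be sketched as a breadth-first search outwards from "me": each person is reached through exactly one contact, which gives a tree of shortest paths, and any edge between two people I can already reach is simply ignored. The graph below is a made-up five-person loop and the names `GRAPH` and `ego_tree` are hypothetical.

```python
from collections import deque

# A made-up network with one loop: A-B, B-C, B-D, D-E, E-A.
GRAPH = {
    "A": {"B", "E"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"A", "D"},
}


def ego_tree(graph, ego):
    """Breadth-first search from `ego`. Maps each reachable person to the
    contact through whom ego reaches them along a shortest path."""
    parent = {ego: None}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        # Sorted so that ties between equally short paths break predictably.
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return parent
```

From A's point of view, B and E are direct contacts and C and D are both reached through B. The fact that D and E know each other never shows up in the tree, which is exactly the information the whole-network view would want back.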

On the other hand, considering the loops inside of who knows who, as well as the tree of who knows me, might allow a certain amount of insight to be gained into the nature of interactions across the whole social network. The interconnectedness of it all is what everyone was talking about in the crit, and what we're all stuck with trying to visualise and interpret in a meaningful way. Do self-organising structures hold the answer? I would argue not, but I'll leave that for another time.

I hope that we can be rid of the hierarchical straitjacket that much of the web is in right now, and I think a combination of search engines and weblogs will get us there in the end (not to mention tagging systems like del.icio.us and Flickr which have emerged strongly since I first wrote this). Weblogs aren't just trendy, it's practically their whole raison d'être to link and be linked, and we're seeing big businesses cotton on to this fact in a big way. If everyone had one, and used it (more than I use mine!), then maybe we would be able to map out social networks as we go, instead of trying to construct them after the fact.

On Hifi Separates in Information Space and Gadget Space

Matt Jones makes a nice analogy concerning the expected convergence of web-based services like Bloglines, Flickr or Blogger. Matt asserts that there will always be Home Info Theatre - the web equivalent of hi-fi separates - for those who want the highest quality services. The corollary here, then, is that sites like MySpace are the mini-systems of information space.

I use a similar argument to illustrate why there will probably always be separate gadgets, such as digital cameras and digital music players, despite the potential to integrate everything into one device (generally centred around a mobile phone). Most of the arguments which justify hi-fi separates will carry through to the gadget world - smoother upgrade paths, less chance of crippling failure, greater robustness etc. (the old, "small pieces, loosely joined" mentality). Of course, the downsides will apply too - separate components make for bulkier systems (more to carry), and the price is inevitably higher.

All this talk of convergence reminds me - I really want one of those Swiss Army Knife USB Storage things.

Future Digital Music Distribution and Production

This seems like a good place to park some notes I've made on where I think the music industry should be headed. There's a long article or three hidden in there somewhere, but I'm not ready to write it yet.

General trends. Wherever I get my music, be it from a brick and mortar outlet, an online store, or direct from an artist or label I need the following qualities:

Retailers. They should be fixated on choice, but also on managing choice. Distribution is now easy; even high-street shops should be able to provide anything I want, instantly. I should never have to order, and wait. They could download the data, burn a CD and print the packaging in 5 minutes - so why don't they? Why don't black-market independent shops do this from iTunes or Napster - or do they already? If Amazon have a rich database full of recommendation material, why don't HMV or Virgin? Shouldn't I be able to pick up a CD, and find out what else I might like (maybe put it on a recommendation shelf, based on a barcode scan or something)?

Venues. All of them should be recording and distributing every performance, subject to artist approval of course. I know that instant post-gig CDs are in the works (and patent encumbered I believe) but that will only happen in the worst corporate-sell-out kind of a way, I'm sure. And only at the level where every show sounds the same, says the cynic in me.

Artists. They should be making their work available across the full spectrum - not just album tracks but also live/rehearsal/demo/acoustic/rare. They have the authority and sources of depth I was talking about earlier. Bands like Sigur Rós have already demonstrated online liner notes (onliner notes?) are viable with their untitled album, ( ), even if it was in the pursuit of absolute minimalism (no words, no titles, no stickers on the box...). Artists are aware that a loyal fanbase will pay for new material, especially if they get it first (before the radio, before the magazines and reviewers even).

Studios. Studios should be digital-distribution aware. Sound engineers should be too. It's the norm now for amateur and unsigned bands to leave the studio with CDRs and immediately encode them at home to send to friends and promote online. Why don't the studios invest in professional quality encoders and use their mastering and mix-down knowhow to provide a range of good quality digital formats, optimised for the music in question? Ditto the standalone mastering people. Ditto CD pressing plants, who should be able to do mixed-mode CDs with a range of pre-encoded tracks for sharing (free promotion).

Pricing. It's occasionally mooted that artists should give away recordings and make money touring. That's a poor excuse if people are willing to pay for recorded music, and we know they are. Artists will suffer from the volume and choice of alternatives, so the cost per track must come down. Actually, the cost per track must come down if iPod buyers are to be able to afford to fill their iPod. Likewise, if people want to pay per play, the cost must be negligible. Of course, steadily lowered prices reach a limit eventually. Unfortunately, that limit isn't 0, download fans. As cost-per-song reduces, it tends to a collective/blanket license. Otherwise there's no money in the system, and artists don't get paid. So, how should a compulsory license be paid? Could it be a digital music player tax? (Wasn't there a licensing levy on blank media?) Or should it be opt-in? (Wasn't there once a license which allowed people to record music from the radio in the UK?)

Fairness. The popularity of artists suffers from a power-law distribution, I'm sure. Should the proceeds from license fees use that distribution exactly, or should we work to flatten the distribution (progressive tax, in effect)? Are Britney Spears, Robbie Williams, Madonna and the Rolling Stones capable of making up the difference using the gravity provided by their own mega-brands? What about Elvis? Is making excuses for weighting towards the little guy the same as saying that artists should give away music and tour to make up the difference?
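To make the "exact or flattened" question concrete, here is one possible weighting, entirely my own assumption rather than any real licensing scheme: split the pot in proportion to plays raised to a power alpha. The play counts are invented round numbers, not data.

```python
def shares(plays, alpha=1.0):
    """Split a pot in proportion to plays ** alpha.

    alpha = 1 pays strictly by popularity, 0 < alpha < 1 flattens the
    split, and alpha = 0 pays every artist the same."""
    weights = {artist: count ** alpha for artist, count in plays.items()}
    total = sum(weights.values())
    return {artist: weight / total for artist, weight in weights.items()}


# Invented play counts, two orders of magnitude apart at each step.
plays = {"megastar": 1_000_000, "mid-table": 10_000, "unsigned": 100}

exact = shares(plays, alpha=1.0)
flattened = shares(plays, alpha=0.5)
```

With alpha at 1 the megastar takes roughly 99% of the pot; at 0.5 that drops to about 90% and the unsigned act's share rises by a couple of orders of magnitude, though it is still tiny. Choosing alpha is the progressive-tax argument in a single number.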


Flickr: Using the Organizr

I've been using Flickr to manage my photos online for a few weeks now, and I think it's really excellent. It's a great way to share with friends, without the hassle of running your own complicated database. The social networking and instant messaging aspects of it haven't really reached a tipping point for me yet, but the storage, viewing and management aspect of it gets better all the time.

One thing I like is the combined ability to set permissions on photos (for friends or family) and to create invite-only groups with a shared gallery. Along with the excellent notes, comments and tagging facilities, this means it's possible for friends to collaborate on recording an event in quite a comprehensive manner. There's been one deal-breaker in all of this, until now that is. It's now possible to easily manage batches of photos, and add them to groups or sort them into galleries (photo sets), using a new Flash tool called Organizr. It's very swish for a web-app, and I assume it will only get better.

Today also marks the release of Flickr's full Services API which I'm going to take a good look at soon. I don't think I'll be able to resist knocking up something for Flickr using my other current distraction, Processing.