Many times I’ve been fascinated by visualizations of vast amounts of data. I think one of the first impressive ones I saw was “The Dumpster,” and today I saw another interesting TED talk featuring the work of Jonathan Harris, who describes himself as “an artist and storyteller working primarily on the Internet.” Indeed, I think he presented some quite impressive projects. You can judge for yourself (warning – it’s almost 20 minutes):
Listening to the talk set off two main streams of ideas in my head: one more critical, the other more practical. I’ll start with the critical one, to leave a better taste at the end, since all in all I am impressed by Harris’s work.
The critical comment is basically about the way things were presented, and particularly the language used during the presentation. The projects presented in the video deal mainly with graphic representations of user-generated content. As an audience, we have no idea how exactly the content was sampled, with the exception of the “we feel fine” project, where we know that it is based on a simple search for variations of the word “feel” in some mysterious database of blogs that purports to cover the “internet”. It is not stated, but with the exception of “time capsule” (more here), the other two projects seemed to target an English-speaking audience only. From the video, I assume that the other projects work more or less in a similar fashion. So far, it seems OK. Right?
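To make the sampling concern a bit more concrete, here is a minimal, hypothetical sketch in Python of what a “simple search for variations of the word ‘feel’” might look like. The regular expression, the list of inflections and the `extract_feelings` function are my own assumptions for illustration only, not the actual code behind “we feel fine”.

```python
import re

# Hypothetical pattern: a few English inflections of "feel", matched as whole
# words, followed by the next word (which we treat as the "feeling").
FEEL_PATTERN = re.compile(r"\b(feel|feels|feeling|felt)\b\s+(\w+)", re.IGNORECASE)

def extract_feelings(posts):
    """Return (feel-word, following word) pairs found in a list of post texts."""
    results = []
    for text in posts:
        for match in FEEL_PATTERN.finditer(text):
            results.append((match.group(1).lower(), match.group(2).lower()))
    return results

posts = [
    "Today I feel fine.",
    "She said she was feeling tired after work.",
    "No emotions here.",
]
print(extract_feelings(posts))
```

Note what falls outside a sample like this: any post not written in English, or one that expresses emotion without these exact words, is simply never counted.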
What did disturb me in the talk was Harris’s chutzpah in referring to the entire human race while describing his projects. This stands out for me in particular, as someone trying to study what is labeled the “digital divide”. First of all, however shocking it may be to some technology enthusiasts, 82.4% of the world’s population is still not online (as of June 30, 2007). Another piece of shocking news is that not the entire world population speaks English. I have a study from 2000 by Suppiramaniam Nanthikesan stating that back then English speakers constituted only 48.5%-50.9% of the online population (I know I need some more up-to-date data). Given that internet growth rates in developing countries have been higher than those in the developed world, I seriously doubt that 100% of people online are now English speakers. Finally, most people online do not create content. For example, according to a very recent Pew report, only about 8% of Americans take full advantage of contemporary technology, which includes blogging. In other words, the other 92% create little, if any, content (so much for Web 2.0, but that is a subject for another post). The bottom line is that I think Jonathan is blowing things out of proportion. I have to admit that at no point did Jonathan claim to be doing scientific work. He is an artist, and as such I might be judging him too harshly. My problem is probably that this talk is representative of a trend, which is why it triggered all of the above. At the same time, I also have to admit that he does a great job marketing his projects.
Another annoying thing was the comparison between Greek mythology and contemporary news. Leaving aside the fact that we, as an audience, have no idea which newspapers were sampled in the “universe” project (currently the top story on Harris’s homepage), I think it is a bit overambitious to suggest that this is what creates the mythology of our times. However much our current history is written through the mass media, placing news on a timeline spanning centuries requires some selection and self-criticism. Ironically, in the video, one of the “myths” shown by the program was Anna Nicole Smith. With all due respect to her cultural contribution, I would be surprised if in 100 years she is remembered as a cultural representation of our times. So, again, as impressive as the tools are, I think it is important to keep this kind of thing in proportion, even if you are an artist.
Now, having said all the above, I admit that the presented tools are interesting. This takes me to the more practical line of thought. As I mentioned above, at the end of the day what he is attempting to do is a sort of mega content analysis. Taking out the bombastic statements, the amount of data analyzed in each project is huge, and it is fascinating data in itself. I think the attempt to mechanize a massive content analysis is really interesting, as is its graphic representation. Indeed, the internet presents a good and convenient opportunity to study texts, both user-generated ones and digitized versions of the mainstream media. Eventually we will have to figure out how to do this other than manually.
Recently I had a conversation with Claire Cardie from the Information Science program at Cornell, who works on natural language understanding. It would be pretty amazing if we could combine the tools she is working on with the visualization capabilities of Harris’s projects and the knowledge accumulated in social science (particularly communication :). If configured correctly, I believe Jonathan’s visualization tools could help both with analyzing the data and, just as important, with explaining it to people, thus making this convoluted field of social science a bit more accessible to the broader public. I think this kind of cooperation has great potential.
On the off chance that Jonathan Harris is reading this post, I will mention a project I am participating in at the moment. It is all about user-generated content on a wiki platform, and our aim is to have a lot of content for analysis at the end of the day. I will definitely blog more about this project in the future, but Jonathan, if you are reading this, you are welcome to drop me an email :)