Thanks to the head start enjoyed by the likes of the New York Times, the Guardian, and Der Spiegel, we now have some excellent written reporting on a few of the more important issues exposed in the WikiLeaks cablegate data. A number of visualizations of the dataset have also been published in the last few days (e.g. Infosthetics has a nice round-up), which help, to some extent, in browsing and making sense of all the data there.
But what I want to suggest here is that, with all of the attention this story is getting, there may be some useful information to mine from social media about what is interesting, important, and noteworthy in the dataset. One of the most useful aspects of social media such as Twitter is that it provides a platform where interested individuals can make observations about what’s going on around them, including observations of large collections of documents.
At Rutgers, where I work, we’ve been developing a social media visual analytics tool called Vox Civitas and have collected a dataset of almost 60,000 (and growing) English-language tweets marked with the hashtag “#cablegate” from Twapperkeeper. Vox lets you visualize the collection over time, along with its sentiment, and filter it according to many criteria. Without further ado, click here to see the cablegate dataset in Vox. Let us know what you find or if it inspires any follow-up work!