Usable Transparency

The NYT has recently been doing a lot of interactive pieces for the 2008 presidential election. One of these is an interactive chart of political polls conducted by different organizations. This isn’t quite game-y, though it could be if there were additional features like comparing one poll to another, or earning points for predicting a future poll based on current ones. Anyway, the important point here is that these visualizations are based on simple polling data: things like the number of respondents and the percentage in favor of each candidate. The Times is transparent about this data in two ways: (1) it provides a link explaining the eligibility criteria for polls to be included in the chart, and (2) it provides a link to a raw database dump of the data. The eligibility link speaks to data quality issues that can arise in collection, which can lead to invalid results or bias. The database dump link speaks to the ability to peer behind the graphic at the actual data used to produce it.

It’s useful to draw a distinction between data and information here: data being raw sensor readings or direct observations, and information being additional context and interpretation built on data. Transparency of data (which the Times did magnificently for the interactive polling piece) demands different things than transparency of information, because there is a layer of contextualization and interpretation that must also be explicated in order to be transparent about information. This touches on issues of individual and organizational bias, since interpretation itself is influenced by these outside sources. Moreover, interpretation can be encoded into the mathematical equations that produce information (derived values) from the raw data. Consider the mean of all polls for each candidate. This is a derived value, albeit one that most people understand readily, but it nonetheless takes an interpretive stance: that a mean of polling data collected under different circumstances is meaningful. As we move from simple means to greater complexity, a data-driven model is really nothing more than a series of mathematical manipulations that interpret the data into a manageable form of information.
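To make the point concrete, here is a minimal sketch (with made-up poll numbers, not the Times’ data) showing that even the humble mean embeds an interpretive choice: averaging every poll equally versus weighting each poll by its number of respondents yields different “information” from the same data.

```python
# Hypothetical polls for one candidate: (% in favor, number of respondents).
# The numbers are invented for illustration only.
polls = [(48.0, 1200), (52.0, 600), (47.0, 900)]

# Interpretation 1: every poll counts equally, regardless of sample size.
simple_mean = sum(pct for pct, _ in polls) / len(polls)

# Interpretation 2: polls with more respondents count proportionally more.
total_n = sum(n for _, n in polls)
weighted_mean = sum(pct * n for pct, n in polls) / total_n

print(round(simple_mean, 2))    # 49.0
print(round(weighted_mean, 2))  # 48.56
```

Neither number is the “true” level of support; each is the output of a modeling decision that a transparent presentation would need to disclose.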

Here’s the crux: to be transparent about information (interpretation from data), journalists need a way to express interpretations or manipulations, mathematical though they may be, in a way that is easily understood. This has direct bearing on games for journalism, since the models through which games interpret the world will need to be explicated to consumers in the spirit of transparency. The problem, alas, is that math is impenetrable to many. Imagine the Times providing a third link for transparency, one which shows the nasty equation on top of which a simulation is built. This is important, because even though many people won’t take the time to understand it, those who do will be able to verify or understand the model. But what about everyone else? They need Usable Transparency. I like to think that a simulation game like SimCity follows the principle of usable transparency: you don’t need to understand the simulation model to make decisions in the game. The manual describes in prose what to do to alleviate trash problems, create more jobs, or reduce rush hour traffic jams. I think this is a useful paradigm that would serve journalists well in thinking about transparency as it relates to games. The collection of the data is important, check. The data itself is important, check. But the mathematical model that drives a simulation is important too. I would argue for a prose description of that model, itself footnoted with the grounding equations.