Tag Archives: journalism

The Rhetoric of Data

Note: A version of the following also appears on the Tow Center blog.

In the 1830’s abolitionists discovered the rhetorical potential of re-conceptualizing southern newspaper advertisements as data. They “took an undifferentiated pile of ads for runaway slaves, wherein dates and places were of primary importance … and transformed them into data about the routine and accepted torture of enslaved people,” writes Ellen Gruber Garvey in the book Raw Data is an Oxymoron. By creating topical dossiers of ads, the horrors of slavery were catalogued and made accessible for writing abolitionist speeches and novels. The South’s own media had been re-contextualized into a persuasive weapon against itself, a rhetorical tool to bolster the abolitionists’ arguments.

The Latin etymology of “data” means “something given,” and though we’ve largely forgotten that original definition, it’s helpful to think about data not as facts per se, but as “givens” that can be used to construct a variety of different arguments and conclusions; they act as a rhetorical basis, a premise. Data does not intrinsically imply truth. Yes we can find truth in data, through a process of honest inference. But we can also find and argue multiple truths or even outright falsehoods from data.

Take for instance the New York Times interactive, “One Report, Diverging Perspectives,” which wittingly highlights this issue. Shown below, the piece visualizes jobs and unemployment data from two perspectives, emphasizing the differences in how a democrat or a republican might see and interpret the statistics. A rising tide of “data PR” often manifesting as slick and pointed infographics won’t be so upfront about the perspectives being argued though. Advocacy organizations can now collect their own data, or just develop their own arguments from existing data for supporting their cause. What should you be looking out for as a journalist when assessing a piece of data PR? And how can you improve your own data journalism by ensuring the argument you develop is a sound one?

one report diverging perspectives

Contextual journalism—adding interpretation or explanation to a story—can and should be applied to data as much as to other forms of reporting. It’s important because the audience may need to know the context of a dataset in order to fully understand and evaluate the larger story in perspective. For instance, context might include explaining how the data was collected, defined, and aggregated, and what human decision processes contributed to its creation. Increasingly news outlets are providing sidebars or blog posts that fully describe the methodology and context of the data they use in a data-driven story. That way the context doesn’t get in the way of the main narrative but can still be accessed by the inquisitive reader.

In your process it can be useful to ask a series of contextualizing questions about a dataset, whether just critiquing the data, or producing your own story.

Who produced the data and what was their intent? Did it come from a reputable source, like a government or inter-governmental agency such as the UN, or was it produced by a third party corporation with an uncertain source of funding? Consider the possible political or advocacy motives of a data provider as you make inferences from that data, and do some reporting if those motives are unclear.

When was the data collected? Sometimes there can be temporal drift in what data means, how it’s measured, or how it should be interpreted. Is the age of your data relevant to your interpretation? For example, in 2010 the Bureau of Labor Statistics changed the definition of long-term unemployment, which can make it important to recognize that shift when comparing data from before and after the change.

Most importantly it’s necessary to ask what is measured in the data, how was it sampled, and what is ultimately depicted? Are data measurements defined accurately and in a way that they can be consistently measured? How was the data sampled from the world? Is the dataset comprehensive or is it missing pieces? If the data wasn’t randomly sampled how might that reflect a bias in your interpretation? Or have other errors been introduced into the data, for instance through typos or mistaken OCR technology? Is there uncertainty in the data that should be communicated to the reader? Has the data been cropped or filtered in a way that you have lost a potentially important piece of context that would change its interpretation? And what about aggregation or transformation? If a dataset is offered to you with only averages or medians (i.e. aggregations) you’re necessarily missing information about how the data might be distributed, or about outliers that might make interesting stories. For data that’s been transformed through some algorithmic process, such as classification, it can be helpful to know the error rates of that transformation as this can lead to additional uncertainty in the data.

Let’s consider an example that illustrates the importance of measurement definition and aggregation. The Economist graphic below shows the historic and forecast vehicle sales for different geographies. The story the graph tells is pretty clear: Sales in China are rocketing up while they’re declining or stagnant in North America and Europe. But look more closely. The data for Western Europe and North America is defined as an aggregation of light vehicle sales, according to the note in the lower-right corner. How would the story change if the North American data included truck, SUV, and minivan sales? The story you get from these kinds of data graphics can depend entirely on what’s aggregated (or not aggregated) together in the measure. Aggregations can serve as a tool of obfuscation, whether intentional or not.

 vehicle sales

It’s important to recognize and remember that data does not equal truth. It’s rhetorical by definition and can be used for truth finding or truth hiding. Being vigilant in how you develop arguments from data and showing the context that leads to the interpretation you make can only help raise the credibility of your data-driven story.


Tweaking Your Credibility on Twitter

You want to be credible on social media, right? Well, a paper to be published at the Conference on Computer Supported Cooperative Work (CSCW) in early 2012 from researchers at Microsoft and Carnegie Mellon suggests at least a few actionable methods to help you do so. The basic motivation for the research is that when people see your tweet via a search (rather than following you) they have less cues to assess credibility. With a better understanding of what factors influence tweet credibility, new search interfaces can be designed to highlight the most relevant credibility cues (now you see why Microsoft is interested).

First off, five people were interviewed by the researchers to collect a range of issues that might be relevant to credibility perception. They came up with a list of 26 possible credibility cues and then ran a survey with 256 respondents in which they asked how much each feature impacted credibility perception. You can see the paper for the full results, but, for instance, things like keeping your tweets on a similar topic, using a personal photo, having a username related to the topic, having a location near a topic, having a bio that suggests relavent topical expertise, and frequent tweeting were all perceived by participants to positively impact credibility to some extent. Things like using non-standard grammar and punctuation, using the default user image were seen to detract from credibility.

Based on their first survey, the researchers then focused on three specific credibility cues for a follow-on study: (1) topic of tweets (politics, science, or entertainment), (2) user name style (first_last, internet – “tenacious27”, and topical – “AllPolitics”), and finally (3) user image (male / female photo, topical icon, generic icon, and default). For the study, each participant (there were 266) saw some combination of the above cues for a tweet, and rated both tweet credibility and author credibility. Unsurprisingly tweets about the science topic were rated as more credible than those on politics or entertainment. The most surprising result to me was that topically relevant user names were more credible than traditional names (or internet style names, though that’s not surprising). In a final follow-up experiment the researchers found that the user image doesn’t impact credibility perceptions, except for when the image is the default image in which case it significantly (in the statistical sense) lowers perceptions of tweet credibility.

So here are the main actionable take-aways:

  • Don’t use non standard grammar and punctuation (no “lol speak”)
  • Don’t use the default image.
  • Tweet about topics like science, which seem to carry an aura of credibility.
  • Find a user name that is topically aligned with those you want to reach.
That last point of finding a topically aligned user name might be an excellent strategy for large news organizations to build a more credible presence across a range of topics. For instance, right now the NY Times has a mix of accounts that have topical user names, as well as reporters using their real names. In addition to each reporter having their own “real name” account, individual tweets of theirs that were topically relevant could be routed to the appropriate topically named account. So for instance, let’s say Andy Revkin tweets something about the environment. That tweet should also show up via the Environment account, since the tweet may be perceived as having higher credibility from a topically-related user name. For people who search and find that tweet, of course if they know who Andy Revkin is, then they’ll find his tweet quite credible since he’s known for having that topical expertise. But for someone else who doesn’t know who Andy Revkin is, the results of the above study suggest that that person would find the same content more credible coming from the topically related Environment account. Maybe the Times or others are already doing this. But if not, it seems like there’s an opportunity to systematically increase credibility by adopting such an approach.

Designing Tools for Journalism

Whether you’re designing for professionals or amateurs, for people seeking to reinvigorate institutions or to invent new ones, there are still core cultural values ensconced in journalism that can inspire and guide the design of new tools, technologies, and algorithms for committing acts of journalism. How can we preserve the best of such values in new technologies? One approach is known as value sensitive design and attempts to account for human values in a comprehensive manner throughout the design process by identifying stakeholders, benefits, values, and value conflicts to help designers prioritize features and capabilities.

“Value” is defined as “what a person or group of people consider important in life”. Values could include things like privacy, property rights, autonomy, and accountability among other things. What does journalism value? If we can answer that question, then we should be able to design tools for professional journalists that are more easily adopted (“This tool makes it easy to do the things I find important and worthwhile!”), and we should be able to design tools that more easily facilitate acts of journalism by non-professionals (“This tool makes it easy to participate in a meaningful and valuable way with a larger news process!”). Value sensitive design espouses consideration of all stakeholders (both direct and indirect) when designing technology. I’ve covered some of those stakeholders in a previous post on what news consumers want, but another set of stakeholders would be those relating to the business model (e.g. advertisers). In any case, mismatches between the values and needs of different stakeholders will lead to conflicts that need to be resolved by identifying benefits and prioritizing features.

When we turn to normative descriptions of journalism, such as Kovach and Rosenstiel’s The Elements of Journalism and Blur, Schudson’s The Sociology of News, or descriptions of ethics principles from the AP or ASNE, we find both core values, as well as valued activities. It’s easiest to understand these as ideals which are not always met in practice. Some core values include:

  • Truth: including a commitment to accuracy, verification, transparency, and putting things in context
  • Independence: from influence by those they cover, from politics, from corporations, or from others they seek to monitor
  • Citizen-first: on the side of the citizen rather than for corporations or political factions
  • Impartial: except when opinion has been clearly marked
  • Relevance: to provide engaging and enlightening information

Core values also inform valued activities or roles, such as:

  • Informer: giving people the information they need or want about contemporary affairs of public interest
  • Watchdog: making sure powerful institutions or individuals are held to account (also called “accountability journalism”)
  • Authenticator: assessing the truth-value of claims (“factchecking”); also relates to watchdogging
  • Forum Organizer: orchestrating a public conversation, identifying and consolidating community
  • Aggregator: collecting and curating information to make it accessible
  • Sensemaker: connecting the dots and making relationships salient

Many of these values and valued activities can be seen from an information science perspective as contributing to information quality, or the degree of excellence in communicating knowledge. I’ll revisit the parallels to information science in a future post.

Besides core values and valued activities, there are other, perhaps more abstract, processes which are essential to producing journalism, like information gathering, organization and sensemaking, communication and presentation, and dissemination. Because they’re more abstract these processes have a fair amount of variability as they are adapted for different milieu (e.g. information gathering on social media) or media (e.g. text, image, video, games). Often valued activities are already the composition of several of these underlying information processes that have been infused with core values. We should be on the lookout for “new” valued activities waiting for products to emerge around them, for instance, by considering more specific value-added information processes in conjunction with core values.

There’s a lot of potential for technology to re-invent and re-imagine valued activities and abstract information processes in light of core values: to make them more effective, efficient, satisfying, productive, and usable. Knowing the core values also helps designers understand what would not be acceptable to design for professionals (e.g. a platform to facilitate the acquisition of paid sources would probably not be adopted in the U.S.). I would argue that it’s the function that is served by the above valued activities, and not the institutionalized practices that are currently used to accomplish them, that is fundamentally important to consider for designers. While we should by all means consider designs that adhere to core values and to an understanding of the outputs of valued activities, we should also be open to allowing technology to enhance the processes and methods which get us there. Depending on whether you’re innovating in an institutional setting or in an unencumbered non-institutional environment you have different constraints, but, irregardless I maintain that value sensitive design is a good way forward to ensure that future tools for journalism will be more trustworthy, have more impact, and resonate more with the public.

Open Government and Transparency

Last week I had the opportunity to attend the Open Government Workshop at Princeton University. The three main topics on the table were defining, designing, and sustaining transparency in government: all important aspects of fleshing out the Obama rhetoric of an open government especially as the technologists struggle to make sense of all of the data that the government is publishing as part of its transparency initiates.

So what does “Transparency” really mean? This was an object of debate among the first four panelists, Jon Weinberg (Wayne State), Helen Nissenbaum (NYU), Patrice McDermott (OpenTheGovernment.org) and J.H. Snider (iSolon.org). The general consensus definition is that transparency is the idea that the public can observe government decision making; that the government is open for inspection.

And while there was little argument that transparency facilitates democratic control and legitimacy, there was dissent (particularly from Nissenbaum) that not all government data needs to be transparent. “More can obscure, more can obfuscate, we want not all the information out there, but we want the information to be reduced Рto develop principles of reduction which take openness and turn it into transparency,” said Nissenbaum. Her primary argument comes from her study of privacy, though security was also mentioned. For example court records contain the names of jurors, but what would be the value of publishing that information?  This is an interesting nuance in comparison to the prevailing opinion of the technorati (that ALL data must be published).

In the absence of publishing everything though it seems that the government would need to develop guidelines for not only what was or wasn’t published, but also the rationale. At least then the public would know what was being withheld and why.

Next up was the panel on Designing Transparency including Ginny Hunt (Google), Clay Johnson (Sunlight Labs), Eric Kansa (UC Berkeley), and Josh Tauberer (GovtTrack.us). From a technological perspective this was the most interesting panel, especially hearing about all of the work that the Sunlight foundation has done to build a community of software developers interesting in making sense of the government data deluge. Sunlight is known for sponsoring large “app” contests, and Clay spoke extensively about the Sunlight strategy behind these contests. For instance, the Apps for America data.gov challenge was designed to help validate the release of the data by the government, to help find the most interesting data sources (a crowdsourcing approach), and to build community.

Clay acknowledged that most apps created via their contests are not sustainable, but that the goals of the contests were more about building a hacking community around the data. Indeed he referred to Sunlight Labs as, “A match.com for people who want non-romantic relationships and want to create open-source government projects.”

The next Sunlight contest will try to build community around the design / art component – to make government data more accessible and consumable rather than more pragmatic. While it’s great that Sunlight is spurring the creation of community and relationships between like-minded individuals, it’s hard to think that their approach is really sustainable.

I’ve argued before that there’s little impetus for building on and connecting the dots when there’s lots of apps the are birthed and die in such a short period of time. Sensemaking and making government data accessible is something where journalism institutions can and should take on the challenge of sustainability.