
The Rhetoric of Data

Note: A version of the following also appears on the Tow Center blog.

In the 1830s, abolitionists discovered the rhetorical potential of re-conceptualizing Southern newspaper advertisements as data. They “took an undifferentiated pile of ads for runaway slaves, wherein dates and places were of primary importance … and transformed them into data about the routine and accepted torture of enslaved people,” writes Ellen Gruber Garvey in the book “Raw Data” Is an Oxymoron. By compiling the ads into topical dossiers, abolitionists catalogued the horrors of slavery and made them accessible to the writers of abolitionist speeches and novels. The South’s own media had been re-contextualized into a persuasive weapon against itself, a rhetorical tool that bolstered the abolitionists’ arguments.

The word “data” comes from the Latin for “something given,” and though we’ve largely forgotten that original meaning, it’s helpful to think about data not as facts per se, but as “givens” that can be used to construct a variety of different arguments and conclusions; they act as a rhetorical basis, a premise. Data does not intrinsically imply truth. Yes, we can find truth in data through a process of honest inference. But we can also find and argue multiple truths, or even outright falsehoods, from data.
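To make the “givens” idea concrete, here is a minimal Python sketch using an invented monthly series (not real data). The same numbers support a story of growth or a story of decline depending only on which month is chosen as the baseline; `percent_change` is a hypothetical helper written for this illustration.

```python
# Hypothetical monthly values, invented for illustration (not real data).
series = [100, 104, 109, 112, 110, 107, 105, 106]


def percent_change(values, start, end):
    """Percent change from values[start] to values[end]."""
    return 100.0 * (values[end] - values[start]) / values[start]


last = len(series) - 1
peak = series.index(max(series))  # index of the highest value (3 here)

# Framing A: measure from the first month, which reads as growth.
since_start = percent_change(series, 0, last)
# Framing B: measure from the peak month, which reads as decline.
since_peak = percent_change(series, peak, last)

print(f"Since the first month: {since_start:+.1f}%")
print(f"Since the peak month:  {since_peak:+.1f}%")
```

Both figures are arithmetically correct (+6.0% and -5.4%); the choice of baseline is what turns the same givens into opposing arguments.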

Take, for instance, the New York Times interactive “One Report, Diverging Perspectives,” which deliberately highlights this issue. Shown below, the piece visualizes jobs and unemployment data from two perspectives, emphasizing the differences in how a Democrat or a Republican might see and interpret the same statistics. A rising tide of “data PR,” often manifesting as slick and pointed infographics, won’t be so upfront about the perspectives being argued, though. Advocacy organizations can now collect their own data, or simply build arguments from existing data, to support their cause. What should you look out for as a journalist when assessing a piece of data PR? And how can you improve your own data journalism by ensuring that the argument you develop is a sound one?

Image: New York Times, “One Report, Diverging Perspectives”

Contextual journalism, which adds interpretation or explanation to a story, can and should be applied to data as much as to other forms of reporting. It matters because the audience may need to know the context of a dataset in order to fully understand and evaluate the larger story. For instance, context might include explaining how the data was collected, defined, and aggregated, and what human decisions contributed to its creation. Increasingly, news outlets are providing sidebars or blog posts that fully describe the methodology and context of the data behind a data-driven story. That way the context doesn’t get in the way of the main narrative but can still be accessed by the inquisitive reader.

In your reporting process, it can be useful to ask a series of contextualizing questions about a dataset, whether you are critiquing someone else’s data or producing your own story.

Who produced the data and what was their intent? Did it come from a reputable source, like a government or intergovernmental agency such as the UN, or was it produced by a third-party corporation with an uncertain source of funding? Consider the possible political or advocacy motives of a data provider as you make inferences from that data, and do some reporting if those motives are unclear.

When was the data collected? Sometimes there can be temporal drift in what data means, how it’s measured, or how it should be interpreted. Is the age of your data relevant to your interpretation? For example, in 2010 the Bureau of Labor Statistics changed the definition of long-term unemployment, so it’s important to account for that shift when comparing data from before and after the change.
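When comparing values across a known definitional change, it can help to flag the comparison mechanically rather than rely on memory. The Python sketch below is a hypothetical helper, not a BLS tool: the `Observation` type and `compare` function are invented for illustration, the values are made up, and it assumes that observations from the change year onward use the new definition.

```python
from dataclasses import dataclass

# Year a measurement definition changed; 2010 mirrors the BLS example in the text.
# Assumption: observations from this year onward use the new definition.
DEFINITION_CHANGE_YEAR = 2010


@dataclass
class Observation:
    year: int
    value: float


def spans_definition_change(a, b, change_year=DEFINITION_CHANGE_YEAR):
    """True when exactly one of the two observations predates the change."""
    return (a.year < change_year) != (b.year < change_year)


def compare(earlier, later, change_year=DEFINITION_CHANGE_YEAR):
    """Return (difference, caution_flag) for two observations."""
    difference = later.value - earlier.value
    return difference, spans_definition_change(earlier, later, change_year)


# Invented values for illustration only.
delta, caution = compare(Observation(2008, 5.0), Observation(2012, 7.5))
print(delta, caution)  # 2.5 True
```

The flag does not correct the numbers; it only tells the reporter that part of the difference may come from the change in definition rather than from the world.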

Most importantly, ask what is measured in the data, how it was sampled, and what is ultimately depicted. Are the measurements defined accurately, and in a way that allows them to be measured consistently? How was the data sampled from the world? Is the dataset comprehensive, or is it missing pieces? If the data wasn’t randomly sampled, how might that bias your interpretation? Have other errors crept into the data, for instance through typos or OCR mistakes? Is there uncertainty in the data that should be communicated to the reader? Has the data been cropped or filtered in a way that loses a potentially important piece of context that would change its interpretation?

And what about aggregation or transformation? If a dataset is offered to you with only averages or medians (i.e. aggregations), you’re necessarily missing information about how the data is distributed, or about outliers that might make interesting stories. For data that’s been transformed through some algorithmic process, such as classification, it helps to know the error rates of that transformation, since they add further uncertainty to the data.
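A short Python sketch shows how much an average can hide. The two datasets below are invented, and `summarize` is a hypothetical helper; they share the same mean and median, yet their spread differs by a factor of about twenty, which you would never see if you were handed only the averages.

```python
from statistics import mean, median, pstdev

# Two invented datasets (not real data) with identical mean and median.
tight = [48, 49, 50, 51, 52]
spread = [10, 30, 50, 70, 90]


def summarize(values):
    """Report more than the average: center, range, and spread."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "stdev": round(pstdev(values), 1),  # population standard deviation
    }


for name, data in [("tight", tight), ("spread", spread)]:
    print(name, summarize(data))
```

If a source will release only aggregates, asking for the minimum, maximum, and some measure of spread alongside the mean is a cheap way to recover part of the missing picture.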

Let’s consider an example that illustrates the importance of measurement definition and aggregation. The Economist graphic below shows historical and forecast vehicle sales for different regions. The story the graph tells is pretty clear: sales in China are rocketing up while they’re declining or stagnant in North America and Europe. But look more closely. According to the note in the lower-right corner, the data for Western Europe and North America is defined as an aggregation of light vehicle sales. How would the story change if the North American data included truck, SUV, and minivan sales? The story you get from these kinds of data graphics can depend entirely on what’s aggregated (or not aggregated) together in the measure. Aggregations can serve as a tool of obfuscation, whether intentional or not.
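The effect of what gets aggregated can be shown with a toy calculation. In this Python sketch the category names and figures are invented and are not taken from the Economist graphic: the same two years of sales read as a decline or as growth depending only on which categories are summed into the measure.

```python
# Invented annual sales in millions of units (not The Economist's data).
sales = {
    2010: {"cars": 5.6, "trucks_suvs_minivans": 6.0},
    2014: {"cars": 5.2, "trucks_suvs_minivans": 7.4},
}


def total(year, categories):
    """Sum sales for the chosen categories in one year."""
    return sum(sales[year][c] for c in categories)


def growth(categories, start=2010, end=2014):
    """Percent change in the aggregated measure between two years."""
    before, after = total(start, categories), total(end, categories)
    return 100.0 * (after - before) / before


narrow = growth(["cars"])
broad = growth(["cars", "trucks_suvs_minivans"])
print(f"Cars only:        {narrow:+.1f}%")
print(f"Cars plus trucks: {broad:+.1f}%")
```

With these made-up numbers, the narrow measure falls by about 7% while the broad one rises by about 9%, so the definition of the measure alone flips the headline.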

Image: The Economist, historical and forecast vehicle sales by region

It’s important to recognize and remember that data does not equal truth. It’s rhetorical by definition and can be used for truth finding or truth hiding. Being vigilant in how you develop arguments from data and showing the context that leads to the interpretation you make can only help raise the credibility of your data-driven story.

 

Unpacking Visualization Rhetoric

Note: An edited version of the following also appears on the Chart.io blog. 

Visualization can serve both exploratory purposes (e.g. generating analyses and insights based on data) and communicative ends (e.g. helping other people understand, and be persuaded or informed by, the insights you’ve uncovered). Often, more general visualization techniques are used in the exploratory phase, whereas more specific, tailored, and hand-crafted techniques (like infographics) tend to be preferred for maximal persuasive potential in the communicative phase.

In the middle ground is a class of visualizations termed “narrative visualization,” often used in journalism, which tends to include aspects of both exploratory and communicative visualization. This blending of techniques makes for an interesting domain of study, and it’s here that Jessica Hullman and I began investigating how different rhetorical (persuasive) techniques are employed in visualization. We were particularly interested in how different rhetorical techniques can be used to affect the interpretation of a visualization, which is valuable knowledge for visualization designers hoping to influence and mold the interpretation of their audience. (Here we defer the sticky ethical question of whether someone should use these techniques, since in general they can be used for both good and ill.)

We carefully analyzed 51 narrative visualizations and constructed a taxonomy of rhetorical techniques we found being used. We observed rhetorical techniques being employed at four different editorial layers of a visualization: data, visual representation, annotations, and interactivity. Choices at any of these layers can have important implications for the ultimate interpretation of a visualization (e.g. the design of available interactivity can direct or divert attention). The five main classes of rhetoric we found being used include: information access (e.g. how data is omitted or aggregated), provenance (e.g. how data sources are explained and how uncertainty is shown), mapping (e.g. the use of visual metaphor), linguistic techniques (e.g. irony or apostrophe), and procedural rhetoric (e.g. how default views anchor interpretation).
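One way to put the taxonomy to work is as a checklist for auditing a design. The Python sketch below encodes the four editorial layers and five rhetoric classes named above as enums; the `DesignChoice` type, the `by_rhetoric` helper, and the example layer and class pairings are hypothetical illustrations, not the paper’s own coding scheme.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The four editorial layers at which rhetoric was observed."""
    DATA = "data"
    VISUAL_REPRESENTATION = "visual representation"
    ANNOTATIONS = "annotations"
    INTERACTIVITY = "interactivity"


class Rhetoric(Enum):
    """The five main classes of rhetorical technique."""
    INFORMATION_ACCESS = "information access"
    PROVENANCE = "provenance"
    MAPPING = "mapping"
    LINGUISTIC = "linguistic"
    PROCEDURAL = "procedural"


@dataclass(frozen=True)
class DesignChoice:
    description: str
    layer: Layer
    rhetoric: Rhetoric


def by_rhetoric(choices):
    """Group design choices by rhetoric class for review."""
    grouped = defaultdict(list)
    for choice in choices:
        grouped[choice.rhetoric].append(choice.description)
    return dict(grouped)


# Hypothetical audit of a single chart.
audit = [
    DesignChoice("show only yearly averages", Layer.DATA, Rhetoric.INFORMATION_ACCESS),
    DesignChoice("cite the data source in a caption", Layer.ANNOTATIONS, Rhetoric.PROVENANCE),
    DesignChoice("draw rising values as a climbing arrow", Layer.VISUAL_REPRESENTATION, Rhetoric.MAPPING),
    DesignChoice("open on the most recent year by default", Layer.INTERACTIVITY, Rhetoric.PROCEDURAL),
]

for rhetoric, descriptions in by_rhetoric(audit).items():
    print(rhetoric.value, "->", descriptions)
```

Classes that never appear in the grouped output (linguistic, in this toy audit) are a prompt to ask whether the design is ignoring a lever or simply does not need it.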

The maxim “know thy audience” points to another dimension by which a visualization creator can influence the interpretation of a visualization. While most visualizations concentrate on the denotative level of communication, the most effective visualization communicators also make use of the connotative level of communication to unlock a whole other plane of interpretation. For instance, various cultural codes (e.g. what colors mean), or conventions (e.g. line graphs suggest you’re looking at temporal data even if you’re not) can suggest alternate or preferred interpretations.

While a full explanation of the taxonomy, and of the use of codes and connotation for communication in visualization, is beyond the scope of this blog post, you can find a more complete discussion in a pre-print of our forthcoming InfoVis paper. At the very least, though, I’ll leave you with an example that illustrates some of these concepts.

Take the following recent example from the New York Times where various aspects of the visualization rhetoric framework apply.

The choice of labels on the chart’s dimensions, “reduce spending” vs. “don’t reduce spending,” leaves out another option, “increase spending.” The choice of green for “willing to compromise” connotes a certain value judgement (i.e. “go, or move ahead”) as read from an American perspective. The way individual squares are aggregated to arrive at an overall color is unclear, raising questions that better use of provenance rhetoric could answer. Moreover, squares cannot be disaggregated or understood as individual data, making it difficult for users to interpret either the magnitude of the response or the specific data reported in any one square. While the graphic is compelling, applying the visualization rhetoric framework during its design could have suggested other ways to make its interpretation clearer.
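To see why an unstated aggregation rule matters, consider a toy version of the problem. This Python sketch assumes each square holds a score from -2 (strongly unwilling) to +2 (strongly willing to compromise); the scores and both rules are hypothetical and are not how the Times graphic was actually built. The same squares come out “willing” under a majority rule and “unwilling” under a mean rule.

```python
from statistics import mean

# Invented per-square scores: positive = willing to compromise, negative = not.
squares = [1, 1, 1, -2, -2]


def majority_label(scores):
    """Label by counting squares on each side (ignores intensity)."""
    willing = sum(1 for s in scores if s > 0)
    unwilling = sum(1 for s in scores if s < 0)
    if willing == unwilling:
        return "mixed"
    return "willing" if willing > unwilling else "unwilling"


def mean_label(scores):
    """Label by the average score (weights strong opinions more)."""
    average = mean(scores)
    if average == 0:
        return "mixed"
    return "willing" if average > 0 else "unwilling"


print("majority rule:", majority_label(squares))  # willing
print("mean rule:    ", mean_label(squares))      # unwilling
```

Stating which rule was used, and letting readers disaggregate the squares, would resolve exactly this kind of ambiguity.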

Ultimately visualization rhetoric is a framework that can be useful for designers hoping to maximize the communicative potential of a visualization. Exploratory visualization platforms (like Tableau or Chart.io) could also be enhanced with an awareness of visualization rhetoric, by, for instance, allowing users to make salient use of certain rhetorical techniques when the time comes to share a visualization.
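As an illustration of what such support might look like, here is a minimal Python sketch of a share-time review. The `ChartSpec` fields and the three checks are hypothetical and are not features of Tableau or Chart.io; the idea is simply to make choices such as baselines, filtering, and sourcing visible to the author before a chart is published.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChartSpec:
    """Hypothetical minimal description of a chart about to be shared."""
    chart_type: str
    y_axis_min: float
    rows_total: int
    rows_shown: int
    source: Optional[str] = None


def share_time_notes(spec: ChartSpec) -> List[str]:
    """Return notes that make rhetorical choices visible to the author."""
    notes = []
    if spec.chart_type == "bar" and spec.y_axis_min != 0:
        notes.append("Bar baseline is not zero; differences may look larger.")
    if spec.rows_shown < spec.rows_total:
        notes.append(
            f"{spec.rows_shown} of {spec.rows_total} rows shown; consider disclosing the filter."
        )
    if not spec.source:
        notes.append("No data source cited; consider adding provenance.")
    return notes


spec = ChartSpec(chart_type="bar", y_axis_min=50, rows_total=120, rows_shown=80)
for note in share_time_notes(spec):
    print("-", note)
```

The checks deliberately produce notes rather than errors: a truncated axis or a filter may be the right rhetorical choice, and the point is to make it a conscious one.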

Those particularly interested in this space should consider participating in an upcoming workshop I am co-organizing on “Telling Stories with Data” at InfoVis 2011 in Providence, RI in late October.