Category Archives: journalism

Making Data More Familiar with Concrete Scales

Note: A version of the following also appears on the Tow Center blog.

 

As part of its coverage of the Snowden leaks, the Guardian last month published an interactive to help explain what the NSA's data collection activities mean for the public. Above is a screenshot of part of the piece. It lets you enter the number of friends you have on Facebook and see a typical number of 1st-degree, 2nd-degree (friends-of-friends), and 3rd-degree (friends-of-friends-of-friends) connections, compared to places where you would typically find that many people. So 250 friends is more than the capacity of a subway car, 40,850 friends-of-friends is more than would fit in Fenway Park, and 6.7 million 3rd-degree connections is bigger than the population of Massachusetts.
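The core mechanic is simple enough to sketch in a few lines of code. Below is a minimal Python illustration of the lookup such an interactive might perform; the reference capacities are rough, assumed figures for illustration, not the Guardian's actual data.

```python
# Minimal sketch of a "concrete scale" lookup: map an abstract count of people
# to the largest familiar place it exceeds. Capacities are rough assumptions
# for illustration only, not the Guardian's actual figures.
FAMILIAR_PLACES = [          # (name, approximate capacity), sorted ascending
    ("a subway car", 200),
    ("Fenway Park", 37_000),
    ("the population of Massachusetts", 6_600_000),
]

def concrete_comparison(count):
    """Return a sentence comparing `count` to the largest place it exceeds."""
    best = None
    for name, capacity in FAMILIAR_PLACES:
        if count > capacity:
            best = name
    if best is None:
        return f"{count:,} people would not even fill {FAMILIAR_PLACES[0][0]}"
    return f"{count:,} people is more than {best}"

for n in (250, 40_850, 6_700_000):
    print(concrete_comparison(n))
```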

When we tell stories with data it can be hard for readers to grasp units or measures that fall outside normal human experience, or outside their own personal experience. How much *is* 1 trillion dollars, or 200 calories, really? Unless you're an economist or a nutritionist, respectively, it might be hard to say. Abstract measures and units benefit from being made more concrete. The idea behind the Guardian interactive was to take something abstract, like a big number of people, and compare it to something more spatially familiar and tangible, to help drive it home and make it real.

Researchers Fanny Chevalier, Romain Vuillemot, and Guia Gali have been studying the use of such concrete scales in visualization and recently published a paper detailing some of the challenges and practical steps we can use to more effectively employ these kinds of scales in data journalism and data visualization.

In the paper they describe a few different strategies for making concrete scales, including unitization, anchoring, and analogies. Shown in the figure below, (a) unitization is the idea of re-expressing one object in terms of a collection of objects that may be more familiar (e.g. the mass of Saturn is 97 times that of Earth); (b) anchoring uses a familiar object, like the size of a match head, to make the size of another unfamiliar object (e.g. a tick in this case) more concrete; and (c) analogies make parallel comparisons to familiar objects (e.g. an atom is to a marble as a human head is to the Earth).

All of these techniques are really about visual comparison to the more familiar. But the familiar isn’t necessarily exact. For instance, if I were to compare the height of the Empire State Building to a number of people stacked up, I would need to use the average height of a person, which is really an idealized approximation. So it’s important to think about the precision of the visual comparisons you might be setting up with concrete scales.

Another strategy often used with concrete scales is containment, which can be useful for communicating impalpable volumes or collections of material. For example, you might make visible the amount of sugar in different sizes of soda bottles by filling plastic bags with the corresponding amounts of granular sugar. Again, this is an approximate comparison, but it makes the quantity more familiar and material.

So, how can you design data visualizations to effectively use concrete scales? First, ask whether the unit is unfamiliar, or whether its magnitude is so extreme that it's difficult to comprehend. Then find a good comparison unit that is more familiar to people. Does it make sense to unitize, anchor, or use an analogy? And if you use an anchor or container, which one should you choose? The answers to these questions will depend on your particular design situation as well as the semantics of the data you're working with. A number of examples that the researchers have tagged are available online.

The individual nature of "what is familiar" also raises the question of personalizing concrete scales. Michael Keller's work for Al Jazeera lets you compare the number of refugees from the Syrian conflict to a geographic extent in the U.S., essentially letting the user's own familiarity with geography guide what area they want to compare as an anchor. What if this type of personalization could also be automated? Imagine logging into Facebook or Twitter and having the visualization adapt its concrete scales to the places, objects, or organizations you're most familiar with, based on your profile information. This type of automated adaptation could make such visual depictions of data much more personally relevant and interesting.

Even though concrete scales are often used in data visualizations in the media, it's worth realizing that some open questions remain. How do we define whether an anchor or unit is "familiar," and what makes one concrete unit better than another? Perhaps some scales make people feel like they understand the visualization better, or help them remember it better. These remain open questions for empirical research.

Algorithmic Defamation: The Case of the Shameless Autocomplete

Note: A version of the following also appears on the Tow Center blog.

In Germany, a man recently won a legal battle with Google over the fact that when you searched for his name, the autocomplete suggestions connected him to "scientology" and "fraud," two things that he felt carried defamatory insinuations. As a result of losing the case, Google is now compelled, in Germany at least, to remove defamatory suggestions from autocomplete results when notified.

Court cases arising from autocomplete defamation aren't just happening in Germany, though. In other European countries like Italy, France, and Ireland, and as far afield as Japan and Australia, people (and corporations) have brought suit alleging these algorithms defamed them by linking their names to everything from crime and fraud to bankruptcy or sexual conduct. In some cases such insinuations can have real consequences for finding a job or doing business. New services, such as brand.com's "Google Suggest Plan," have even arisen to help people manipulate, and thus avoid, negative connotations in their search autocompletions.

The Berkman Center’s Digital Media Law Project (DMLP) defines a defamatory statement generally as, “a false statement of fact that exposes a person to hatred, ridicule or contempt, lowers him in the esteem of his peers, causes him to be shunned, or injures him in his business or trade.” By associating a person’s name with some unsavory behavior it would seem indisputable that autocomplete algorithms can indeed defame people.

So if algorithms like autocomplete can defame people or businesses, our next logical question might be to ask how to hold those algorithms accountable for their actions. Considering the scale and difficulty of monitoring such algorithms, one approach would be to use more algorithms to keep tabs on them and try to find instances of defamation hidden within their millions (or billions) of suggestions.

To try out this approach I automatically collected data on both Google and Bing autocompletions for a number of different queries relating to public companies and politicians. I then filtered these results against keyword lists relating to crime and sex in order to narrow in on potential cases of defamation. I used a list of the corporations on the S&P 500 to query the autocomplete APIs with the following templates, where “X” is the company name: “X,” “X company,” “X is,” “X has,” “X company is,” and “X company has.” And I used a list of U.S. congresspeople from the Sunlight Foundation to query for each person’s first and last name, as well as adding either “representative” or “senator” before their name. The data was then filtered using a list of sex-related keywords, and words related to crime collected from the Cambridge US dictionary in order to focus on a smaller subset of the almost 80,000 autosuggestions retrieved.
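I didn't publish the collection code, but the basic shape of it is easy to sketch. In the Python sketch below, fetch_suggestions() is a placeholder for whichever autocomplete endpoint you actually query (Google's or Bing's), and the keyword set is an abbreviated stand-in for the full crime and sex lists; everything else mirrors the template-and-filter approach described above.

```python
# Sketch of the template-and-filter approach described above.
# fetch_suggestions() is a placeholder for whichever autocomplete API you use;
# the keyword set here is an abbreviated stand-in for the full keyword lists.
COMPANY_TEMPLATES = ["{x}", "{x} company", "{x} is", "{x} has",
                     "{x} company is", "{x} company has"]
FLAG_KEYWORDS = {"scam", "fraud", "theft", "corruption", "bankruptcy"}

def fetch_suggestions(query):
    """Placeholder: return a list of autocomplete suggestions for `query`."""
    raise NotImplementedError("wire this up to Google's or Bing's autocomplete")

def flag_suggestions(entities, templates=COMPANY_TEMPLATES, keywords=FLAG_KEYWORDS):
    """Return (entity, query, suggestion) triples that contain a flagged keyword."""
    flagged = []
    for entity in entities:
        for template in templates:
            query = template.format(x=entity)
            for suggestion in fetch_suggestions(query):
                if any(kw in suggestion.lower() for kw in keywords):
                    flagged.append((entity, query, suggestion))
    return flagged

# e.g. flag_suggestions(sp500_company_names) -> candidates for manual review
```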

Among the corporate autocompletions that I filtered and reviewed, there were twenty-four instances that could be read as statements or assertions implicating the company in everything from corruption and scams to fraud and theft. For instance, querying Bing for "Torchmark" returns as the second suggestion "torchmark corporation job scam." Without digging deeply it's hard to tell if Torchmark Corporation is really involved in some form of scam, or if there are just rumors about scam-like emails floating around. If those rumors are false, this could indeed be a case of defamation against the company. But this is a dicey situation for Bing: if they filtered out a rumor that turned out to be true, it might appear they were trying to sweep a company's unsavory activities under the rug. People would ask: is Bing trying to protect this company? At the same time they would be doing a disservice to their users by not steering them clear of a scam.

While looking through the autocompletions returned from querying for congresspeople, it became clear that a significant issue here relates to name collisions. For relatively generic congressperson names like "Gerald Connolly" or "Joe Barton" there are many other people on the internet with the same names. And some of those people did bad things. So when you Google "Gerald Connolly," one suggestion that comes up is "gerald connolly armed robbery," not because Congressman Gerald Connolly robbed anyone but because someone else in Canada with the same name did. If you instead query for "representative Gerald Connolly" the association goes away; adding "representative" successfully disambiguates the two Connollys. The search engine has it tough, though: without a disambiguating term, how should it know whether you're looking for the congressman or the robber? Other cases may be more clear-cut instances of defamation, such as Bing suggesting "joe barton scam" for "Joe Barton," a suggestion that was not corrected when adding the title "representative" to the front of the query. That seems closer to a legitimate instance of defamation, since even with the disambiguation the algorithm still suggests the representative is associated with a scam. And with a bit more searching around it's also clear there is a scam related to a Joe Barton, just not the congressman.

Some of the unsavory things that might hurt someone's reputation in autocomplete suggestions could be true, though. For instance, the autocompletion of "Darrell Issa" to "Darrell Issa car theft" is a correct association, arising from the representative's involvement with three separate car theft cases (for which his brother ultimately took the rap). To be considered defamation a statement must actually be false, which makes it that much harder to write an algorithm that can find instances of real defamation. Unless algorithms can be developed that detect rumor and falsehood, you'll always need a person to assess whether an instance of potential defamation is really valid. Still, such tips on what might be defamatory can help filter and focus attention.

Understanding defamation from a legal standpoint brings in even more complexity. Even something that seems, from a moral point of view, defamatory might not be considered so by a court of law. Each state in the U.S. is a bit different in how it governs defamation. A few key nuances relevant to the court’s understanding of defamation relate to perception and intent.

First of all, a statement must be perceived as fact and not opinion in order to be considered defamation by the court. So how do people read search autocompletions? Do they see them as collective opinions or rumors reflecting the zeitgeist, or do they perceive them as statements of fact because of their framing as a result from an algorithm? As far as I know this is an open question for research. If autocompletions are read as opinion, then it might be difficult to ever win a defamation case in the U.S. against such an algorithm.

For defamation suits against public figures, intent also becomes an important factor to consider. The plaintiff must prove "actual malice" with regard to the defamatory statement, which means that a false statement was published either with actual knowledge of its falsity, or with reckless disregard for its falsity. But can an algorithm ever be truly malicious? If you use the argument that autocompletions are just aggregations of what others have already typed in, then actual malice could certainly arise from a group of people systematically manipulating the algorithm. Otherwise, the algorithm would have to have some notion of truth, and be "aware" that it was autocompleting something inconsistent with its knowledge of that truth. This could be especially challenging for things whose truth changes over time, or for rumors which may have a social consensus but still be objectively false. So while there have been attempts at automating fact-checking, I think this is a long way off.

Of course this may all be moot under Section 230 of the Communications Decency Act, which states that, “no provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.” Given that search autocompletions are based on queries that real people at one time typed into a search box, it would seem Google has a broad protection under the law against any liability from republishing those queries as suggestions. It’s unclear though, at least to me, if recombining and aggregating data from millions of typed queries can really be considered “re-publishing” or if it should rather be considered publishing anew. I suppose it would depend on the degree of transformation of the input query data into suggestions.

Whether it's Google's algorithms creating new snippets of text as autocomplete suggestions, or Narrative Science writing entire articles from data, we're entering a world where algorithms are synthesizing communications that may in some cases run into moral (or legal) considerations like defamation. In print we call defamation libel; when orally communicated we call it slander. We don't yet have a word for the algorithmically reconstituted defamation that arises when millions of non-public queries are synthesized and publicly published by an aggregative intermediary. Still, we might try to hold such algorithms to account by using yet more algorithms to systematically assess and draw human attention to possible breaches of trust. It may be some time yet, if ever, before we can look to the U.S. court system for adjudication.

The Rhetoric of Data

Note: A version of the following also appears on the Tow Center blog.

In the 1830s abolitionists discovered the rhetorical potential of re-conceptualizing southern newspaper advertisements as data. They "took an undifferentiated pile of ads for runaway slaves, wherein dates and places were of primary importance … and transformed them into data about the routine and accepted torture of enslaved people," writes Ellen Gruber Garvey in the book Raw Data is an Oxymoron. By creating topical dossiers of ads, abolitionists catalogued the horrors of slavery and made them accessible for writing abolitionist speeches and novels. The South's own media had been re-contextualized into a persuasive weapon against itself, a rhetorical tool to bolster the abolitionists' arguments.

The Latin root of "data" means "something given," and though we've largely forgotten that original sense, it's helpful to think about data not as facts per se, but as "givens" that can be used to construct a variety of different arguments and conclusions; they act as a rhetorical basis, a premise. Data does not intrinsically imply truth. Yes, we can find truth in data, through a process of honest inference. But we can also find and argue multiple truths, or even outright falsehoods, from data.

Take for instance the New York Times interactive, "One Report, Diverging Perspectives," which wittingly highlights this issue. Shown below, the piece visualizes jobs and unemployment data from two perspectives, emphasizing the differences in how a Democrat or a Republican might see and interpret the statistics. A rising tide of "data PR," often manifesting as slick and pointed infographics, won't be so upfront about the perspectives being argued, though. Advocacy organizations can now collect their own data, or just develop their own arguments from existing data to support their cause. What should you be looking out for as a journalist when assessing a piece of data PR? And how can you improve your own data journalism by ensuring the argument you develop is a sound one?

[Image: "One Report, Diverging Perspectives" interactive, New York Times]

Contextual journalism—adding interpretation or explanation to a story—can and should be applied to data as much as to other forms of reporting. It’s important because the audience may need to know the context of a dataset in order to fully understand and evaluate the larger story in perspective. For instance, context might include explaining how the data was collected, defined, and aggregated, and what human decision processes contributed to its creation. Increasingly news outlets are providing sidebars or blog posts that fully describe the methodology and context of the data they use in a data-driven story. That way the context doesn’t get in the way of the main narrative but can still be accessed by the inquisitive reader.

In your process it can be useful to ask a series of contextualizing questions about a dataset, whether just critiquing the data, or producing your own story.

Who produced the data and what was their intent? Did it come from a reputable source, like a government or inter-governmental agency such as the UN, or was it produced by a third party corporation with an uncertain source of funding? Consider the possible political or advocacy motives of a data provider as you make inferences from that data, and do some reporting if those motives are unclear.

When was the data collected? Sometimes there can be temporal drift in what data means, how it’s measured, or how it should be interpreted. Is the age of your data relevant to your interpretation? For example, in 2010 the Bureau of Labor Statistics changed the definition of long-term unemployment, which can make it important to recognize that shift when comparing data from before and after the change.

Most importantly, it's necessary to ask what is measured in the data, how it was sampled, and what is ultimately depicted. Are data measurements defined accurately and in a way that they can be consistently measured? How was the data sampled from the world? Is the dataset comprehensive or is it missing pieces? If the data wasn't randomly sampled, how might that bias your interpretation? Or have other errors been introduced into the data, for instance through typos or mistaken OCR? Is there uncertainty in the data that should be communicated to the reader? Has the data been cropped or filtered in a way that loses a potentially important piece of context that would change its interpretation? And what about aggregation or transformation? If a dataset is offered to you with only averages or medians (i.e. aggregations), you're necessarily missing information about how the data is distributed, or about outliers that might make interesting stories. For data that's been transformed through some algorithmic process, such as classification, it can be helpful to know the error rates of that transformation, as this can lead to additional uncertainty in the data.

Let’s consider an example that illustrates the importance of measurement definition and aggregation. The Economist graphic below shows the historic and forecast vehicle sales for different geographies. The story the graph tells is pretty clear: Sales in China are rocketing up while they’re declining or stagnant in North America and Europe. But look more closely. The data for Western Europe and North America is defined as an aggregation of light vehicle sales, according to the note in the lower-right corner. How would the story change if the North American data included truck, SUV, and minivan sales? The story you get from these kinds of data graphics can depend entirely on what’s aggregated (or not aggregated) together in the measure. Aggregations can serve as a tool of obfuscation, whether intentional or not.

[Image: The Economist chart of historic and forecast vehicle sales by region]

It’s important to recognize and remember that data does not equal truth. It’s rhetorical by definition and can be used for truth finding or truth hiding. Being vigilant in how you develop arguments from data and showing the context that leads to the interpretation you make can only help raise the credibility of your data-driven story.

 

Review: The Functional Art

I don't often write reviews of books. But I can't resist offering some thoughts on The Functional Art, a new book by Alberto Cairo aimed at teaching the basics of information graphics and visualization, mostly because I think it's fantastic, but also because there are a few areas where I'd like to see a future edition expand.

Basically I see this as the new default book for teaching journalists how to do infographics and visualization. If you're a student of journalism, or just interested in developing better visual communication skills, I think this book has a ton to offer and is very accessible. But what's really amazing is that the book also offers a lot to people already in the field (e.g. designers or computer scientists) who want to learn more about the journalistic perspective on visual storytelling. There are nuggets of wisdom sprinkled throughout the book, informed by Cairo's years of journalism experience. And the diagrams and models of thinking about things like the designer-user relationship, or the dimensions along which graphics vary, add some much-needed structure that forms a framework for thinking about and characterizing information graphics.

Probably the most interesting aspect of the book for someone already doing or studying visualization is the last set of chapters, which detail, through a series of interviews with practitioners, how "the sausage is made." Exposing process in this way is extremely valuable for learning how these things get put together. This exposition continues on the included DVD, in which additional production artifacts, sketches, and mockups form a show-and-tell. And it's not just about artifacts; the interviews also explore things like how teams are composed in order to facilitate collaborative production.

One of the things I appreciated most about the book is that, in light of its predominant focus on practice, Cairo fearlessly reads research results and translates them into practical advice, offering an evidence-based rationale for design decisions. We need more of that kind of thinking, for all sorts of practices.

I have only a few critiques of the book. The first is straightforward: I wish the book were printed in a larger format, because some of the examples shown are screaming for more breathing space. I would also have liked to see the computer science perspective represented a bit more thoroughly; this could, for instance, enhance and add depth to the discussion about interactivity with visualizations. My only other critique of the book is about critique itself. What I mean is that the idea of critique is sprinkled throughout the book, but I'd almost like to see it elevated to the status of having its own chapter. Learning the skills of critique, and the thought process involved, is an essential part of becoming a thoughtful practitioner and intellectual in graphics communication. It can and should be taught in a way that gives students a systematic way of thinking about and analyzing benefits and tradeoffs. Cairo has the raw material to do this in the book, but I wish it were formalized in a way that lent it the attention it deserves. Such a method could even be illustrated using some of the interviewees' many examples.

 

Does Local Journalism Need to Be Locally Sustainable?

The last couple of weeks have seen the rallying cries of journalists echo online as they call for support of the Homicide Watch Kickstarter campaign. The tweets “hit the fan” so to speak, Clay Shirky implored us to not let the project die, and David Carr may have finally tipped the campaign with his editorial questioning foundations’ support for Big News at the expense of funding more nimble start-ups like Homicide Watch.

It seems like a good idea too – providing more coverage of a civically important issue – and one that’s underserved to boot. But is it sustainable? As Jeff Sonderman at Poynter wrote about the successful Kickstarter campaign, “The $40,000 is not a sustainable endowment, just a stopgap to fund intern staffing for one year.”

For Homicide Watch to be successful at franchising to other cities (i.e. by selling a platform), each of those franchises itself needs to be sustained. This implies that, on a local level, enough advertising buy-in, local media support, or crowdfunding (a la Kickstarter) would need to be generated to pay those pesky labor costs, the biggest expense in most content businesses.

Here's the thing. Even though Homicide Watch was funded, it struggled to get there, mostly surviving on the good-natured altruism of the media elite. I doubt that local franchises will be able to repeat that trick. Here's why: most of the donors who gave to Homicide Watch were from elsewhere in the U.S. (68%) or from other countries (10%). Only 22% of donors were from DC, Virginia, or Maryland (see below for details on where the numbers come from). This means that people local to Washington, DC, those who ostensibly would have the most to gain from a project like this, barely made up more than a fifth of the donors. Other local franchises probably couldn't count on the kind of national attention that the media elite brought to the Homicide Watch funding campaign, nor could they count on the national interest afforded to the nation's capital.

You might argue that for something like this to flourish it needs local support, from the people who would get the real utility of the innovation. At least Homicide Watch got a chance to prove itself out, but we’ll have to wait to see if it can make a sustainable business and provide real information utility at a local level. The numbers at this stage would seem to suggest it’s got an uphill battle ahead of it.

Stats
Here's how I got the stats quoted above. I wrote a ScraperWiki script to collect all of the donors listed on the Homicide Watch Kickstarter page (there were 1,102 as of about noon on 9/12). Of those 1,102, 270 donors had geographic information (city, state, country). The stats quoted above are based on those 270 geotagged donors. Of course, that's only about 25% of the total donors, so an assumption I make above is that the other 75%, the non-geotagged donors, follow a similar geographic distribution (and donation magnitude distribution) as the geotagged ones. I can't think of a reason why that assumption wouldn't be true. For kicks I put the data up on Google Fusion Tables (it's so awful, please, someone fix that!), so here's a map of what states donors come from.
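For the curious, the tally itself is trivial once the donors are scraped. Here is a rough sketch, assuming each donor record is a dict with optional "state" and "country" fields; the field names are my assumption for illustration, not Kickstarter's actual schema.

```python
# Rough sketch of the geographic tally, assuming each scraped donor is a dict
# with optional "state" and "country" fields (field names are assumptions).
def donor_geography(donors, local_states=("DC", "VA", "MD")):
    geotagged = [d for d in donors if d.get("state") or d.get("country")]
    if not geotagged:
        return {}
    local = sum(1 for d in geotagged if d.get("state") in local_states)
    intl = sum(1 for d in geotagged
               if d.get("country") and d.get("country") not in ("USA", "US"))
    us_other = len(geotagged) - local - intl
    n = len(geotagged)
    return {"local_pct": 100 * local / n,
            "us_other_pct": 100 * us_other / n,
            "international_pct": 100 * intl / n,
            "geotagged": n}
```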

Fact-Checking at Scale

Note: this is cross-posted on the CUNY Tow-Knight Center for Entrepreneurial Journalism site. 

Over the last decade there's been substantial growth in the use of fact-checking to correct misinformation in the public sphere. Outlets like Factcheck.org and Politifact tirelessly research and assess the accuracy of all kinds of information and statements from politicians or think tanks. But a casual perusal of these sites shows that there are usually only one or two fact-checks per day from any given outlet. Fact-checking is an intensive research process that demands considerable skilled labor and careful consideration of potentially conflicting evidence. In a task that's so labor intensive, how can we scale it so that the truth is spread far and wide?

Of late, Politifact has expanded by franchising its operations to states – essentially increasing the pool of trained professionals participating in fact-checking. It’s a good strategy, but I can think of at least a few others that would also grow the fact-checking pie: (1) sharpen the scope of what’s fact-checked so that attention is where it’s most impactful, (2) make use of volunteer, non-professional labor via crowdsourcing, and (3) automate certain aspects of the task so that professionals can work more quickly. In the rest of this post, I’ll flesh out each of these approaches in a bit more detail.

Reduce Fact-Checking Scope
“I don’t get to decide which facts are stupid … although it would certainly save me a lot of time with this essay if I were allowed to make that distinction,” argues Jim Fingal in his epic fact-checking struggle with artist-writer John D’Agata in The Lifespan of a Fact. Indeed, some of the things Jim checks are really absurd: did the subject take the stairs or the elevator, did he eat “potatoes” or “french fries”; these things don’t matter to the point of that essay, nor, frankly, to me as the reader.

Fact-checkers, particularly the über-thorough kind employed by magazines, are tasked with assessing the accuracy of every claim or factoid written in an article (see the Fact Checker's Bible for more). This includes hard facts like names, stats, geography, and physical properties, as well as what sources claim via a quotation, or what the author writes from notes. Depending on the nature of the claim, some of it may be subjective, opinion-based, or anecdotal. All of this checking is meant to protect the reputation of the publication and its writers, and to maintain trust with the public. But it's a lot to check, and the imbalance between content volume and critical attention will only grow.

To economize their attention fact-checkers might better focus on overall quality; who cares if they’re “potatoes” or “french fries”? In information science studies, the notion of quality can be defined as the “value or ‘fitness’ of the information to a specific purpose or use.” If quality is really what we’re after then fact-checking would be well-served and more efficacious if it focused the precious attention of fact-checkers on claims that have some utility. These are the claims that if they were false could impact the outcome of some event or an important decision. I’m not saying accuracy doesn’t matter, it does, but fact-checkers might focus more energy on information that impacts decisions. For health information this might involve spending more time researching claims that impact health-care options and choices; for finance it would involve checking information informing decisions about portfolios and investments. And for politics this involves checking information that is important for people’s voting decisions – something that the likes of Politifact already focus on.

Increased Use of Volunteer Labor
Another approach to scaling fact-checking is to incorporate more non-professionals, the crowd, in the truth-seeking endeavor. This is something often championed by social media journalists like Andy Carvin, who see truth-seeking as an open process that can involve asking for (and then vetting) information from social media participants. Mathew Ingram has written about how platforms like Twitter and Reddit can act as crowdsourced fact-checking platforms. And there have been several efforts toward systematizing this, notably the TruthSquad, which invited readers to post links to factual evidence that supports or opposes a single statement. A professional journalist would then write an in-depth report based on their own research plus whatever research the crowd contributed. I will say I’m impressed with the kind of engagement they got, though sadly it’s not being actively run anymore.

But it’s important to step back and think about what the limitations of the crowd in this (or any) context really are. Graves and Glaisyer remind us that we still don’t really know how much an audience can contribute via crowdsourced fact-checking. Recent information quality research by Arazy and Kopak gives us some clues about what dimensions of quality may be more amenable to crowd contributions. In their study they looked at how consistent ratings of various wikipedia articles were along dimensions of accuracy, completeness, clarity, and objectivity. They found that, while none of these dimensions had particularly consistent ratings, completeness and clarity were more reliable than objectivity or accuracy. This is probably because it’s easier to use a heuristic or shortcut to assess completeness, whereas rating accuracy requires specialized knowledge or research skill. So, if we’re thinking about scaling fact-checking with a pro-am model we might have the crowd focus on aspects of completeness and clarity, but leave the difficult accuracy work to the professionals.

#Winning with Automation
I’m not going to fool anyone by claiming that automation or aggregation will fully solve the fact-checking scalability problem. But there may be bits of it that can be automated, at least to a degree where it would make the life of a professional fact-checker easier or make their work go faster. An automated system could allow any page online to be quickly checked for misinformation. Violations could be flagged and highlighted, either for lack of corroboration or for controversy, or the algorithm could be run before publication so that a professional fact-checker could take a further crack at it.

Hypothetical statements, opinions and matters of taste, or statements resting on complex assumptions may be too hairy for computers to deal with. But we should be able to automatically both identify and check hard-facts and other things that are easily found in reference materials. The basic mechanic would be one of corroboration, a method often used by journalists and social scientists in truth-seeking. If we can find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is key, since political, monetary, legal, or other connections can taint or at least place contingencies on the value of corroborated information.

There have already been a handful of efforts in the computing research literature that look at how to do algorithmic corroboration. But there is still work to do to define adequate operationalizations so that computers can do this effectively. First of all, we need to define, identify, and extract the units that are to be corroborated. Computers need to be able to differentiate a factually stated claim from a speculative or hypothetical one, since only factual claims can really be meaningfully corroborated. In order to aggregate statements we then need to be able to match two claims together while taking into account different ways of saying similar things. This includes the challenge of context, the tiniest change in which can alter the meaning of a statement and make it difficult for a computer to assess the equivalence of statements. Then, the simplest aggregation strategy might consider the frequency of a statement as a proxy for its truth-value (the more sources that agree with statement X, the more we should believe it), but this doesn't take into account the credibility of the source or their other relationships, which also need to be enumerated and factored in. We might want algorithms to consider other dimensions such as the relevance and expertise of the source to the claim, the source's originality (or lack thereof), the prominence of the claim in the source, and the source's spatial or temporal proximity to the information. There are many challenges here!
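To make the aggregation step concrete, here is a toy sketch of a frequency-plus-credibility corroboration score. It deliberately punts on the hard parts named above: claims are matched by naive string normalization rather than real paraphrase or context handling, and the credibility weights and independence groupings are assumed inputs rather than anything a system could derive on its own.

```python
# Toy corroboration scorer: a claim's score is the sum of the best credibility
# weight from each *independent* group of sources asserting it. Matching is
# naive normalization (no paraphrase or context handling); credibility and
# independence_group are assumed inputs, so this is only a sketch of the idea.
from collections import defaultdict

def normalize(statement):
    return " ".join(statement.lower().split())

def corroboration_scores(corpus, credibility, independence_group):
    """corpus: iterable of (source, statement) pairs.
    credibility: dict mapping source -> weight in [0, 1].
    independence_group: dict mapping source -> owner/affiliation id, so that
    related sources only count once toward any given claim."""
    support = defaultdict(dict)  # claim -> {group id: best credibility seen}
    for source, statement in corpus:
        claim = normalize(statement)
        group = independence_group.get(source, source)
        weight = credibility.get(source, 0.5)
        support[claim][group] = max(support[claim].get(group, 0.0), weight)
    return {claim: sum(groups.values()) for claim, groups in support.items()}
```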

Any automated corroboration method would rely on a corpus of information that acts as the basis for corroboration. Previous work like DisputeFinder has looked at scraping or accessing known repositories such as Politifact or Snopes to jump-start a claims database, and other work like Videolyzer has tried to leverage engaged people to provide structured annotations of claims. Others have proceeded by using the internet as a massive corpus. But there could also be an opportunity here for news organizations, who already produce and hold archives of credible and trustworthy text (e.g. rigorously fact-checked magazines), to provide a corroboration service based on all of the claims embedded in those texts. Could news organizations even make money by syndicating their archives like this?

There are of course other challenges to fact-checking that also need to be surmounted, such as the user-interface for presentation or how to effectively syndicate fact-checks across different media. In this essay I’ve argued that scale is one of the key challenges to fact-checking. How can we balance scope with professional, non-professional, and computerized labor to get closer to the truth that really matters?

 

Tweaking Your Credibility on Twitter

You want to be credible on social media, right? Well, a paper to be published at the Conference on Computer Supported Cooperative Work (CSCW) in early 2012, from researchers at Microsoft and Carnegie Mellon, suggests at least a few actionable methods to help you do so. The basic motivation for the research is that when people see your tweet via a search (rather than by following you) they have fewer cues with which to assess credibility. With a better understanding of what factors influence tweet credibility, new search interfaces can be designed to highlight the most relevant credibility cues (now you see why Microsoft is interested).

First off, the researchers interviewed five people to collect a range of issues that might be relevant to credibility perception. They came up with a list of 26 possible credibility cues and then ran a survey with 256 respondents in which they asked how much each feature impacted credibility perception. You can see the paper for the full results, but, for instance, things like keeping your tweets on a similar topic, using a personal photo, having a username related to the topic, having a location near a topic, having a bio that suggests relevant topical expertise, and frequent tweeting were all perceived by participants to positively impact credibility to some extent. Things like using non-standard grammar and punctuation, or using the default user image, were seen to detract from credibility.

Based on their first survey, the researchers then focused on three specific credibility cues for a follow-on study: (1) topic of tweets (politics, science, or entertainment), (2) user name style (first_last, internet-style like "tenacious27", and topical like "AllPolitics"), and (3) user image (male or female photo, topical icon, generic icon, and default). For the study, each participant (there were 266) saw some combination of the above cues for a tweet, and rated both tweet credibility and author credibility. Unsurprisingly, tweets about the science topic were rated as more credible than those on politics or entertainment. The most surprising result to me was that topically relevant user names were rated more credible than traditional names (or internet-style names, though that's not surprising). In a final follow-up experiment the researchers found that the user image doesn't impact credibility perceptions, except when the image is the default image, in which case it significantly (in the statistical sense) lowers perceptions of tweet credibility.

So here are the main actionable take-aways:

  • Don’t use non-standard grammar and punctuation (no “lol speak”).
  • Don’t use the default image.
  • Tweet about topics like science, which seem to carry an aura of credibility.
  • Find a user name that is topically aligned with those you want to reach.
That last point of finding a topically aligned user name might be an excellent strategy for large news organizations to build a more credible presence across a range of topics. For instance, right now the NY Times has a mix of accounts that have topical user names, as well as reporters using their real names. In addition to each reporter having their own “real name” account, individual tweets of theirs that were topically relevant could be routed to the appropriate topically named account. So for instance, let’s say Andy Revkin tweets something about the environment. That tweet should also show up via the Environment account, since the tweet may be perceived as having higher credibility from a topically-related user name. For people who search and find that tweet, of course if they know who Andy Revkin is, then they’ll find his tweet quite credible since he’s known for having that topical expertise. But for someone else who doesn’t know who Andy Revkin is, the results of the above study suggest that that person would find the same content more credible coming from the topically related Environment account. Maybe the Times or others are already doing this. But if not, it seems like there’s an opportunity to systematically increase credibility by adopting such an approach.
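The routing itself could start as something as simple as keyword matching between a tweet and a set of topical accounts. In the sketch below, the account handles and keyword lists are invented for illustration; a real system would presumably use proper topic classification rather than bag-of-words overlap.

```python
# Sketch of the routing idea above: match a reporter's tweet to the topical
# account(s) it could also be posted from. Handles and keyword lists are
# invented for illustration; a real system would use better topic classification.
TOPIC_ACCOUNTS = {
    "@PaperEnvironment": {"climate", "environment", "emissions", "wildlife"},
    "@PaperPolitics": {"senate", "congress", "election", "campaign"},
}

def route_tweet(text):
    words = set(text.lower().split())
    return [handle for handle, keywords in TOPIC_ACCOUNTS.items()
            if words & keywords]

print(route_tweet("New data on emissions and climate policy"))
# -> ['@PaperEnvironment']
```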

Designing Tools for Journalism

Whether you’re designing for professionals or amateurs, for people seeking to reinvigorate institutions or to invent new ones, there are still core cultural values ensconced in journalism that can inspire and guide the design of new tools, technologies, and algorithms for committing acts of journalism. How can we preserve the best of such values in new technologies? One approach is known as value sensitive design and attempts to account for human values in a comprehensive manner throughout the design process by identifying stakeholders, benefits, values, and value conflicts to help designers prioritize features and capabilities.

“Value” is defined as “what a person or group of people consider important in life”. Values could include things like privacy, property rights, autonomy, and accountability among other things. What does journalism value? If we can answer that question, then we should be able to design tools for professional journalists that are more easily adopted (“This tool makes it easy to do the things I find important and worthwhile!”), and we should be able to design tools that more easily facilitate acts of journalism by non-professionals (“This tool makes it easy to participate in a meaningful and valuable way with a larger news process!”). Value sensitive design espouses consideration of all stakeholders (both direct and indirect) when designing technology. I’ve covered some of those stakeholders in a previous post on what news consumers want, but another set of stakeholders would be those relating to the business model (e.g. advertisers). In any case, mismatches between the values and needs of different stakeholders will lead to conflicts that need to be resolved by identifying benefits and prioritizing features.

When we turn to normative descriptions of journalism, such as Kovach and Rosenstiel's The Elements of Journalism and Blur, Schudson's The Sociology of News, or descriptions of ethics principles from the AP or ASNE, we find both core values and valued activities. It's easiest to understand these as ideals which are not always met in practice. Some core values include:

  • Truth: including a commitment to accuracy, verification, transparency, and putting things in context
  • Independence: from influence by those they cover, from politics, from corporations, or from others they seek to monitor
  • Citizen-first: on the side of the citizen rather than for corporations or political factions
  • Impartiality: except when opinion has been clearly marked
  • Relevance: to provide engaging and enlightening information

Core values also inform valued activities or roles, such as:

  • Informer: giving people the information they need or want about contemporary affairs of public interest
  • Watchdog: making sure powerful institutions or individuals are held to account (also called “accountability journalism”)
  • Authenticator: assessing the truth-value of claims (“factchecking”); also relates to watchdogging
  • Forum Organizer: orchestrating a public conversation, identifying and consolidating community
  • Aggregator: collecting and curating information to make it accessible
  • Sensemaker: connecting the dots and making relationships salient

Many of these values and valued activities can be seen from an information science perspective as contributing to information quality, or the degree of excellence in communicating knowledge. I’ll revisit the parallels to information science in a future post.

Besides core values and valued activities, there are other, perhaps more abstract, processes which are essential to producing journalism, like information gathering, organization and sensemaking, communication and presentation, and dissemination. Because they're more abstract, these processes have a fair amount of variability as they are adapted for different milieus (e.g. information gathering on social media) or media (e.g. text, image, video, games). Often valued activities are already compositions of several of these underlying information processes that have been infused with core values. We should be on the lookout for "new" valued activities waiting for products to emerge around them, for instance by considering more specific value-added information processes in conjunction with core values.

There's a lot of potential for technology to re-invent and re-imagine valued activities and abstract information processes in light of core values: to make them more effective, efficient, satisfying, productive, and usable. Knowing the core values also helps designers understand what would not be acceptable to design for professionals (e.g. a platform to facilitate the acquisition of paid sources would probably not be adopted in the U.S.). I would argue that it's the function served by the above valued activities, and not the institutionalized practices currently used to accomplish them, that is fundamentally important for designers to consider. While we should by all means consider designs that adhere to core values and to an understanding of the outputs of valued activities, we should also be open to allowing technology to enhance the processes and methods which get us there. Depending on whether you're innovating in an institutional setting or in an unencumbered non-institutional environment you have different constraints but, regardless, I maintain that value sensitive design is a good way forward to ensure that future tools for journalism will be more trustworthy, have more impact, and resonate more with the public.

Unpacking Visualization Rhetoric

Note: An edited version of the following also appears on the Chart.io blog. 

Visualization can be useful for both more exploratory purposes (e.g. generating analyses and insights based on data) as well as more communicative ends (e.g. helping other people understand and be persuaded or informed by the insights that you’ve uncovered). Oftentimes more general visualization techniques are used in the exploratory phase, whereas more specific, tailored, and hand-crafted techniques (like infographics) tend to be preferred for maximal persuasive potential in the communicative phase.

In the middle ground is a class of visualizations termed “narrative visualization” – often used in journalism contexts – which tend to include aspects of both exploratory and communicative visualization. This blending of techniques makes for an interesting domain of study and it’s here where Jessica Hullman and I began investigating how different rhetorical (persuasive) techniques are employed in visualization. We were particularly interested in how different rhetorical techniques can be used to affect the interpretation of a visualization – valuable knowledge for visualization designers hoping to influence and mold the interpretation of their audience. (Here we defer the sticky ethical question of whether someone should use these techniques since in general they can be used for both good and ill).

We carefully analyzed 51 narrative visualizations and constructed a taxonomy of rhetorical techniques we found being used. We observed rhetorical techniques being employed at four different editorial layers of a visualization: data, visual representation, annotations, and interactivity. Choices at any of these layers can have important implications for the ultimate interpretation of a visualization (e.g. the design of available interactivity can direct or divert attention). The five main classes of rhetoric we found being used include: information access (e.g. how data is omitted or aggregated), provenance (e.g. how data sources are explained and how uncertainty is shown), mapping (e.g. the use of visual metaphor), linguistic techniques (e.g. irony or apostrophe), and procedural rhetoric (e.g. how default views anchor interpretation).

The maxim “know thy audience” points to another dimension by which a visualization creator can influence the interpretation of a visualization. While most visualizations concentrate on the denotative level of communication, the most effective visualization communicators also make use of the connotative level of communication to unlock a whole other plane of interpretation. For instance, various cultural codes (e.g. what colors mean), or conventions (e.g. line graphs suggest you’re looking at temporal data even if you’re not) can suggest alternate or preferred interpretations.

While the full explanation of the taxonomy, and of the use of codes and connotation for communication in visualization, is beyond the scope of this blog post, you can see a more complete discussion in a pre-print of our forthcoming InfoVis paper. At the very least, though, I'll leave you with an example which illustrates some of these concepts.

Take the following recent example from the New York Times where various aspects of the visualization rhetoric framework apply.

The choice of labeling on the dimensions of the chart, "reduce spending" vs. "don't reduce spending," leaves out another option: "increase spending." The choice of the color green for "willing to compromise" connotes a certain value judgement (i.e. "go," or "move ahead") as read from an American perspective. The way individual squares are aggregated to arrive at an overall color is unclear, leading to questions that could be clarified through better use of provenance rhetoric. Moreover, squares cannot be disaggregated or understood as individual data points, making it difficult for users to interpret either the magnitude of the response or the specific data reported in any one square. The piece is compelling, but applying the visualization rhetoric framework during its design could have suggested other ways to make its interpretation more clear.

Ultimately visualization rhetoric is a framework that can be useful for designers hoping to maximize the communicative potential of a visualization. Exploratory visualization platforms (like Tableau or Chart.io) could also be enhanced with an awareness of visualization rhetoric, by, for instance, allowing users to make salient use of certain rhetorical techniques when the time comes to share a visualization.

Those particularly interested in this space should consider participating in an upcoming workshop I am co-organizing on “Telling Stories with Data” at InfoVis 2011 in Providence, RI in late October.

Visualization, Data, and Social Media Response

I’ve been looking into how people comment on data and visualization recently and one aspect of that has been studying the Guardian’s Datablog. The Datablog publishes stories of and about data, oftentimes including visualizations such as charts, graphs, or maps. It also has a fairly vibrant commenting community.

So I set out to gather some of my own data. I scraped 803 articles from the Datablog, including all of their comments. From this data I wanted to know whether articles that contained embedded data tables or embedded visualizations produced more of a social media response. That is, do people talk more about an article if it contains data and/or visualization? The answer is yes, and the details are below.

While the number of comments could be scraped off the Datablog site itself, I turned to Mechanical Turk to crowdsource some other elements of metadata collection: (1) the number of tweets per article, (2) whether the article has an embedded data table, and (3) whether the article has an embedded visualization. I did a spot check on 3% of the results from Turk in order to assess the Turkers' accuracy on these pieces of metadata: it was about 96% overall, which I thought was clean enough to start doing some further analysis.

So next I wanted to look at how the "has visualization" and "has table" features affect (1) tweet volume and (2) comment volume. There are four possibilities: the article has (1) a visualization and a table, (2) a visualization and no table, (3) no visualization and a table, or (4) no visualization and no table. Since neither tweet volume nor comment volume is normally distributed, I log-transformed them to bring them closer to normality (an assumption of the statistical tests that follow). Moreover, there were a few outliers in the data, so anything beyond 3 standard deviations from the mean of the log-transformed variables was excluded.

For number of tweets per article:

  1. Articles with both a visualization and a table produced the largest response with an average of 46 tweets per article (N=212, SD=103.24);
  2. Articles with a visualization and no table produced an average of 23.6 tweets per article (N=143, SD=85.05);
  3. Articles with no visualization and a table produced an average of 13.82 tweets per article (N=213, SD=42.7);
  4. And finally articles with neither visualization nor table produced an average of 19.56 tweets per article (N=117, SD=86.19).

I ran an ANOVA with post-hoc Bonferroni tests to see whether the differences between these means were significant. Articles with both a visualization and a table (case 1) have a significantly higher number of tweets than cases 3 (p < .01) and 4 (p < .05). Articles with just a visualization and no data table have a higher average number of tweets per article, but the difference was not statistically significant. The take-away is that the combination of a visualization and a data table drives a significantly higher Twitter response.

Results for number of comments per article are similar:

  1. Articles with both a visualization and a table produced the largest response with an average of 17.40 comments per article (SD=24.10);
  2. Articles with a visualization and no table produced an average of 12.58 comments per article (SD=17.08);
  3. Articles with no visualization and a table produced an average of 13.78 comments per article (SD=26.15);
  4. And finally articles with neither visualization nor table produced an average of 11.62 comments per article (SD=17.52)

Again I ran an ANOVA with post-hoc Bonferroni tests to assess statistically significant differences between means. This time there was only one statistically significant difference: articles with both a visualization and a table (case 1) have a higher number of comments than articles with neither a visualization nor a table (case 4); the p-value was 0.04. Again, the combination of visualization and data table drove more of an audience response in terms of commenting behavior.
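For readers who want to run this kind of analysis on their own data, here is a rough sketch of the pipeline (log transform, outlier trimming, one-way ANOVA, Bonferroni-corrected pairwise comparisons). It assumes a pandas DataFrame with "tweets" (or "comments"), "has_vis", and "has_table" columns; that framing is my own for illustration, not the actual script I used.

```python
# Rough sketch of the analysis pipeline: log-transform the skewed count,
# drop >3 SD outliers, run a one-way ANOVA across the four vis/table groups,
# then Bonferroni-correct pairwise t-tests. Assumes a pandas DataFrame with
# columns "tweets" (or "comments"), "has_vis", and "has_table".
import itertools
import numpy as np
from scipy import stats

def compare_groups(df, outcome="tweets"):
    df = df.copy()
    df["log_y"] = np.log1p(df[outcome])           # log1p handles zero counts
    m, s = df["log_y"].mean(), df["log_y"].std()
    df = df[(df["log_y"] - m).abs() <= 3 * s]     # drop outliers beyond 3 SD
    groups = {name: grp["log_y"].values
              for name, grp in df.groupby(["has_vis", "has_table"])}
    f_stat, p_overall = stats.f_oneway(*groups.values())
    pairs = list(itertools.combinations(groups, 2))
    corrected = {}
    for a, b in pairs:                            # Bonferroni: multiply by number of tests
        _, p = stats.ttest_ind(groups[a], groups[b])
        corrected[(a, b)] = min(p * len(pairs), 1.0)
    return f_stat, p_overall, corrected
```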

The overall take-away here is that people like to talk about articles (at least in the context of the audience of the Guardian Datablog) when both data and visualization are used to tell the story. Articles which used both had more than twice the number of tweets and about 1.5 times the number of comments versus articles which had neither. If getting people talking about your reporting is your goal, use more data and visualization, which, in retrospect, I probably also should have done for this blog post.

As a final thought I should note there are potential confounds in these results. For one, articles with data in them may stay "evergreen" for longer, slowly accreting a larger and larger social media response. One area to look at would be the acceleration of commenting in addition to its volume. Another thing I had no control over is whether some stories were promoted more than others: if the editors at the Guardian had a bias toward promoting articles with both visualizations and data, then that would drive up the audience response numbers on those stories too. In other words, it's still interesting and worthwhile to consider alternative explanations for these results.