Category Archives: information quality

Fact-Checking at Scale

Note: this is cross-posted on the CUNY Tow-Knight Center for Entrepreneurial Journalism site. 

Over the last decade there’s been substantial growth in the use of fact-checking to correct misinformation in the public sphere. Outlets like Factcheck.org and Politifact tirelessly research and assess the accuracy of all kinds of information and statements from politicians and think-tanks. But a casual perusal of these sites shows that there are usually only one or two fact-checks per day from any given outlet. Fact-checking is an intensive research process that demands considerable skilled labor and careful consideration of potentially conflicting evidence. For a task that’s so labor-intensive, how can we scale it so that the truth is spread far and wide?

Of late, Politifact has expanded by franchising its operations to states – essentially increasing the pool of trained professionals participating in fact-checking. It’s a good strategy, but I can think of at least a few others that would also grow the fact-checking pie: (1) sharpen the scope of what’s fact-checked so that attention is where it’s most impactful, (2) make use of volunteer, non-professional labor via crowdsourcing, and (3) automate certain aspects of the task so that professionals can work more quickly. In the rest of this post, I’ll flesh out each of these approaches in a bit more detail.

Reduce Fact-Checking Scope
“I don’t get to decide which facts are stupid … although it would certainly save me a lot of time with this essay if I were allowed to make that distinction,” argues Jim Fingal in his epic fact-check struggle with artist-writer John D’Agata in The Lifespan of a Fact. Indeed, some of the things Jim checks are really absurd: did the subject take the stairs or the elevator, did he eat “potatoes” or “french fries”? These things don’t matter to the point of that essay, nor, frankly, to me as the reader.

Fact-checkers, particularly the über-thorough kind employed by magazines, are tasked with assessing the accuracy of every claim or factoid written in an article (see The Fact Checker’s Bible for more). This includes hard facts like names, stats, geography, and physical properties, as well as what sources claim via a quotation or what the author writes from notes. Depending on the nature of the claim, some of it may be subjective, opinion-based, or anecdotal. All of this checking is meant to protect the reputation of the publication and its writers, and to maintain trust with the public. But it’s a lot to check, and the imbalance between content volume and critical attention will only grow.

To economize their attention, fact-checkers might better focus on overall quality; who cares if they’re “potatoes” or “french fries”? In information science, the notion of quality can be defined as the “value or ‘fitness’ of the information to a specific purpose or use.” If quality is really what we’re after, then fact-checking would be better served and more efficacious if it focused the precious attention of fact-checkers on claims that have some utility: claims that, if false, could affect the outcome of an event or an important decision. I’m not saying accuracy doesn’t matter (it does), but fact-checkers might focus more energy on information that impacts decisions. For health information this might mean spending more time researching claims that affect health-care options and choices; for finance it would mean checking information that informs decisions about portfolios and investments. And for politics it means checking information that is important for people’s voting decisions – something that the likes of Politifact already focus on.

Increased Use of Volunteer Labor
Another approach to scaling fact-checking is to incorporate more non-professionals, the crowd, in the truth-seeking endeavor. This is something often championed by social media journalists like Andy Carvin, who see truth-seeking as an open process that can involve asking for (and then vetting) information from social media participants. Mathew Ingram has written about how platforms like Twitter and Reddit can act as crowdsourced fact-checking platforms. And there have been several efforts toward systematizing this, notably the TruthSquad, which invited readers to post links to factual evidence supporting or opposing a single statement. A professional journalist would then write an in-depth report based on their own research plus whatever research the crowd contributed. I’m impressed with the kind of engagement it got, though sadly it’s no longer actively run.

But it’s important to step back and think about what the limitations of the crowd in this (or any) context really are. Graves and Glaisyer remind us that we still don’t really know how much an audience can contribute via crowdsourced fact-checking. Recent information quality research by Arazy and Kopak gives us some clues about which dimensions of quality may be more amenable to crowd contributions. In their study they looked at how consistent ratings of various Wikipedia articles were along dimensions of accuracy, completeness, clarity, and objectivity. They found that, while none of these dimensions had particularly consistent ratings, completeness and clarity were more reliable than objectivity or accuracy. This is probably because it’s easier to use a heuristic or shortcut to assess completeness, whereas rating accuracy requires specialized knowledge or research skill. So, if we’re thinking about scaling fact-checking with a pro-am model we might have the crowd focus on aspects of completeness and clarity, but leave the difficult accuracy work to the professionals.
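To make that idea of per-dimension reliability concrete, here’s a minimal sketch (my own toy illustration, not the method from the Arazy and Kopak paper) of how one might measure how consistently raters score each quality dimension. The rating data and dimensions are made up, and the disagreement measure is a simple pairwise difference; a real study would use something like Krippendorff’s alpha or an intraclass correlation.

```python
# Toy sketch (not the Arazy & Kopak method): estimate how consistently raters
# score each quality dimension. All ratings below are hypothetical 1-5 scores.
from itertools import combinations

# ratings[dimension][article] -> list of scores from different raters
ratings = {
    "completeness": {"article_a": [4, 4, 5], "article_b": [2, 3, 3]},
    "clarity":      {"article_a": [3, 3, 4], "article_b": [4, 4, 4]},
    "accuracy":     {"article_a": [5, 2, 4], "article_b": [1, 4, 3]},
}

def mean_pairwise_disagreement(scores):
    """Average absolute difference between every pair of raters (lower = more consistent)."""
    pairs = list(combinations(scores, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

for dimension, articles in ratings.items():
    avg = sum(mean_pairwise_disagreement(s) for s in articles.values()) / len(articles)
    print(f"{dimension}: mean pairwise disagreement = {avg:.2f}")
```

The intuition is simple: dimensions where crowd scores cluster tightly are good candidates for non-professional labor, while dimensions where they scatter (like accuracy) are better left to professionals.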

#Winning with Automation
I’m not going to fool anyone by claiming that automation or aggregation will fully solve the fact-checking scalability problem. But there may be bits of it that can be automated, at least to a degree where it would make the life of a professional fact-checker easier or make their work go faster. An automated system could allow any page online to be quickly checked for misinformation. Suspect statements could be flagged and highlighted, either for lack of corroboration or for conflicting corroboration (i.e. controversy), or the algorithm could be run before publication so that a professional fact-checker could take a further crack at it.

Hypothetical statements, opinions and matters of taste, or statements resting on complex assumptions may be too hairy for computers to deal with. But we should be able to automatically identify and check hard facts and other things that are easily found in reference materials. The basic mechanic would be one of corroboration, a method often used by journalists and social scientists in truth-seeking. If we can find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is key, since political, monetary, legal, or other connections can taint or at least place contingencies on the value of corroborated information.

There have already been a handful of efforts in the computing research literature that have looked at how to do algorithmic corroboration. But there is still work to do to define adequate operationalizations so that computers can do this effectively. First of all, we need to define, identify, and extract the units that are to be corroborated. Computers need to be able to differentiate a factually stated claim from a speculative or hypothetical one, since only factual claims can really be meaningfully corroborated. In order to aggregate statements, we then need to be able to match two claims together while taking into account different ways of saying similar things. This includes the challenge of context: the tiniest change in context can alter the meaning of a statement and make it difficult for a computer to assess the equivalence of statements. Then, the simplest aggregation strategy might consider the frequency of a statement as a proxy for its truth-value (the more sources agree with statement X, the more we should believe it), but this doesn’t take into account the credibility of the sources or their relationships, which also need to be enumerated and factored in. We might want algorithms to consider other dimensions such as the relevance and expertise of the source to the claim, the source’s originality (or lack thereof), the prominence of the claim in the source, and the source’s spatial or temporal proximity to the information. There are many challenges here!
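As a rough illustration of the first two steps, here is a sketch of my own (not any existing system) of crude claim filtering and matching. The speculation markers, example statements, and similarity threshold are hypothetical stand-ins; a real system would need trained classifiers and semantic matching rather than string similarity.

```python
# Rough sketch of claim filtering and matching (a toy, not an existing system).
# Speculation markers, example statements, and the threshold are hypothetical.
import re
from difflib import SequenceMatcher

SPECULATIVE_MARKERS = {"might", "may", "could", "would", "probably", "perhaps"}

def is_factual_claim(sentence):
    """Crude filter: treat a sentence as factual if it contains no speculative markers."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return not words & SPECULATIVE_MARKERS

def claims_match(a, b, threshold=0.6):
    """Crude paraphrase check using character-level similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

statements = [
    "The unemployment rate fell to 8.6 percent in November.",
    "The unemployment rate might keep falling next year.",
    "Unemployment fell to 8.6% in November.",
]
factual = [s for s in statements if is_factual_claim(s)]
print("factual claims:", factual)
print("same claim?", claims_match(factual[0], factual[1]))
```

Even this toy version makes the core difficulty visible: surface similarity is a weak proxy for claim equivalence once context and phrasing start to vary.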

Any automated corroboration method would rely on a corpus of information that acts as the basis for corroboration. Previous work like DisputeFinder has looked at scraping or accessing known repositories such as Politifact or Snopes to jump-start a claims database, and other work like Videolyzer has tried to leverage engaged people to provide structured annotations of claims. Others have proceeded by using the internet as a massive corpus. But there could also be an opportunity here for news organizations, who already produce and have archives of lots of credible and trustworthy text (e.g. rigorously fact-checked magazines), to provide a corroboration service based on all of the claims embedded in those texts. Could news organizations even make money by syndicating their archives like this?

There are of course other challenges to fact-checking that also need to be surmounted, such as the user-interface for presentation or how to effectively syndicate fact-checks across different media. In this essay I’ve argued that scale is one of the key challenges to fact-checking. How can we balance scope with professional, non-professional, and computerized labor to get closer to the truth that really matters?


Moving Towards Algorithmic Corroboration

Note: this is cross-posted on the Berkman/MIT “Truthiness in Digital Media” blog

One of the methods that truth seekers like journalists or social scientists often employ is corroboration. If we find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is key, since political, monetary, legal, or other connections can taint or at least place contingencies on the value of corroborated information.

How can we scale this idea to the web by teaching computers to effectively corroborate information claims online? An automated system could allow any page online to be quickly checked for misinformation. Suspect statements could be flagged and highlighted, either for lack of corroboration or for conflicting corroboration (i.e. a controversy).

There have already been a handful of efforts in the computing research literature that have looked at how to do algorithmic corroboration. But there is still work to do to define adequate operationalizations so that computers can be effective corroborators. First of all, we need to define and extract the units that are to be corroborated. Computers need to be able to differentiate a factually stated claim from a speculative or hypothetical one, since only factual claims can really be meaningfully corroborated. In order to aggregate statements, we then need to be able to match two claims together while taking into account different ways of saying similar things. This includes the challenge of context: the tiniest change in context can alter the meaning of a statement and make it difficult for a computer to assess the equivalence of statements. Then, the simplest aggregation strategy might consider the frequency of a statement as a proxy for its truth-value (the more sources agree with statement X, the more we should believe it), but this doesn’t take into account the credibility of the sources or their relationships, which also need to be enumerated and factored in. We might want algorithms to consider other dimensions such as the relevance and expertise of the source to the claim, the source’s originality (or lack thereof), the prominence of the claim in the source, and the source’s spatial or temporal proximity to the information. There are many research challenges here!
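To sketch what an aggregation step beyond raw frequency might look like, here is a hypothetical credibility-weighted scoring function. It’s my own illustration: the sources, credibility values, and formula are invented, and it ignores the independence problem entirely.

```python
# Hypothetical credibility-weighted corroboration score (an illustration only;
# sources, credibility values, and the formula are invented, and independence
# between sources is not modeled here).
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str
    credibility: float  # 0.0-1.0, e.g. from a reputation model
    agrees: bool        # does this source support the claim?

def corroboration_score(evidence):
    """Score in [-1, 1]: positive means corroborated, negative means disputed."""
    support = sum(e.credibility for e in evidence if e.agrees)
    dispute = sum(e.credibility for e in evidence if not e.agrees)
    total = support + dispute
    return 0.0 if total == 0 else (support - dispute) / total

claim_evidence = [
    Evidence("wire_service", 0.9, True),
    Evidence("anonymous_blog", 0.2, True),
    Evidence("partisan_think_tank", 0.6, False),
]
print(f"corroboration score: {corroboration_score(claim_evidence):+.2f}")
```

Modeling independence would mean discounting sources that share owners, funders, or original reporting, which is exactly where the enumeration of source relationships mentioned above becomes necessary.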

Any automated corroboration method would rely on a corpus of information that acts as the basis for corroboration. Previous work like DisputeFinder has looked at scraping known repositories such as Politifact or Snopes to jump-start a claims database, and other work like Videolyzer has tried to leverage engaged people to provide structured annotations of claims, though it’s difficult to get enough coverage and scale through manual efforts. Others have proceeded by using the internet as a massive corpus. But there could also be an opportunity here for news organizations, who already produce and have archives of lots of credible and trustworthy text, to provide a corroboration service based on all of the claims embedded in those texts. A browser plugin could detect and highlight claims that are not corroborated by e.g. the NYT or Washington Post corpora. Could news organizations even make money off their archives like this?
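Here’s a hypothetical sketch of that plugin idea, assuming a claims index has already been built offline from a trusted archive. The index contents, page claims, and token-overlap heuristic are placeholders for illustration, not a real matching method.

```python
# Hypothetical sketch of the plugin idea: flag claims on a page that have no
# close match in a claims index built offline from a trusted archive. The
# index, page claims, and overlap heuristic below are invented placeholders.

# Imagine these were extracted from a news organization's fact-checked archive.
archive_claims_index = [
    "the unemployment rate fell to 8.6 percent in november 2011",
    "the bill passed the senate by a vote of 89 to 10",
]

def token_overlap(a, b):
    """Jaccard overlap between the word sets of two claims (0 = disjoint, 1 = identical)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def flag_uncorroborated(page_claims, cutoff=0.5):
    """Return page claims whose best match in the archive index falls below the cutoff."""
    return [c for c in page_claims
            if max(token_overlap(c, ref) for ref in archive_claims_index) < cutoff]

page_claims = [
    "The unemployment rate fell to 8.6 percent in November 2011",
    "The new law cuts the deficit in half within one year",
]
for claim in flag_uncorroborated(page_claims):
    print("no corroboration found for:", claim)
```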

It’s important not to forget that there are limits to corroboration too, both practical and philosophical. Hypothetical statements, opinions and matters of taste, or statements resting on complex assumptions may not benefit at all from a corroborative search for truth. Moreover, systemic bias can still go unnoticed, and a collective social mirage can guide us toward fonts of hollow belief when we drop our critical gaze. We’ll still need smart people around, but, I would argue, finding effective ways to automate corroboration would be a huge advance and a boon in the fight against a misinformed public.

Designing Tools for Journalism

Whether you’re designing for professionals or amateurs, for people seeking to reinvigorate institutions or to invent new ones, there are still core cultural values ensconced in journalism that can inspire and guide the design of new tools, technologies, and algorithms for committing acts of journalism. How can we preserve the best of such values in new technologies? One approach, known as value sensitive design, attempts to account for human values in a comprehensive manner throughout the design process: identifying stakeholders, benefits, values, and value conflicts to help designers prioritize features and capabilities.

“Value” is defined as “what a person or group of people consider important in life.” Values include things like privacy, property rights, autonomy, and accountability. What does journalism value? If we can answer that question, then we should be able to design tools for professional journalists that are more easily adopted (“This tool makes it easy to do the things I find important and worthwhile!”), and we should be able to design tools that more easily facilitate acts of journalism by non-professionals (“This tool makes it easy to participate in a meaningful and valuable way in a larger news process!”). Value sensitive design espouses consideration of all stakeholders (both direct and indirect) when designing technology. I’ve covered some of those stakeholders in a previous post on what news consumers want, but another set of stakeholders relates to the business model (e.g. advertisers). In any case, mismatches between the values and needs of different stakeholders will lead to conflicts that need to be resolved by identifying benefits and prioritizing features.

When we turn to normative descriptions of journalism, such as Kovach and Rosenstiel’s The Elements of Journalism and Blur, Schudson’s The Sociology of News, or descriptions of ethics principles from the AP or ASNE, we find both core values and valued activities. It’s easiest to understand these as ideals that are not always met in practice. Some core values include:

  • Truth: including a commitment to accuracy, verification, transparency, and putting things in context
  • Independence: from influence by those they cover, from politics, from corporations, or from others they seek to monitor
  • Citizen-first: on the side of the citizen rather than for corporations or political factions
  • Impartial: except when opinion has been clearly marked
  • Relevance: to provide engaging and enlightening information

Core values also inform valued activities or roles, such as:

  • Informer: giving people the information they need or want about contemporary affairs of public interest
  • Watchdog: making sure powerful institutions or individuals are held to account (also called “accountability journalism”)
  • Authenticator: assessing the truth-value of claims (“factchecking”); also relates to watchdogging
  • Forum Organizer: orchestrating a public conversation, identifying and consolidating community
  • Aggregator: collecting and curating information to make it accessible
  • Sensemaker: connecting the dots and making relationships salient

Many of these values and valued activities can be seen from an information science perspective as contributing to information quality, or the degree of excellence in communicating knowledge. I’ll revisit the parallels to information science in a future post.

Besides core values and valued activities, there are other, perhaps more abstract, processes which are essential to producing journalism, like information gathering, organization and sensemaking, communication and presentation, and dissemination. Because they’re more abstract, these processes have a fair amount of variability as they are adapted for different milieus (e.g. information gathering on social media) or media (e.g. text, image, video, games). Often valued activities are already compositions of several of these underlying information processes infused with core values. We should be on the lookout for “new” valued activities waiting for products to emerge around them, for instance by considering more specific value-added information processes in conjunction with core values.

There’s a lot of potential for technology to re-invent and re-imagine valued activities and abstract information processes in light of core values: to make them more effective, efficient, satisfying, productive, and usable. Knowing the core values also helps designers understand what would not be acceptable to design for professionals (e.g. a platform to facilitate the acquisition of paid sources would probably not be adopted in the U.S.). I would argue that what’s fundamentally important for designers to consider is the function served by the valued activities above, not the institutionalized practices currently used to accomplish them. While we should by all means consider designs that adhere to core values and to an understanding of the outputs of valued activities, we should also be open to allowing technology to enhance the processes and methods which get us there. Depending on whether you’re innovating in an institutional setting or in an unencumbered non-institutional environment you have different constraints, but regardless, I maintain that value sensitive design is a good way forward to ensure that future tools for journalism will be more trustworthy, have more impact, and resonate more with the public.

Videolyzing Pharmaceutical Ads

There are just two countries in the world where Direct-To-Consumer (DTC) advertising is allowed for pharmaceuticals: the US and New Zealand. The ostensible motivation? To educate consumers, raise awareness of medical conditions, get people talking to their doctors, and reduce the stigma associated with certain conditions (e.g. those treated by Viagra).

Since the laws changed back in 1997 in the US, opening the floodgates for big pharma to peddle their wares directly to patients, there has been a debate about the efficacy and value of DTC advertising. Even today the FDA lists several ongoing studies evaluating the understandability and effects of DTC advertising. But the debate is political too. Congress has recently started floating proposals to limit the marketing powers of pharmaceutical companies for the first two years after a drug has been approved by the FDA. This would give regulators additional time to evaluate a new drug’s broader risks once it is available on the market.

Drugs aren’t the only DTC advertising issue generating controversy, either. DTC medical device advertising is already prompting a debate about the ethics of advertising products to people who can’t possibly understand the medical risks and decisions involved in a medical device implant.

This is not to mention that DTC advertising could be pushing up the overall costs of health care by directing people toward brand-name “designer” drugs that may not be any more effective than alternative treatments. Obama’s $1 billion in stimulus funding for Comparative Effectiveness Research (CER) should help with this somewhat by doing real comparisons of which treatments are “worth it,” both in $$$ and in patient value.

But big pharma is big business. Huge sums of money are invested in pharmaceutical advertising ($5.2 billion in 2007), with spending growing at an annual rate of about 20% from 1997 to 2005. And with huge returns on investment, who can blame big pharma for wanting to drive traffic for new drugs by going straight to the people who would need treatment? The birth-control pill Yaz increased its sales from $262 million in 2007 to $616 million in 2008, helped along by a few high-profile (and misleading) broadcast ads.

Misleading or inaccurate information could lead consumers to make poor health decisions, or take risks that they may not fully understand.
So how does the government keep consumers safe and pharmaceutical advertisers honest? Right now the process is managed by the FDA Division of Drug Marketing, Advertising, and Communications (DDMAC). Advertisers are required to submit promotional materials to the DDMAC when they are first used or published, but not before. This means the FDA’s role is purely to “check up on” what advertisers publish, ex post facto. Ads can be circulating for months before they are critiqued and evaluated. And if an ad is found to be misleading, the FDA sends a warning letter to the offender asking them to retract the ad. That’s it most of the time.

What does the FDA check? According to their website, “advertisements cannot be false or misleading or omit material facts. They also must present a fair balance between effectiveness and risk information. FDA has consistently required that appropriate communication of effectiveness information includes any significant limitations to product use.” They require that all drug advertisements contain a brief summary relating to side effects, contraindications, and effectiveness. For instance, the law states that an advertisement may be false, lacking in fair balance, or otherwise misleading if it “fails to present information relating to side effects and contraindications with a prominence and readability reasonably comparable with the presentation of information relating to effectiveness of the drug, taking into account all implementing factors such as typography, layout, contrast, headlines, paragraphing, white space, and any other techniques apt to achieve emphasis.” The FDA also has a very specific set of guidelines for how ads can be used in the video domain, including different categories of ads such as “product-claim ads,” “reminder ads,” and “information seeking” ads.
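As a thought experiment (and definitely not an FDA method), one could imagine crudely operationalizing the “fair balance” requirement over an ad transcript by comparing how often risk language and benefit language appear. The drug name, keyword lists, and transcript below are all invented for illustration:

```python
# Toy thought experiment (not an FDA method): crudely compare how much of an
# ad transcript mentions risk language versus benefit language. The drug name
# ("Acmezol"), keyword lists, and transcript are all invented.
import re

RISK_TERMS = {"risk", "risks", "side", "effects", "warning", "serious",
              "bleeding", "stroke", "nausea", "contraindicated"}
BENEFIT_TERMS = {"effective", "relief", "improves", "helps", "works", "proven", "fast"}

def risk_benefit_ratio(transcript):
    """Count risk-term and benefit-term mentions; a very low ratio suggests imbalance."""
    words = re.findall(r"[a-z]+", transcript.lower())
    risk = sum(w in RISK_TERMS for w in words)
    benefit = sum(w in BENEFIT_TERMS for w in words)
    return risk / benefit if benefit else float("inf")

transcript = ("Acmezol works fast and is proven effective for seasonal allergies. "
              "Side effects may include nausea. Serious risks are rare.")
print(f"risk-to-benefit mention ratio: {risk_benefit_ratio(transcript):.2f}")
```

Prominence in a video ad is of course more than word counts (typography, pacing, and visuals all matter), which is why a heuristic like this could only ever flag ads for human review.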

The current FDA procedures for the evaluation of DTC video (broadcast) ads are wholly unwieldy. They include the submission of TEN (!!!!) copies of an annotated storyboard, with each frame sequentially numbered and annotated with the references and prescribing information (PI) supporting its claims. Isn’t there a better way to do this?

This got me thinking about how an application like Videolyzer, which I originally built as a tool for bloggers and journalists to critique and debate online video, could be used by someone like the FDA (or the pharma companies) to streamline and digitize the evaluation and sourcing of video advertisements. This is in addition to existing journalism outfits, like Consumer Reports Ad Watch, which could use the tool to add context back to an overly curt video advertisement. Yaz, a birth-control pill marketed by Bayer, gained notoriety in late 2008 for two ads that were deemed misleading by the FDA and for which it had to run corrective ads in 2009. I’ve added the original version of one of the Yaz ads to Videolyzer for anyone interested in seeing how the tool can be used to critique a pharmaceutical ad.

Videolyzer Alpha Online

Version 0.0.0.1 of Videolyzer is now online! Videolyzer is a tool designed to let journalists and bloggers collaboratively assess the information quality of a video, including its transcript. Information quality involves things like credibility, validity, and comprehensiveness, among other dimensions. Videolyzer was designed to support the analysis, collection, and sharing of criticisms of online videos and is initially geared toward politics. To try it out with some of the recent presidential debate content, go to http://www.videolyzer.com

Information Quality and Intentionality

My friend Kelly had some questions for me after my proposal last month, and I’m finally getting around to thinking about some of them. One question she had was about how a lot of low-quality information (e.g. press releases, advertisements, etc.) is not accidentally of low quality, but rather intentionally biased to get a particular side across. Should a measure of information quality address the intentionality of the communication? Is it worse if something is misleading than if it’s mistaken?

Whether something is misleading or mistaken has to do with the intentionality of the communicator; however, what one perceives in the end is still the same: lower-quality information. I think it would be difficult to show that someone had the intention to mislead, because that knowledge is held only by the creator of the information, or at best the institution behind it. Based on just the end product, there’s no way to know the intentionality. If we could tell that a communicator was intentionally misleading, we would be able to factor this into their reputation score. However, there are some cues that can raise suspicions about intentionality, such as the relationship of the communicator to advertisers, the political leanings of the communicator, and the funding source for the production of the information. But these aren’t smoking guns; just because a cell phone maker pays for a study on the dangers of cell phone use doesn’t necessarily mean that the results are biased. It does, though, give us pause to think about the intentions of the producer of the information.

So back to the original question: should information quality address intentional bias? Yes, I think it should, but since the true intentions of a communicator are hidden, we have to rely on the cues listed above. The more intentionally biased a source appears to be, the more this should affect its credibility rating; in fact, this could be thought of as another facet of source annotation for the system that I’m building.
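One way such cues might feed into a source’s credibility rating could look like the following sketch. The cue names, penalty weights, and base score are invented placeholders, not values from a working system (including the one I’m building):

```python
# Hypothetical sketch: discount a source's credibility for each observed bias
# cue. Cue names, penalty weights, and the base score are invented, not values
# from any deployed system.
BIAS_CUE_PENALTIES = {
    "funded_by_interested_party": 0.20,  # e.g. a study paid for by the manufacturer
    "advertiser_relationship": 0.10,
    "strong_political_affiliation": 0.10,
}

def adjusted_credibility(base_score, cues):
    """Subtract a penalty per bias cue observed, flooring the score at zero."""
    penalty = sum(BIAS_CUE_PENALTIES.get(cue, 0.0) for cue in cues)
    return max(0.0, base_score - penalty)

score = adjusted_credibility(0.8, {"funded_by_interested_party", "advertiser_relationship"})
print(f"adjusted credibility: {score:.2f}")  # 0.80 - 0.20 - 0.10 = 0.50
```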

Misinformation on YouTube

Recently I read an article that I first found on Slashdot and later tracked down to the original from JAMA (The Journal of the American Medical Association) about a content analysis study of videos concerning vaccination on YouTube. Researchers at the University of Toronto took a sample of 153 YouTube videos by searching for videos containing the keywords “vaccination” or “immunization.” They then coded the videos based on whether they conveyed a positive or negative message about vaccination. They also looked at the specific scientific claims being made and coded them as either substantiated or unsubstantiated, using the Canadian Immunization Guide as a reference. 48% of the videos were found to have a positive message, 32% were negative, and 20% were ambiguous. Troublingly, the negative videos had a higher mean user rating (i.e. 1-5 stars) than the positive videos. And 14.3% of the sample (22 videos) conveyed messages that were not substantiated with reference to the Canadian immunization guide.

The way this was reported in some news outlets (including the press release) was itself misleading. It was reported that 45% “of those videos” contradicted the Canadian reference guide; however, it wasn’t 45% of the total sample but rather 45% of only the negative videos that contradicted the guide. The percentage of the entire sample (22 / 153) is 14.3%, considerably less alarming than what was reported. So misinformation is still an issue on YouTube, but the magnitude of the reported effect wasn’t stated clearly; yet another reason to go back to the original source.
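The arithmetic, using the figures cited above (153 videos, 32% negative, roughly 45% of the negative videos unsubstantiated), makes the gap between the two framings obvious:

```python
# Arithmetic check using the figures cited in this post: 153 videos total,
# 32% coded negative, and roughly 45% of the negative videos unsubstantiated.
total_videos = 153
negative_videos = round(0.32 * total_videos)     # about 49 videos
unsubstantiated = round(0.45 * negative_videos)  # about 22 videos

print(f"unsubstantiated videos: {unsubstantiated}")
print(f"share of negative videos: {unsubstantiated / negative_videos:.0%}")  # ~45%
print(f"share of the whole sample: {unsubstantiated / total_videos:.0%}")    # ~14%
```

Same 22 videos, but “45% of negative videos” and “about 14% of the sample” tell very different stories.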