Category Archives: computational journalism

Comment Readers Want Relevance!

A couple of years ago I wrote a paper about the quality of comments on online news stories. For the paper I surveyed a number of commenters on sacbee.com about their commenting experience on that site. One of the things users complained about was that comments were often off-topic: not germane, or relevant, to the conversation or to the article they were attached to. This isn’t surprising, right? If you’ve ever waded into an online comment thread you know there’s a lot of irrelevant stuff being posted.

It stands to reason, then, that if we can make news comments more relevant, people might come away more satisfied from the online commenting experience; they might be more apt to read, find, and learn new things if the signal-to-noise ratio were a bit higher. The point of this post is to show that there’s a straightforward, easy-to-implement way to provide this relevance, and that it coincides with both users’ and editors’ notions of “quality comments”.

I collected data in July via the New York Times API, including 370 articles and 76,086 comments oriented around the topic of climate change. More specifically I searched for articles containing the phrase “climate change” and then collected all articles which had comments (since not all NYT articles have comments). For each comment I also had a number of pieces of metadata, including: (1) the number of times the comment was “recommended” by someone upvoting it, and (2) whether the comment was an “editor’s selection”. Both of these ratings indicate “quality”; one from the users’ point of view and the other from the editors’. And both of these ratings in fact correlate with a simple measure of relevance as I’ll describe next.

In the dataset I collected I also had the full text of both the comments and the articles. Using some basic IR ninjitsu I normalized the text, removed stop words (using NLTK), and stemmed the remaining words with the Porter stemming algorithm. This leaves cleaner, less noisy text to work with. I then computed relevance between each comment and its parent article as the dot product of their unigram tf-idf feature vectors (i.e. cosine similarity). For the tf-idf scores, each comment was treated as a document, and only unigrams that occurred at least 10 times in the dataset were included in the feature vectors (again to reduce noise). The outcome is that for each comment-article pair I had a score between 0 and 1 representing how similar the words used in the comment were to those used in the article: a score of 1 indicates that the comment and article use identical vocabulary, whereas a score of 0 indicates that they have no words in common.
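For the curious, here’s a minimal sketch of that relevance computation. It’s not my exact code; it assumes NLTK for the stop words and stemming and scikit-learn for the tf-idf vectors, and the min_df filter only approximates the “at least 10 occurrences in the dataset” rule:

```python
# A minimal sketch of the comment-article relevance score (illustrative, not the exact code).
# Assumes NLTK (stop words, Porter stemmer) and scikit-learn (tf-idf, cosine similarity).
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def clean(text):
    """Normalize, remove stop words, and stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stop_words)

def relevance_scores(article_text, comment_texts):
    """Cosine similarity between each comment and its parent article, using unigram
    tf-idf vectors. In the real analysis the vectorizer would be fit over the full
    comment corpus; min_df stands in for the rare-term filter."""
    docs = [clean(article_text)] + [clean(c) for c in comment_texts]
    tfidf = TfidfVectorizer(min_df=10)           # drop very rare unigrams
    vectors = tfidf.fit_transform(docs)          # rows are L2-normalized by default
    return cosine_similarity(vectors[0], vectors[1:]).ravel()  # one score in [0, 1] per comment
```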

So, what’s interesting is that this simple-to-compute relevance metric is highly correlated with the recommendation scores and editors’ selection ratings mentioned above. The following graph shows the average comment-to-article similarity for each recommendation score up to 50 (red dots), along with a moving-average trend line (blue).

As you get into the higher recommendation scores there’s more variance because each point averages fewer values. But you can see a clear trend: as the number of recommendations increases, so does the average comment-to-article similarity. In statistical terms, Pearson’s correlation is r=0.58 (p < .001). There’s actually a fair amount of variance around each of those means, though, and the next graph shows the distribution of similarity values for each recommendation score. If you turn your head sideways, each column is a histogram of the similarity values.

We can also look at the relationship between comment-to-article similarity and editors’ selections, comments that editors have elevated in the user interface. The average similarity for comments that are not editors’ selections is 0.091 (N=73,723), whereas for comments that are editors’ selections the average is 0.118 (N=2,363). A t-test between these distributions indicates that the difference in means is statistically significant (p < .0001). So what we learn is that editors’ criteria for selecting comments also correlate with the similarity in language between the comment and the article.
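Both checks are a few lines with scipy; the sketch below just shows the shape of the analysis, with the three input lists standing in for the real per-comment data:

```python
# Sketch of the two statistical checks above (scipy). The three lists are aligned per comment.
from scipy import stats

def relevance_stats(similarity, recommendations, is_editors_pick):
    """Pearson correlation of similarity with recommendation counts, plus a t-test
    comparing editors' selections against everything else."""
    r, p_r = stats.pearsonr(recommendations, similarity)
    picked = [s for s, e in zip(similarity, is_editors_pick) if e]
    rest   = [s for s, e in zip(similarity, is_editors_pick) if not e]
    t, p_t = stats.ttest_ind(picked, rest, equal_var=False)   # Welch's t-test
    return (r, p_r), (t, p_t)
```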

The implications of these findings are relatively straightforward. A simple metric of similarity (or relevance) correlates well with notions of recommendation and editorial selection. This metric could be surfaced in a commenting system’s user interface to let users rank comments by how similar they are to the article, without having to wait for recommendation scores or editorial selections. In the future I’d like to look into how predictive such metrics are of recommendation scores, as well as try out different measures of similarity, like KL divergence.

Fact-Checking at Scale

Note: this is cross-posted on the CUNY Tow-Knight Center for Entrepreneurial Journalism site. 

Over the last decade there’s been a substantial growth in the use of Fact-Checking to correct misinformation in the public sphere. Outlets like Factcheck.org and Politifact tirelessly research and assess the accuracy of all kinds of information and statements from politicians or think-tanks. But a casual perusal of these sites shows that there are usually only 1 or 2 fact-checks per day from any given outlet. Fact-Checking is an intensive research process that demands considerable skilled labor and careful consideration of potentially conflicting evidence. In a task that’s so labor intensive, how can we scale it so that the truth is spread far and wide?

Of late, Politifact has expanded by franchising its operations to states – essentially increasing the pool of trained professionals participating in fact-checking. It’s a good strategy, but I can think of at least a few others that would also grow the fact-checking pie: (1) sharpen the scope of what’s fact-checked so that attention is where it’s most impactful, (2) make use of volunteer, non-professional labor via crowdsourcing, and (3) automate certain aspects of the task so that professionals can work more quickly. In the rest of this post, I’ll flesh out each of these approaches in a bit more detail.

Reduce Fact-Checking Scope
“I don’t get to decide which facts are stupid … although it would certainly save me a lot of time with this essay if I were allowed to make that distinction,” argues Jim Fingal in his epic fact-checking struggle with artist-writer John D’Agata in The Lifespan of a Fact. Indeed, some of the things Jim checks are absurd: did the subject take the stairs or the elevator, did he eat “potatoes” or “french fries”? These things don’t matter to the point of that essay, nor, frankly, to me as the reader.

Fact-checkers, particularly the über-thorough kind employed by magazines, are tasked with assessing the accuracy of every claim or factoid written in an article (see the Fact Checker’s Bible for more). This includes hard facts like names, stats, geography, and physical properties, as well as what sources claim via quotation or what the author writes from notes. Depending on the nature of the claim, some of it may be subjective, opinion-based, or anecdotal. All of this checking is meant to protect the reputation of the publication and its writers, and to maintain trust with the public. But it’s a lot to check, and the imbalance between content volume and critical attention will only grow.

To economize their attention fact-checkers might better focus on overall quality; who cares if they’re “potatoes” or “french fries”? In information science studies, the notion of quality can be defined as the “value or ‘fitness’ of the information to a specific purpose or use.” If quality is really what we’re after then fact-checking would be well-served and more efficacious if it focused the precious attention of fact-checkers on claims that have some utility. These are the claims that if they were false could impact the outcome of some event or an important decision. I’m not saying accuracy doesn’t matter, it does, but fact-checkers might focus more energy on information that impacts decisions. For health information this might involve spending more time researching claims that impact health-care options and choices; for finance it would involve checking information informing decisions about portfolios and investments. And for politics this involves checking information that is important for people’s voting decisions – something that the likes of Politifact already focus on.

Increased Use of Volunteer Labor
Another approach to scaling fact-checking is to incorporate more non-professionals, the crowd, in the truth-seeking endeavor. This is something often championed by social media journalists like Andy Carvin, who see truth-seeking as an open process that can involve asking for (and then vetting) information from social media participants. Mathew Ingram has written about how platforms like Twitter and Reddit can act as crowdsourced fact-checking platforms. And there have been several efforts toward systematizing this, notably the TruthSquad, which invited readers to post links to factual evidence that supports or opposes a single statement. A professional journalist would then write an in-depth report based on their own research plus whatever research the crowd contributed. I will say I’m impressed with the kind of engagement they got, though sadly it’s not being actively run anymore.

But it’s important to step back and think about what the limitations of the crowd in this (or any) context really are. Graves and Glaisyer remind us that we still don’t really know how much an audience can contribute via crowdsourced fact-checking. Recent information quality research by Arazy and Kopak gives us some clues about what dimensions of quality may be more amenable to crowd contributions. In their study they looked at how consistent ratings of various Wikipedia articles were along dimensions of accuracy, completeness, clarity, and objectivity. They found that, while none of these dimensions had particularly consistent ratings, completeness and clarity were more reliable than objectivity or accuracy. This is probably because it’s easier to use a heuristic or shortcut to assess completeness, whereas rating accuracy requires specialized knowledge or research skill. So, if we’re thinking about scaling fact-checking with a pro-am model we might have the crowd focus on aspects of completeness and clarity, but leave the difficult accuracy work to the professionals.

#Winning with Automation
I’m not going to fool anyone by claiming that automation or aggregation will fully solve the fact-checking scalability problem. But there may be bits of it that can be automated, at least to a degree where it would make the life of a professional fact-checker easier or make their work go faster. An automated system could allow any page online to be quickly checked for misinformation. Violations could be flagged and highlighted, either for lack of corroboration or for controversy, or the algorithm could be run before publication so that a professional fact-checker could take a further crack at it.

Hypothetical statements, opinions and matters of taste, or statements resting on complex assumptions may be too hairy for computers to deal with. But we should be able to automatically both identify and check hard-facts and other things that are easily found in reference materials. The basic mechanic would be one of corroboration, a method often used by journalists and social scientists in truth-seeking. If we can find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is key, since political, monetary, legal, or other connections can taint or at least place contingencies on the value of corroborated information.

There have already been a handful of efforts in the computing research literature that have looked at how to do algorithmic corroboration. But there is still work to do to define adequate operationalizations so that computers can do this effectively. First of all, we need to define, identify, and extract the units that are to be corroborated. Computers need to be able to differentiate a factually stated claim from a speculative or hypothetical one, since only factual claims can really be meaningfully corroborated. In order to aggregate statements we then need to be able to match two claims together while taking into account different ways of saying similar things. This includes the challenge of context, the tiniest change in which can alter the meaning of a statement and make it difficult for a computer to assess the equivalence of statements. Then, the simplest aggregation strategy might consider the frequency of a statement as a proxy for its truth-value (the more sources that agree with statement X, the more we should believe it), but this doesn’t take into account the credibility of the source or their other relationships, which also need to be enumerated and factored in. We might want algorithms to consider other dimensions such as the relevance and expertise of the source to the claim, the source’s originality (or lack thereof), the prominence of the claim in the source, and the source’s spatial or temporal proximity to the information. There are many challenges here!
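To make the aggregation step a bit more concrete, here’s a toy sketch of a credibility-weighted corroboration score. The data structures are invented for illustration, and the genuinely hard parts (claim extraction and claim matching) are assumed away:

```python
# Toy sketch of the aggregation step: corroboration as credibility-weighted agreement
# across independent sources. The Evidence structure is invented for illustration;
# extracting and matching claims is the hard, unsolved part.
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str          # e.g. an outlet or document identifier
    credibility: float   # 0..1, however it gets estimated
    agrees: bool         # does this source support the claim?

def corroboration_score(evidence):
    """Weighted vote: positive values favor the claim, negative values dispute it."""
    if not evidence:
        return 0.0
    total = sum(e.credibility if e.agrees else -e.credibility for e in evidence)
    return total / len(evidence)

support = [Evidence("outlet-a", 0.9, True),
           Evidence("outlet-b", 0.6, True),
           Evidence("outlet-c", 0.8, False)]
print(corroboration_score(support))   # ~0.23: weak net corroboration
```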

Any automated corroboration method would rely on a corpus of information that acts as the basis for corroboration. Previous work like DisputeFinder has looked at scraping or accessing known repositories such as Politifact or Snopes to jump-start a claims database, and other work like Videolyzer has tried to leverage engaged people to provide structured annotations of claims. Others have proceeded by using the internet as a massive corpus. But there could also be an opportunity here for news organizations, who already produce and have archives of lots of credible and trustworthy text (e.g. rigorously fact-checked magazines), to provide a corroboration service based on all of the claims embedded in those texts. Could news organizations even make money by syndicating their archives like this?

There are of course other challenges to fact-checking that also need to be surmounted, such as the user-interface for presentation or how to effectively syndicate fact-checks across different media. In this essay I’ve argued that scale is one of the key challenges to fact-checking. How can we balance scope with professional, non-professional, and computerized labor to get closer to the truth that really matters?

 

The Future of Automated Story Production

Note: this is cross-posted on the CUNY Tow-Knight Center for Entrepreneurial Journalism site. 

Recently there’s been a surge of interest in automatically generating news stories. The poster child is a start-up called Narrative Science which has earned coverage by the likes of the New York Times, Wired, and numerous blogs for its ability to automatically produce actual, readable stories of things like sports games or companies’ financial reports based on nothing more than numeric data. It’s impressive stuff, but it doesn’t stop me from thinking: What’s next? In the rest of this post I’ll talk about some challenges, such as story schema and modality, data context, and text transparency, that could improve future story generation engines.

Without inside information we can’t say for sure exactly how Narrative Science (NS) works, though there are some academic systems out there that provide a suitable analogue for description. There are two main phases that have to be automated in order to produce a story this way: the analysis phase and the generative phase. In the analysis phase, numeric data is statistically analyzed for things like trends, clusters, patterns, and outliers or exceptions. The analysis phase also includes the challenging aspect of condensing or selecting the most interesting things to include in the story (see Ramesh Jain’s “Extreme Stories” for more on this).

After analysis and selection comes the task of figuring out an interesting structure in which to order the information in the story, a schema. Narrative Science differentiates itself primarily, I think, by paying close attention to the structure of the stories it generates. Many of the precursors to NS were stuck in the mode of presenting generated text in a chronological schema, which, as we know, is quite boring for most stories. Storytelling is really all about structure: providing the connections between aspects of the story, its actors and setting, using some rhetorical ordering that makes sense for and engages the reader. There are whole books written on how to effectively structure stories to explore different dramatic arcs or genres. Many of these different story structures have yet to be encoded in algorithms that generate text from data, so there’s lots of room for future story generation engines to explore diverse text styles, genres, and dramatic arcs.
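To make the analysis/generation split concrete, here’s a toy sketch (emphatically not Narrative Science’s pipeline; the data, facts, and template are invented) in which the analysis phase pulls a decisive fact out of box-score numbers and the generation phase renders it through a schema that leads with the outcome rather than the chronology:

```python
# A toy illustration of the analysis-then-generation split discussed above.
# This is not Narrative Science's pipeline; the data and template are invented.

game = {"home": "Hawks", "away": "Bulls",
        "quarters": [(28, 25), (20, 31), (22, 18), (30, 19)]}  # (home, away) points

def analyze(game):
    """Analysis phase: compute totals and find the winner's most decisive quarter."""
    home = sum(h for h, a in game["quarters"])
    away = sum(a for h, a in game["quarters"])
    sign = 1 if home > away else -1  # orient quarter margins toward the eventual winner
    margin, idx = max((sign * (h - a), i) for i, (h, a) in enumerate(game["quarters"]))
    return {"home_pts": home, "away_pts": away, "key_quarter": idx + 1, "margin": margin}

def generate(game, facts):
    """Generation phase: a schema that leads with the outcome, not the chronology."""
    home_won = facts["home_pts"] > facts["away_pts"]
    winner, loser = (game["home"], game["away"]) if home_won else (game["away"], game["home"])
    hi, lo = max(facts["home_pts"], facts["away_pts"]), min(facts["home_pts"], facts["away_pts"])
    return (f"The {winner} beat the {loser} {hi}-{lo}, outscoring them by "
            f"{facts['margin']} in quarter {facts['key_quarter']}.")

print(generate(game, analyze(game)))
# "The Hawks beat the Bulls 100-93, outscoring them by 11 in quarter 4."
```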

It’s also important to remember that text has limitations on the structures and the schema it supports well. A textual narrative schema might draw readers in, but, depending on the data, a network schema or a temporal schema might expose different aspects of a story that aren’t apparent, easy, or engaging to represent in text. This leads us to another opportunity for advancement in media synthesis: better integration of textual schema with visualization schemas (e.g. temporal, hierarchical, network). For instance, there may be complementary stories (e.g. change over time, comparison of entities) that are more effectively conveyed through dynamic visualizations than through text. Combining these two modalities has been explored in some research but there is much work to do in thinking about how best to combine textual schema with different visual schema to effectively convey a story.

There has also been recent work looking into how data can be used to generate stories in the medium of video. This brings with it a whole slew of challenges different than text generation, such as the role of audio, and how to crop and edit existing video into a coherent presentation. So, in addition to better incorporating visualization into data-driven stories I think there are opportunities to think about automatically composing stories from such varied modalities as video, photos, 3D, games, or even data-based simulations. If you have the necessary data for it, why not include an automatically produced simulation to help communicate the story?

It may be surprising to know that text generation from data has actually been around for some time now. The earliest reference that I found goes back 26 years to a paper that describes how to automatically create written weather reports based on data. And then ten years ago, in 2002, we saw the launch of Newsblaster, a complex news summarization engine developed at Columbia University that took articles as a data source and produced new text-based summaries using articles clustered around news events. It worked all right, though starting from text as the data has its own challenges (e.g. text understanding) that you don’t run into if you’re just using numeric data. The downside of using just numeric data is that it is largely bereft of context. One way to enhance future story generation engines could be to better integrate text generated by numeric data together with text (collected from clusters of human-written articles) that provides additional context.

The last opportunity I’d like to touch on here relates to the journalistic ideal of transparency. I think we have a chance to embed this ideal into algorithms that produce news stories, which often articulate a communicative intent combined with rules or templates that help achieve that intent. It is largely feasible to link any bit of generated text back to the data that gave rise to that statement – in fact it’s already done by Narrative Science in order to debug their algorithms. But this linking of data to statement should be exposed publicly. In much the same way that journalists often label their graphics and visualizations with the source of their data, text generated from data should source each statement. Another dimension of transparency practiced by journalists is to be up-front about the journalist’s relationship to the story (e.g. if they’re reporting on a company that they’re involved with). This raises an interesting and challenging question of self-awareness for algorithms that produce stories. Take for instance this Forbes article produced by Narrative Science about New York Times Co. earnings. The article contains a section on “competitors”, but the NS algorithm isn’t smart enough or self-aware enough to know that it itself is an obvious competitor. How can algorithms be taught to be transparent about their own relationships to stories?

There are tons of exciting opportunities in the space of media synthesis. Challenges like exploring different story structures and schemas, providing and integrating context, and embedding journalistic ideals such as transparency will keep us more than busy in the years and, likely, decades to come.

Cultivating the Landscape of Innovation in Computational Journalism

For the last several months I’ve been working on a whitepaper for the CUNY Tow-Knight Center for Entrepreneurial Journalism. It’s about cultivating more technical innovation in journalism and involves systematically mapping out what’s been done (in terms of research) as well as outlining a method for people to generate new ideas in computational journalism. I’m happy to say that the paper was published by the Tow-Knight Center today. You can get Jeff Jarvis’ take on it on the Tow-Knight blog, or for more coverage you can see the Nieman Lab write-up. Or go straight for the paper itself.

Moving Towards Algorithmic Corroboration

Note: this is cross-posted on the Berkman/MIT “Truthiness in Digital Media” blog

One of the methods that truth seekers like journalists or social scientists often employ is corroboration. If we find two (or more) independent sources that reinforce each other, and that are credible, we gain confidence in the truth-value of a claim. Independence is key, since political, monetary, legal, or other connections can taint or at least place contingencies on the value of corroborated information.

How can we scale this idea to the web by teaching computers to effectively corroborate information claims online? An automated system could allow any page online to be quickly checked for misinformation. Violations could be flagged and highlighted, either for lack of corroboration or for a multi-faceted corroboration (i.e. a controversy).

There have already been a handful of efforts in the computing research literature that have looked at how to do algorithmic corroboration. But there is still work to do to define adequate operationalizations so that computers can be effective corroborators. First of all, we need to define and extract the units that are to be corroborated. Computers need to be able to differentiate a factually stated claim from a speculative or hypothetical one, since only factual claims can really be meaningfully corroborated. In order to aggregate statements we then need to be able to match two claims together while taking into account different ways of saying similar things. This includes the challenge of context, the tiniest change in which can alter the meaning of a statement and make it difficult for a computer to assess the equivalence of statements. Then, the simplest aggregation strategy might consider the frequency of a statement as a proxy for its truth-value (the more sources that agree with statement X, the more we should believe it), but this doesn’t take into account the credibility of the source or their other relationships, which also need to be enumerated and factored in. We might want algorithms to consider other dimensions such as the relevance and expertise of the source to the claim, the source’s originality (or lack thereof), the prominence of the claim in the source, and the source’s spatial or temporal proximity to the information. There are many research challenges here!

Any automated corroboration method would rely on a corpus of information that acts as the basis for corroboration. Previous work like DisputeFinder has looked at scraping known repositories such as Politifact or Snopes to jump-start a claims database, and other work like Videolyzer has tried to leverage engaged people to provide structured annotations of claims, though it’s difficult to get enough coverage and scale through manual efforts. Others have proceeded by using the internet as a massive corpus. But there could also be an opportunity here for news organizations, who already produce and have archives of lots of credible and trustworthy text, to provide a corroboration service based on all of the claims embedded in those texts. A browser plugin could detect and highlight claims that are not corroborated by e.g. the NYT or Washington Post corpora. Could news organizations even make money off their archives like this?

It’s important not to forget that there are limits to corroboration too, both practical and philosophical. Hypothetical statements, opinions and matters of taste, or statements resting on complex assumptions may not benefit at all from a corroborative search for truth. Moreover, systemic bias can still go unnoticed, and a collective social mirage can guide us toward fonts of hollow belief when we drop our critical gaze. We’ll still need smart people around, but, I would argue, finding effective ways to automate corroboration would be a huge advance and a boon in the fight against a misinformed public.

Systematic Technical Innovation in Journalism

The idea that innovation can be an organized, systematic search for change is not new — Peter Drucker wrote about it over 25 years ago in his book Innovation and Entrepreneurship — and I’m fairly certain he wasn’t the first. Systematic innovation is about methodically surveying a landscape of potential innovation while also analyzing the potential economic or social value of innovations. For the last several months I’ve been working with the CUNY Graduate School for Journalism on developing a process to systematically explore the potential for technical innovation in journalism. My hope is that this can spur new ideas and growth in Computational Journalism. In the rest of this post I’ll describe how the process is developing and provide some initial feedback we’ve gotten on how it’s working.

One way to look at innovation is in terms of problem solving: (1) what’s the problem or what’s needed, and (2) how do you reify the solution. Sure, technical innovation is not the only kind of innovation, but here my focus of “how to make it happen” will be computing. The problems and needs that I’m focused on are further constrained by the domain, journalism, and include aspects of what news consumers need and want, what news producers (e.g. professional journalists, but also others) need and want, and how value is added to information during the production process.

My basic premise is that if we can identify and enumerate concrete concepts related to needs/wants and technical solutions, then we can systematically combine different concepts to arrive at new ideas for innovation. This is the core idea of combinatorial creativity: mashing up concepts in novel juxtapositions often sparks new ideas. Drawing on lots of research and, when possible, theory, I developed a concept space which includes 27 computing and technology concepts (e.g. natural user interfaces, computer vision, data mining, etc.), 15 needs and goals that journalists or news consumers typically have with information/media (e.g. storytelling, sensemaking, staying informed, etc.), and 14 information processes that are used to increase the value of information (e.g. filtering, ordering, summarization, etc.). That amounts to 56 concepts across four main categories (computing and technology, news consumer needs, journalism goals, and information processes).

To make the creative combination of ideas more engaging I produced and printed concept cards using Moo, which were color-coded based on their main category. Each card has a concept and brief description; here’s what they look like:

Brainstorming could happen in a lot of different ways, but for a start I decided to have groups of three people with each person randomly picking a card, one card from computing and technology and two cards from the other main categories. Then the goal is to generate as many different ideas as possible for products or services that combine those three concepts in some time-frame (say 5 minutes). A recorder in the group keeps track of the concept cards drawn and all of the ideas generated so that they can be discussed later.
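For what it’s worth, the drawing rule is simple enough to sketch in code. The concept lists below are just a small sample of the categories described above (not the full 56-card deck), and spreading the two non-technology cards across distinct categories is one reasonable reading of the game:

```python
# Sketch of the card-drawing rule: one technology card plus two cards from other categories.
# The concept lists here are a small illustrative sample, not the full 56-card deck.
import random

CARDS = {
    "computing & technology": ["natural user interfaces", "computer vision", "data mining"],
    "news consumer needs":    ["staying informed", "entertainment"],
    "journalism goals":       ["storytelling", "sensemaking"],
    "information processes":  ["filtering", "ordering", "summarization"],
}

def draw_hand():
    """One technology card, plus one card each from two distinct other categories."""
    tech = random.choice(CARDS["computing & technology"])
    other_cats = random.sample([c for c in CARDS if c != "computing & technology"], 2)
    return [tech] + [random.choice(CARDS[cat]) for cat in other_cats]

print(draw_hand())   # e.g. ['data mining', 'storytelling', 'filtering']
```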

The process seems to be working. Earlier this week in Jeff Jarvis’ entrepreneurial journalism class I spent some time lecturing on the different concepts and then had students break into 5 groups of 3 to play the brainstorming “game”, which looked something like this:

The reaction was largely positive, with at least one student exclaiming that she really liked the exercise, and another acknowledging that some good ideas came out of having to think about (and apply) combinations of concepts they hadn’t necessarily considered before.

In a series of three 5-minute rounds of brainstorming, the five groups generated 54 ideas in total, for an average of 3.6 ideas per group per round. Of course there was some variability between groups, and most groups needed a round to warm up, but there were definitely some 5-star ideas generated. Some of the ideas were for general products or services, but some were also about how technologies could enable new kinds of stories to be told: editorial creativity. For instance, an idea for a general platform was to produce 3D virtual recreations of accident spots to help viewers get a better sense of why a spot could be dangerous. Another idea was to develop an app where citizen journalists could sign up and be automatically alerted when an incident occurs near their location. On the editorial creativity side of things, some ideas included using motion capture technology to recreate crime scenes or analyses, or to illustrate workplace injuries from repetitive stress. Not all of these things would make tons of money or generate millions of clicks, but that’s not the point; for now the point is to get people thinking in new directions.

We’re still thinking about ways to improve the process, like adding pressure, constraints, or context. And generating lots of ideas is good, but step two is to think about winnowing and how to assess feasibility and quality of ideas. Stay tuned as this continues to evolve…

Finding News Sources in Social Media

Whether it’s terrorist attacks in Mumbai, a plane crash landing on the Hudson River, or videos and reactions from a recently capsized cruise ship in Italy, social media has proven itself again and again to be a huge boon to journalists covering breaking news events. But at the same time, the prodigious amount of social media content posted around news events creates a challenge for journalists trying to find interesting and trustworthy sources in the din. A few recent efforts have looked at automatically identifying misinformation on Twitter, or automatically assessing credibility, though pure automation carries the risk of cutting human decision makers completely out of the loop. There aren’t many general purpose (or accessible) solutions out there for this problem either; services like Klout help identify topical authorities, and Storify and Storyful help in assembling social media content, but don’t offer additional cues for assessing credibility or trustworthiness.

Some research I’ve been doing (with collaborators at Microsoft and Rutgers) has been looking into this problem of developing cues and filters to enable journalists to better tap into social media. In the rest of this post I’ll preview this forthcoming research, but for all the details you’ll want to see the CHI paper appearing in May and the CSCW paper appearing next month.

With my collaborators I built an application called SRSR (standing for “Seriously Rapid Source Review”) which incorporates a number of advanced aggregations, computations, and cues that we thought would be helpful for journalists to find and assess sources in Twitter around breaking news events. And we didn’t just build the system, we also evaluated it on two breaking news scenarios with seven super-star social media editors at leading local, national, and international news outlets.

The features we built into SRSR were informed by talking with many journalists and include facilities to filter and find eyewitnesses and archetypical user-types, as well as to characterize sources according to their implicit location, network, and past content. The SRSR interface allows the user to quickly scan through potential sources and get a feeling for whether they’re more or less credible and if they might make good sources for a story. Here’s a snapshot showing some content we collected and processed around the Tottenham riots.

Automatically Identifying Eyewitnesses
A core feature we built into SRSR was the ability to filter sources based on whether or not they were likely to be eyewitnesses. To determine if someone was an eyewitness we built an automatic classifier that looks at the text content shared by a user and compares it to a dictionary of over 700 key terms relating to perception, seeing, hearing, and feeling – the kind of language you would expect from eyewitnesses. If a source uses one of the key terms then we label them as a likely eyewitness. Even using this relatively simple classifier we got fairly accurate results: precision was 0.89 and recall was 0.32. This means that if a source uses one of these words it’s highly likely they really are an eyewitness to the event, but that there were also a number of eyewitnesses who didn’t use any of these key words (hence the lower recall score). Being able to rapidly find eyewitnesses with first-hand information was one of the most liked features in our evaluation. In the future there’s lots we want to do to make the eyewitness classifier even more accurate.
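The labeling rule itself fits in a few lines; in the sketch below the term set is a tiny placeholder for the actual lexicon of 700+ perception-related terms:

```python
# Sketch of the dictionary-based eyewitness labeler. EYEWITNESS_TERMS is a tiny
# placeholder for the real lexicon of 700+ perception-related key terms.
EYEWITNESS_TERMS = {"saw", "see", "seeing", "heard", "hear", "felt", "feel", "watched", "witnessed"}

def is_likely_eyewitness(event_tweets):
    """Label a source as a likely eyewitness if any of their event-related tweets
    use a perception-related key term."""
    for tweet in event_tweets:
        if set(tweet.lower().split()) & EYEWITNESS_TERMS:
            return True
    return False
```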

Automatically Identifying User Archetypes
Since different types of users on Twitter may produce different kinds of information we also sought to segment users according to some sensible archetypes: journalists/bloggers, organizations, and “ordinary” people. For instance, around a natural hazard news event, organizations might share information about marshaling public resources or have links to humanitarian efforts, whereas “ordinary” people are more likely to have more eyewitness information. We thought it could be helpful to journalists to be able to rapidly classify sources according to these information archetypes and so we built an automatic classifier for these categories. All of the details are in the CSCW paper, but we basically got quite good accuracy with the classifier across these three categories: 90-95%. Feedback in our evaluation indicated that rapidly identifying organizations and journalists was quite helpful.

Visually Cueing Location, Network, Entities
We also developed visual cues designed to help journalists assess the potential veracity and credibility of a source based on their profile. In addition to showing the location of the source, we normalized and aggregated locations within the source’s network. In particular we looked at the “friends” of a source (i.e. people the source follows who also follow them back) and show the top three most frequent locations in that network. This gives a sense of where the source knows people and has their social network. So even if I don’t live in London, if I know 50 people there it suggests I have a stake in that location, or may have friends or other connections to that area that make me knowledgeable about it. Participants in our evaluation really liked this cue as it gives a sense of implicit or social location.
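The aggregation behind this cue is straightforward. The sketch below assumes friend profiles with a free-text location field and glosses over normalization, which matters a lot in practice:

```python
# Sketch of the implicit-location cue: the three most common (normalized) locations
# among a source's reciprocal friends. Location normalization is glossed over here.
from collections import Counter

def top_friend_locations(friend_profiles, n=3):
    """friend_profiles: iterable of profile dicts with a free-text 'location' field."""
    locations = (p.get("location", "").strip().lower() for p in friend_profiles)
    counts = Counter(loc for loc in locations if loc)
    return [loc for loc, _ in counts.most_common(n)]
```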

We also show a small sketch of the network of a source indicating who has shared relevant event content and is also following the source. This gives a sense of whether many people talking about the news event are related to the source. Journalists in our evaluation indicated that this was a nice credibility cue. For instance, if the Red Cross is following a source that’s a nice positive indicator.

Finally, we aggregated the top five most frequent entities (i.e. references to corporations, people, or places) that a source mentioned in their Twitter history (we were able to capture about 1000 historical messages for each person). The idea was that this could be useful to show what a source talks about, but in reality our participants didn’t find this feature that useful for the breaking news scenarios they were presented with. Perhaps in other scenarios it could still be useful?

What’s Next
While SRSR is a nice step forward there’s still plenty to do. For one, our prototype was not built for real-time events and was tested with pre-collected and processed data due to limitations of the Twitter API (hey Twitter, give me a call!!). And there’s plenty more to think about in terms of enhancing the eyewitness classifier, thinking about different ways to use network information to spider out in search of sources, and to experiment with how such a tool can be used to cover different kinds of events.

Again, for all the gory details on how these features were built and tested you can read our research papers. Here are the full references:

  • N. Diakopoulos, M. De Choudhury, M. Naaman. Finding and Assessing Social Media Information Sources in the Context of Journalism. Proc. Conference on Human Factors in Computing Systems (CHI). May, 2012. [PDF]
  • M. De Choudhury, N. Diakopoulos, M. Naaman. Unfolding the Event Landscape on Twitter: Classification and Exploration of User Categories. Proc. Conference on Computer Supported Cooperative Work (CSCW). February, 2012. [PDF]

 

News Headlines and Retweets

How do you maximize the reach and engagement of your tweets? This is a hugely important question for companies who want to maximize the value of their content. There are even start-ups, like Social Flow, that specialize in optimizing the “engagement” of tweets by helping to time them appropriately. A growing body of research is also looking at what factors, both of the social network and of the content of tweets, impact how often tweets get retweeted. For instance, some of this research has indicated that tweets are more retweeted when they contain URLs and hashtags, when they contain negative or exciting and intense sentiments, and when the user has more followers. Clearly time is important too and different times of day or days of week can also impact the amount of attention people are paying to social media (and hence the likelihood that something will get retweeted).

But aside from the obvious thing of growing their follower base, what can content creators like news organizations do to increase the retweetability of their tweets? Most news organizations basically tweet out headlines and links to their stories. And that delicate choice of words in writing a headline has always been a bit of a skill and an art. But with lots of data now we can start being a bit more scientific by looking at what textual and linguistic features of headlines tend to be associated with higher levels of retweets. In the rest of this post I’ll present some data that starts to scratch at the surface of this.

I collected all tweets from the @nytimes twitter account between July 1st, 2011 and Sept. 30th, 2011 using the Topsy API. I wanted to analyze somewhat older tweets to make sure that retweeting had run its natural course and that I wasn’t truncating the retweeting behavior. Using data from only one news account has the advantage that it controls for the network and audience and allows me to focus purely on textual features. In all I collected 5101 tweets, including how many times each tweet was retweeted (1) using the built-in retweet button and (2) using the old syntax of “RT @username”. Of these tweets, 93.7% contained links to NYT content, 1.0% contained links to other content (e.g. yfrog, instagram, or government information), and 0.7% were retweets themselves. The remaining 4.6% of tweets in my sample had no link.

The first thing I looked at was what the average number of retweets was for the tweets in each group (links to NYT content, links to other content, and no links).

  • Average # of RTs for tweets with links to NYT content: 48.0
  • Average # of RTs for tweets with links to other content: 48.1
  • Average # of RTs for tweets with no links: 83.8

This is interesting because some of the best research out there suggests that tweets WITH links get more RTs. But I found just the opposite: tweets with NO LINKS got more RTs (1.74 times as many on average). I read through the tweets with no links (there are only 234) and they were mostly breaking news alerts like “Qaddafi Son Arrested…”, “Dow drops more than 400 points…”, or “Obama and Boehner Close to Major Budget Deal…”. So from the prior research we know that for any old tweet source, URLs are a signal that is correlated with RTs, but for news organizations, the most “newsy” or retweetable information comes in a brief snippet, without a link. The implication is not that news organizations should stop linking their content to get more RTs, but rather that the kind of information shared without links from news organizations (the NYT in particular) is highly retweetable.

To really get into the textual analysis I wanted to look just at tweets with links back to NYT content though. So the rest of the analysis was done on the 4780 tweets with links back to NYT content. If you look at these tweets they basically take the form: <story headline> + <link>. I broke the dataset up into the top and bottom 10% of tweets (deciles) as ranked by their total number of RTs, which includes RTs using the built-in RT button as well as the old style RTs. The overall average # of RTs was 48.3, but in the top 10% of tweets it was 173 and in the bottom 10% it was 7.4. Here’s part of the distribution:


Is length of a tweet related to how often it gets retweeted? I looked at the average length of the tweets (in characters) in the top and bottom 10%.

  • Top 10%: 75.8 characters
  • Bottom 10%: 82.8 characters

This difference is statistically significant using a t-test (t=5.23, p < .0001). So tweets that are in the top decile of RTs are shorter, on average, by about 7 characters. This isn’t prescriptive, but it does suggest an interesting correlation that headline / tweet writers for news organizations might consider exploring.
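For reference, the decile split and the length comparison look roughly like this in code, with (text, retweet count) pairs standing in for the real tweet objects:

```python
# Sketch of the decile split and tweet-length comparison. Tweets are assumed to
# arrive as (text, total_retweets) pairs; scipy provides the t-test.
from scipy import stats

def rt_deciles(tweets):
    """Bottom and top 10% of tweets ranked by total retweet count."""
    ranked = sorted(tweets, key=lambda t: t[1])
    k = max(len(ranked) // 10, 1)
    return ranked[:k], ranked[-k:]

def length_gap(tweets):
    """Welch's t-test on character lengths between the top and bottom deciles."""
    bottom, top = rt_deciles(tweets)
    return stats.ttest_ind([len(text) for text, _ in top],
                           [len(text) for text, _ in bottom],
                           equal_var=False)
```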

I also wanted to get a feel for what words were used more frequently in either the top or bottom deciles. To do this I computed the frequency distribution of words for each dataset (i.e. how many times each unique word was used across all the tweets in that decile). Then for each word I computed a ratio indicating how frequent it was in one decile versus the other. If this ratio is above 1 then it indicates that that word is more likely to occur in one decile than the other. I’ve embedded the data at the end of this post in case you want to see the top 50 words ranked by their ratio for both the top and bottom deciles.
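The ratio computation itself is just a pair of frequency distributions. The sketch below adds plus-one smoothing so that words absent from one decile don’t blow up the ratio; my original calculation may have handled that case differently:

```python
# Sketch of the per-decile word-ratio computation. The +1 smoothing avoids division
# by zero for words that appear in only one of the two deciles.
from collections import Counter

def word_ratios(top_texts, bottom_texts):
    """Ratio of each word's relative frequency in the top decile vs. the bottom decile."""
    top_counts = Counter(w for t in top_texts for w in t.lower().split())
    bot_counts = Counter(w for t in bottom_texts for w in t.lower().split())
    top_total, bot_total = sum(top_counts.values()), sum(bot_counts.values())
    vocab = set(top_counts) | set(bot_counts)
    return {w: ((top_counts[w] + 1) / top_total) / ((bot_counts[w] + 1) / bot_total)
            for w in vocab}
```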

From scanning the word lists you can see that pronouns (e.g. “I, you, my, her, his, he” etc.) are used more frequently in tweets from the bottom decile of RTs. Tweets that were in the top decile of RTs were more likely to use words relating to crime (e.g. “police”, “dead”, “arrest”), natural hazards (“irene”, “hurricane”, “earthquake”), sports (“soccer”, “sox”), or politically contentious issues (e.g. “marriage” likely referring to the legalization of gay marriage in NY). I thought it was particularly interesting that “China” was much more frequent in highly RTed tweets. To be clear, this is just scratching the surface and I think there’s a lot more interesting research to do around this, especially relating to theories of attention and newsworthiness.

The last bit of data analysis I did was to look at whether certain parts of speech (e.g. nouns, verbs, adjectives) were used differently in the top and bottom RT deciles. More specifically I wanted to know: are different parts of speech used more frequently in one group than the other? To do this, I used a natural language processing toolkit (NLTK) and computed the parts of speech (POS) of all of the words in the tweets. Of course this isn’t a perfect procedure and sometimes the POS tagger makes mistakes, so I consider this analysis preliminary. I ran chi-square tests to see if there was a statistical difference in the frequency of nouns, adverbs, conjunctions (e.g. “and”, “but”, etc.), determiners (e.g. “a”, “some”, “the”, etc.), pronouns, and verbs used in either the top or bottom 10% of RTs. What I found is that there is a strong statistically significant difference for adverbs (p < .02), determiners (p < .001), and verbs (p < .003), and somewhat of a difference for conjunctions (p = .06). There was no difference in usage for adjectives, nouns, or pronouns. Basically what this boils down to is that, in tweets that get lots of RTs, adverbs and determiners (and conjunctions somewhat) are used substantially less, while verbs are used substantially more. Perhaps it’s the less frequent use of determiners and adverbs that (as described above) makes these tweets shorter on average. Again, this isn’t prescriptive, but there may be something here in terms of how headlines are written. More use of verbs, and less use of “empty” determiners and conjunctions, in tweets is correlated with higher levels of retweeting. Could it be the case that action words (i.e. verbs) somehow spur people to retweet the headline? Pinning down the causality of this is something I’ll be working on next!
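For the curious, here’s roughly what that analysis looks like in code: tag with NLTK, tally coarse POS categories, and run a chi-square test per category on a category-vs-other-words table (the exact contingency setup I used may have differed):

```python
# Sketch of the POS comparison: tag with NLTK (requires the punkt and tagger data),
# tally coarse categories, then a chi-square test per category on a 2x2
# category-vs-other-words table.
from collections import Counter
import nltk
from scipy.stats import chi2_contingency

CATEGORIES = {"NN": "noun", "RB": "adverb", "CC": "conjunction", "DT": "determiner",
              "PRP": "pronoun", "VB": "verb", "JJ": "adjective"}

def pos_counts(texts):
    """Count coarse POS categories across a list of tweets."""
    counts = Counter()
    for text in texts:
        for _, tag in nltk.pos_tag(nltk.word_tokenize(text)):
            for prefix, name in CATEGORIES.items():
                if tag.startswith(prefix):
                    counts[name] += 1
                    break
    return counts

def pos_differences(top_texts, bottom_texts):
    """p-value per POS category for the top vs. bottom decile."""
    top, bottom = pos_counts(top_texts), pos_counts(bottom_texts)
    top_total, bottom_total = sum(top.values()), sum(bottom.values())
    results = {}
    for name in set(top) | set(bottom):
        table = [[top[name], top_total - top[name]],
                 [bottom[name], bottom_total - bottom[name]]]
        chi2, p, _, _ = chi2_contingency(table)
        results[name] = p
    return results
```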

Here are the lists of words I promised. If you find anything else notable, please leave a comment!

Journalism as Information Science

The core activity of journalism basically boils down to this: knowledge production. It’s presented in various guises: stories, maps, graphics, interviews, and more recently even things like newsgames, but it all essentially entails the same basic components of information gathering, organizing, synthesizing, and publishing of new (sometimes just new to you) knowledge. To be sure, the particular flavor of knowledge is colored by the cultural milieu, ethics, and temporal constraints through which journalism extrudes information into knowledge. Journalists add value to information and news by making sense of it, making it more accessible and memorable, and putting it in context.

Many of the practices followed by journalists in the process of knowledge production can be mapped quite neatly to corresponding ideas in information science. Thankfully, information science studies knowledge production in a much more structured fashion, and in the rest of this post I’d like to surface some of that structure as a way for reflecting on what journalists do, and for thinking about how technology could enhance such processes.

Much of what journalists are engaged with on a day-to-day basis is in adding value to information. Raw data and information is harvested from the world, and as the journalist gathers it and makes sense of it, puts it in context, increases its quality, and frames it for decision making, it gets more and more valuable to the end-user. And by “value” I don’t necessarily mean monetary, but rather usefulness in meeting a user need. This point is important because it implies that the value of information is perceived and driven by user-needs in context. And the process is cyclical or recursive. The output of someone else, be it an article, tweet, or comment can be fed into the process for the next output.

Robert S. Taylor, one of the fathers of information studies at Syracuse University, wrote an entire book on value-added processes in information systems. Below I examine the processes that he described. There may be some information processes that journalists could learn to do more effectively, with or without new tools. Taylor organized the processes into four broad categories:

  • Ease of Use: This includes information usability such as information architecture (i.e. how to order information), design (i.e. how to format and present information), and browseability. When journalists take a table of numbers and present them as a map or graph they are making that data far more accessible and usable; when they write a compelling story which incorporates those numbers it is also increasing value through usability. Physical accessibility is also important to ease of use, and there’s no doubt that the physical accessibility of information on a mobile or tablet is different than on a desktop.
  • Noise Reduction: This involves the processes of inclusion and exclusion, guided by an understanding of relevance that may be informed by context or end-user needs. Journalists constantly act as noise reducers as they assemble a story and decide what is relevant to include and what is not, and even through their very judgement of what is considered newsworthy. Summarization is another dimension of this, as is linking, which provides access to other relevant information.
  • Quality: A lot of value is added to information by enhancing its quality. Quality decisions depend on quality information: garbage in, garbage out. Quality includes aspects of accuracy, comprehensiveness (i.e. completeness of coverage), currency, reliability (i.e. consistent and dependable), and validity. Journalists engage (sometimes) in factchecking to enhance accuracy, as well as corroboration of sources as a method to increase validity. Different end-user contexts and needs have different demands on quality: non-breaking news doesn’t have the same demands on currency for instance. Seeing as quality (i.e. a commitment to truth) is a central value of journalism, it stands to reason that tools built for journalism might consider new ways of enhancing quality.
  • Adaptability: The idea of adaptability is that information is most valuable when it meets specific needs of a person with a particular problem. This involves knowing what users’ information needs are. Another dimension is that of flexibility, providing a variety of ways to work with information. Oftentimes I think adaptability is addressed in journalism through nichification – that is one outlet specializes in a particular information need, like for example, Consumer Reports.

You can’t really argue that any of these processes aren’t important to the knowledge produced by journalists, and many (all?) of them are also important to others who produce knowledge. There are people out there specialized in some of these activities. For instance, my alma mater, Georgia Tech, pumps out masters degrees in Human Computer Interaction, which teaches you a whole lot about that first category above – ease of use. Journalism could benefit from more cross-functional teams with such specialists.

The question moving forward is: How can technology inform the design of new tools that enable journalists to add the above values to information? Quality seems like a likely target since it is so important in journalism. But aspects of noise reduction (summarization) and adaptability may also be well suited to augmenting technologies. Moreover, newer forms of information (e.g. social media) are in need of new processes that can add value.

What a News Consumer Wants

What exactly is it that drives people to consume news information? If we can answer that, I would argue, then we open a new space of possibility for creating new media products, and for optimizing existing ones. As Google’s first commandment states: “Focus on the user and all else will follow.” I adopt this point of view here and will consider other perspectives (e.g. business or content producers) in future posts. In this post I really want to get at the underlying needs, motivations, or habits that drive news consumption.

First, I think it’s important to draw a distinction between the “How” of news consumption and the “Why” of news consumption. How news is consumed is largely attributable to the medium and technology of presentation (e.g. paper, radio, TV, internet). The context and form-factor of the technology also matters: the way that people consume news across different devices has been shown to vary over the course of the day, and consumption of news on tablets exhibits different patterns than consumption on other devices. Certainly online social networks such as Twitter and Facebook have changed how people are exposed to and consume news. These are all technologies that facilitate news consumption, and bias it in their own ways as their unique affordances differentially enable, place constraints on, and influence behavior. The why of news consumption is more fundamental though, since understanding the underlying needs and motivations for consuming news can drive new mechanisms for the how of consumption. Going back to a user-centered design philosophy, ideally, the how amplifies the why, and the why informs the how.

Of course, why people consume news or media is not invariant across people or contexts. So there’s not bound to be a single user model that describes all people at once. For starters, demographic factors such as age and gender have been linked to different patterns of consumption (e.g. younger people tend to consume news more for the sake of escapism or passing time, women tend to be less interested in news on science and technology). This necessitates thinking about information niches and that needs and motives may vary over time and context. For instance, social context (e.g. co-viewing) can influence people to watch television news for longer. Individual differences also exist between people: personality traits such as extraversion and openness have been linked to both interest in politics and public affairs, as well as exposure to such related news. Considering all of the moderating factors that influence why someone might consume news (i.e. demographics, context, personality, …) how could products be designed to appeal to any of these niches? What does a news product for introverts look like? How should it work differently? 

Since the 1940s communications and journalism scholars have been developing a theoretical framework that came to be known as Uses and Gratifications (U&G), which attempts to explain why people seek out and consume media. What are the gratifications that people receive from various kinds of media or types of content which help to satisfy their underlying social and psychological needs? Some of the earliest studies looked at why people consumed radio news, and some of the most recent look at internet technologies (e.g. I have looked at news commenting through this lens). U&G theory attempts to explain how and why people select their media, as well as how much attention they allocate to it (e.g. casually attending to a report for entertainment or to pass time is different from goal-oriented information seeking). Some limitations of the theory are (1) that it assumes an active user who is making selection decisions (though sometimes these calcify into habits), and (2) that the typologies of needs and motivations are built on self-reported information instead of observational data. This second limitation is perhaps quite important, as research has shown that people over-report their interest in international news by a factor of 3 as compared to their actual news browsing behavior. So, just a quick caveat that, ideally, user needs and motivations should be triangulated and validated based on observations of behavior in addition to self-reports.

U&G proffers a typology of gratifications which help explain why people consume news. Those listed below are taken from Miller and Ruggiero and include:

  • Informational/Surveillance: finding out about relevant events and conditions in immediate surrounding, society, and the world; seeking advice on practical matters, or opinion and decision choices; satisfying curiosity and general interest; learning, self-education
  • Personal Identity: finding reinforcement for personal values; finding models of behavior; identifying with media actors; gaining insight into one’s self
  • Integration and Social Interaction: insight into circumstances of others including social empathy; identifying with others and gaining a sense of belonging; finding a basis for conversation and social interaction; enabling connection with family, friends, society
  • Entertainment/Diversion: escaping, relaxing, cultural or aesthetic enjoyment, filling time, emotional release, sexual arousal

You might ask yourself which news products address each of these motives well or poorly. For instance, getting news on Facebook makes integration and social interaction motives very salient and easy for the user; watching Jon Stewart ties together entertainment and news effectively.

But still there is the underlying question of what driving psychological needs lead to these categories of gratifications being sought through the media. For this we can turn to a theory of motivation developed over the last 40 years called Self-Determination Theory; here’s a nice book on the subject. The theory postulates that there are three main drivers of intrinsic motivation: (1) autonomy, (2) competence, and (3) social relatedness. Autonomy is about providing people with choices – the more choices people have, the more in control they feel. Competence is about helping people see the relationship between their behavior and some desired outcome; feeling competent is about taking on a challenge and meeting it. How could news products better help people feel autonomous or competent? Those products would be hits. The last driver, social relatedness, is about people feeling connected to other people; social networks are already doing a pretty good job of satisfying that underlying psychological need.

Beyond psychological needs though, there may even be a biological driver for news consumption. In 1996 Pamela Shoemaker argued in a Journal of Communication paper that the human desire to surveil is evolutionarily adapted to help detect deviances or threats in the environment; humans that could surveil better were more likely to survive because they could avoid threats and thus reproduce. However, this hypothesis still needs to be tested empirically to see if people attend more to news that is more deviant (though it does seem plausible). What has been tested empirically, via a big-data analysis by information scientists Fang Wu and Bernardo Huberman, is how human attention orients to novel information and that that attention naturally decays over time according to a mathematical function. Indeed, for the digg.com site they found that the half-life for an item was, on average, 69 minutes, which suggests a natural time-scale (though site dependent) at which human attention fades.
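To put that half-life in perspective, here’s what it implies under a simple exponential-decay reading; the paper’s fitted model is more nuanced, so treat this as a back-of-the-envelope illustration:

```python
# Illustrative half-life decay of attention. This is a simplification: the paper fits
# a more complex decay model; this just shows what a 69-minute half-life implies.
def remaining_attention(minutes, half_life=69.0):
    """Fraction of initial attention left after `minutes`, assuming exponential decay."""
    return 0.5 ** (minutes / half_life)

print(remaining_attention(69))    # 0.5 after one half-life
print(remaining_attention(180))   # ~0.16 after three hours
```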

There is a wide palette of options for thinking about new ways of engaging people in news information: context, demographics, personality, uses & gratifications, psychological needs, and biological drivers for novel information. There are likely many new (or existing) news products that can leverage this typology to personalize and make sure people are getting what they came for out of their media experience. And, to make the job even easier, research has also shown that people enjoy incidental exposure to news information. So, even if you initially motivate people to engage the media in one way (e.g. social relatedness), they will likely still enjoy incidental exposure to other news information.