Category Archives: social media

How does newspaper circulation relate to Twitter following?

I was recently looking at circulation numbers from the Audit Bureau of Circulations for the top twenty-five newspapers in the U.S. and wondered: How does circulation relate to Twitter following? So for each newspaper I found the Twitter account and recorded the number of followers (link to data). The graph below shows the ratio of Twitter followers to total circulation; you could say it’s some kind of measure of how well the newspaper has converted its circulation into a social media following.

You can clearly see national papers like the NYT and Washington Post rise above the rest, but for others like USA Today it’s surprising that with a circulation of about 1.7M, they have comparatively few Twitter followers: only 514k. This may say something about the audience of that paper and whether that audience is online and using social media. For instance, Pew has reported stats suggesting that people over the age of 50 use Twitter at a much lower-than-average rate. Another possible explanation is that a lot of the USA Today circulation is vapor; I can’t remember how many times I’ve stayed at a hotel where USA Today was left for me by default, only to be left behind unread. Finally, maybe USA Today is just not pursuing an effective social strategy and needs to get better at reaching, and appealing to, the social media audience.
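As a rough illustration of the ratio being discussed, here is a minimal Python sketch using only the two USA Today figures quoted above (the circulation is approximate). The function name `follower_ratio` is made up for this example and is not taken from the linked data.

```python
def follower_ratio(followers: int, circulation: int) -> float:
    """Return Twitter followers divided by total circulation."""
    if circulation <= 0:
        raise ValueError("circulation must be positive")
    return followers / circulation


# USA Today figures as quoted in the post: about 1.7M circulation, 514k followers.
usa_today_ratio = follower_ratio(followers=514_000, circulation=1_700_000)
print(f"USA Today: {usa_today_ratio:.2f} Twitter followers per unit of circulation")
```

By this measure USA Today has roughly 0.3 followers for every copy in circulation.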

There are some metro papers like NY Post and LA Times that also have decent ratios, indicating they’re addressing a fairly broad national or regional audience with respect to their circulation. But the real winners in the social world are NYT and WashPost, and maybe WSJ to some extent. And in this game of web scale audiences, the big will only get bigger as they figure out how to transcend their own limited geographies and expand into the social landscape.

[Graph: ratio of Twitter followers to total circulation for the top twenty-five U.S. newspapers]

Finding News Sources in Social Media

Whether it’s terrorist attacks in Mumbai, a plane crash landing on the Hudson River, or videos and reactions from a recently capsized cruise ship in Italy, social media has proven itself again and again to be a huge boon to journalists covering breaking news events. But at the same time, the prodigious amount of social media content posted around news events creates a challenge for journalists trying to find interesting and trustworthy sources in the din. A few recent efforts have looked at automatically identifying misinformation on Twitter, or automatically assessing credibility, though pure automation carries the risk of cutting human decision makers completely out of the loop. There aren’t many general-purpose (or accessible) solutions out there for this problem either; services like Klout help identify topical authorities, and Storify and Storyful help in assembling social media content, but none of them offer additional cues for assessing credibility or trustworthiness.

Some research I’ve been doing (with collaborators at Microsoft and Rutgers) has been looking into this problem of developing cues and filters to enable journalists to better tap into social media. In the rest of this post I’ll preview this forthcoming research, but for all the details you’ll want to see the CHI paper appearing in May and the CSCW paper appearing next month.

With my collaborators I built an application called SRSR (standing for “Seriously Rapid Source Review”) which incorporates a number of advanced aggregations, computations, and cues that we thought would be helpful for journalists to find and assess sources in Twitter around breaking news events. And we didn’t just build the system, we also evaluated it on two breaking news scenarios with seven super-star social media editors at leading local, national, and international news outlets.

The features we built into SRSR were informed by talking with many journalists and include facilities to filter and find eyewitnesses and archetypical user-types, as well as to characterize sources according to their implicit location, network, and past content. The SRSR interface allows the user to quickly scan through potential sources and get a feeling for whether they’re more or less credible and if they might make good sources for a story. Here’s a snapshot showing some content we collected and processed around the Tottenham riots.

Automatically Identifying Eyewitnesses
A core feature we built into SRSR was the ability to filter sources based on whether or not they were likely to be eyewitnesses. To determine if someone was an eyewitness we built an automatic classifier that looks at the text content shared by a user and compares it to a dictionary of over 700 key terms relating to perception, seeing, hearing, and feeling – the kind of language you would expect from eyewitnesses. If a source uses one of the key terms then we label them as a likely eyewitness. Even using this relatively simple classifier we got fairly accurate results: precision was 0.89 and recall was 0.32. This means that if a source uses one of these words it’s highly likely they are really an eyewitness to the event, but that there were also a number of eyewitnesses who didn’t use any of these key words (thus the lower recall score). Being able to rapidly find eyewitnesses with first-hand information was one of the most-liked features in our evaluation. In the future there’s lots we want to do to make the eyewitness classifier even more accurate.
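To make the dictionary approach and the precision/recall trade-off concrete, here is a minimal Python sketch. It is not the SRSR implementation: the handful of terms in `PERCEPTION_TERMS` is a hypothetical stand-in for the 700-term dictionary, and the four example sources and their labels are invented purely for illustration.

```python
import re

# Hypothetical stand-in for the ~700-term perception dictionary.
PERCEPTION_TERMS = {"see", "saw", "seeing", "hear", "heard", "hearing", "feel", "felt"}


def is_likely_eyewitness(messages):
    """Label a source a likely eyewitness if any message uses a dictionary term."""
    for text in messages:
        tokens = re.findall(r"[a-z']+", text.lower())
        if any(token in PERCEPTION_TERMS for token in tokens):
            return True
    return False


def precision_recall(predicted, actual):
    """Compute precision and recall for parallel lists of boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented examples: (messages from one source, whether they truly were an eyewitness).
sources = [
    (["I just saw smoke rising over the high street"], True),
    (["Stuck indoors, police everywhere outside my window"], True),
    (["Reports say shops were looted last night"], False),
    (["Heard the sirens all night from my flat"], True),
]

predicted = [is_likely_eyewitness(messages) for messages, _ in sources]
actual = [label for _, label in sources]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The second source is a real eyewitness who never uses a perception word, so the classifier misses them. That is the same pattern behind the high precision and lower recall reported above.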

Automatically Identifying User Archetypes
Since different types of users on Twitter may produce different kinds of information we also sought to segment users according to some sensible archetypes: journalists/bloggers, organizations, and “ordinary” people. For instance, around a natural hazard news event, organizations might share information about marshaling public resources or have links to humanitarian efforts, whereas “ordinary” people are more likely to have more eyewitness information. We thought it could be helpful to journalists to be able to rapidly classify sources according to these information archetypes and so we built an automatic classifier for these categories. All of the details are in the CSCW paper, but we basically got quite good accuracy with the classifier across these three categories: 90-95%. Feedback in our evaluation indicated that rapidly identifying organizations and journalists was quite helpful.

Visually Cueing Location, Network, Entities
We also developed visual cues that were designed to help journalists assess the potential verity and credibility of a source based on their profile. In addition to showing the location of the source, we normalized and aggregated locations within a source’s network. In particular we looked at the “friends” of a source (i.e. people that I follow and that follow me back) and show the top three most frequent locations in that network. This gives a sense of where this source knows people and has their social network. So even if I don’t live in London, if I know 50 people there it suggests I have a stake in that location or may have friends or other connections to that area that make me knowledgeable about it. Participants in our evaluation really liked this cue as it gives a sense of implicit or social location.
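The aggregation described here can be sketched in a few lines of Python. This is an illustrative sketch, not the SRSR code: the function name, the account names, and the locations are all invented, and it assumes the profile locations have already been normalized to plain city names.

```python
from collections import Counter


def top_friend_locations(following, followers, locations, n=3):
    """Return the n most frequent locations among mutual follows ("friends")."""
    friends = set(following) & set(followers)
    # Sort so the result is deterministic; skip friends with no known location.
    counts = Counter(locations[user] for user in sorted(friends) if locations.get(user))
    return counts.most_common(n)


# Invented data for one source.
following = {"ann", "bob", "cat", "dan", "eve", "fay", "gus", "hal"}
followers = {"ann", "bob", "cat", "dan", "eve", "fay", "gus", "ivy"}
locations = {
    "ann": "London", "bob": "London", "cat": "London",
    "dan": "Manchester", "eve": "Manchester",
    "fay": "Leeds",
    "gus": "",       # mutual follow with an empty location field
    "hal": "Paris",  # followed by the source, but does not follow back
    "ivy": "Paris",  # follows the source, but is not followed back
}

social_locations = top_friend_locations(following, followers, locations)
print(social_locations)
```

Only mutual follows count, so the two Paris accounts do not influence the result even though they appear in the source's network.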

We also show a small sketch of the network of a source indicating who has shared relevant event content and is also following the source. This gives a sense of whether many people talking about the news event are related to the source. Journalists in our evaluation indicated that this was a nice credibility cue. For instance, if the Red Cross is following a source that’s a nice positive indicator.

Finally, we aggregated the top five most frequent entities (i.e. references to corporations, people, or places) that a source mentioned in their Twitter history (we were able to capture about 1000 historical messages for each person). The idea was that this could be useful to show what a source talks about, but in reality our participants didn’t find this feature that useful for the breaking news scenarios they were presented with. Perhaps in other scenarios it could still be useful?

What’s Next
While SRSR is a nice step forward there’s still plenty to do. For one, our prototype was not built for real-time events and was tested with pre-collected and processed data due to limitations of the Twitter API (hey Twitter, give me a call!!). And there’s plenty more to think about in terms of enhancing the eyewitness classifier, finding different ways to use network information to spider out in search of sources, and experimenting with how such a tool can be used to cover different kinds of events.

Again, for all the gory details on how these features were built and tested you can read our research papers. Here are the full references:

  • N. Diakopoulos, M. De Choudhury, M. Naaman. Finding and Assessing Social Media Information Sources in the Context of Journalism. Conference on Human Factors in Computing Systems (CHI). May, 2012. [PDF]
  • M. De Choudhury, N. Diakopoulos, M. Naaman. Unfolding the Event Landscape on Twitter: Classification and Exploration of User Categories. Proc. Conference on Computer Supported Cooperative Work (CSCW). February, 2012. [PDF]


Tweaking Your Credibility on Twitter

You want to be credible on social media, right? Well, a paper to be published at the Conference on Computer Supported Cooperative Work (CSCW) in early 2012 from researchers at Microsoft and Carnegie Mellon suggests at least a few actionable methods to help you do so. The basic motivation for the research is that when people see your tweet via a search (rather than following you) they have fewer cues to assess credibility. With a better understanding of what factors influence tweet credibility, new search interfaces can be designed to highlight the most relevant credibility cues (now you see why Microsoft is interested).

First off, five people were interviewed by the researchers to collect a range of issues that might be relevant to credibility perception. They came up with a list of 26 possible credibility cues and then ran a survey with 256 respondents in which they asked how much each feature impacted credibility perception. You can see the paper for the full results, but, for instance, things like keeping your tweets on a similar topic, using a personal photo, having a username related to the topic, having a location near a topic, having a bio that suggests relevant topical expertise, and frequent tweeting were all perceived by participants to positively impact credibility to some extent. Things like using non-standard grammar and punctuation and using the default user image were seen to detract from credibility.

Based on their first survey, the researchers then focused on three specific credibility cues for a follow-on study: (1) topic of tweets (politics, science, or entertainment), (2) user name style (first_last, internet – “tenacious27”, and topical – “AllPolitics”), and finally (3) user image (male / female photo, topical icon, generic icon, and default). For the study, each participant (there were 266) saw some combination of the above cues for a tweet, and rated both tweet credibility and author credibility. Unsurprisingly, tweets about the science topic were rated as more credible than those on politics or entertainment. The most surprising result to me was that topically relevant user names were rated as more credible than traditional names (or internet style names, though that’s not surprising). In a final follow-up experiment the researchers found that the user image doesn’t impact credibility perceptions, except when the image is the default image, in which case it significantly (in the statistical sense) lowers perceptions of tweet credibility.

So here are the main actionable take-aways:

  • Don’t use non-standard grammar and punctuation (no “lol speak”).
  • Don’t use the default image.
  • Tweet about topics like science, which seem to carry an aura of credibility.
  • Find a user name that is topically aligned with those you want to reach.
That last point of finding a topically aligned user name might be an excellent strategy for large news organizations to build a more credible presence across a range of topics. For instance, right now the NY Times has a mix of accounts that have topical user names, as well as reporters using their real names. In addition to each reporter having their own “real name” account, individual tweets of theirs that were topically relevant could be routed to the appropriate topically named account. So for instance, let’s say Andy Revkin tweets something about the environment. That tweet should also show up via the Environment account, since the tweet may be perceived as having higher credibility from a topically-related user name. For people who search and find that tweet, of course if they know who Andy Revkin is, then they’ll find his tweet quite credible since he’s known for having that topical expertise. But for someone else who doesn’t know who Andy Revkin is, the results of the above study suggest that that person would find the same content more credible coming from the topically related Environment account. Maybe the Times or others are already doing this. But if not, it seems like there’s an opportunity to systematically increase credibility by adopting such an approach.

Google+ and Commenting

Twitter isn’t built for conversation; the interface just doesn’t support it, with snippets of 140 characters largely floating in a groundless ether of chatter. But Google+ does support conversation (to some extent), and I’ve recently begun pondering what this means for the future of commenting online, especially around news media where I’ve done research.

One difference I see moving forward is a transition away from commenting being dictated primarily by the content, to a world where online comment threads are heavily influenced by both the content and the person sharing the content. How does the same content posted by different people lead to different conversations evolving around that content? If a conservative blogger and a liberal blogger share the same link to a news article on Google+, how do their circles react differently to that article and how does that affect the conversation? And if we aggregate these conversations back together somehow does this lead to a more interesting, engaging, or insightful experience for users? How can online publishers harness this as an opportunity?

On Google+ people post in all kinds of different ways: status updates, entire blog posts (e.g. Guy Kawasaki), or just sharing news and rich media links. Here I’ll focus on commenting around links to media since that’s most relevant to online publishers. The diversion of commenting attention and activity to platforms other than the publisher’s (e.g. Google+ or Facebook) could be seen as a threat, but it could also be an opportunity. Platform APIs can harvest this activity and aggregate it back to the publisher’s site. The opportunity is in harnessing the activity on social platforms to provide new, more sticky interfaces for keeping users engaged on the publisher’s content page. For the designers out there: what are novel ways of organizing and presenting online conversations that are enabled by new features on social networks like Google+?

One idea, for opinion-oriented news articles, would be for a publisher to aggregate threads of Google+ comments from two or more well-known bloggers who have attracted a lot of commentary. These could be selected by the users, editors, or, eventually, by algorithms which identify “interesting” Google+ threads. These algorithms could, for instance, identify threads with people from diverse backgrounds, from particular geographies, with particular relevant occupations, or with a pro/con stance. These threads would help tell the story from different conversational perspectives anchored around particular people sharing the original content. The threads could be embedded directly on the publisher’s site as a way to keep users there longer, perhaps getting them more interested in the debates that are happening out on social media.

Another idea would be to organize commentary by network distance, providing a view of the commentary that is personalized to an individual. Let’s say I share a link on Google+ and 20 people comment (generous to myself, I know), but then 2 of those people re-share it to their circles and 50 more people comment, and from those 50, 5 of them share it and 100 people comment, and so on. At each step of re-sharing that’s a bit further away from me (the originator) in the network. Other people in the network can also share the link (as originators) and it will diffuse. All of this activity can be aggregated and presented to me based on how many hops away in the network a comment falls. I may be interested in comments that are 1 hop away, and maybe 2 (friends of friends) but maybe not further than that. Network distance from the user could end up being a powerful social filter.
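One way to sketch the hop-based filter is a breadth-first search over the sharing network. The Python below is only an illustration under stated assumptions: the network is a plain dictionary mapping each person to the people they are connected to, comments are dictionaries with an `author` field, and every name and function here is invented rather than taken from any Google+ API.

```python
from collections import deque


def hop_distances(graph, origin):
    """Breadth-first search: number of hops from origin to each reachable person."""
    distances = {origin: 0}
    queue = deque([origin])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[person] + 1
                queue.append(neighbor)
    return distances


def comments_within(comments, distances, max_hops):
    """Keep comments whose author is at most max_hops away; drop unreachable authors."""
    return [c for c in comments if distances.get(c["author"], float("inf")) <= max_hops]


# Invented network: me -> ann, bob; ann -> cat; cat -> dan.
graph = {"me": ["ann", "bob"], "ann": ["cat"], "cat": ["dan"]}
comments = [
    {"author": "ann", "text": "Great link!"},
    {"author": "cat", "text": "Saw this via Ann."},
    {"author": "dan", "text": "Three hops out."},
    {"author": "zed", "text": "Not connected at all."},
]

distances = hop_distances(graph, "me")
nearby = comments_within(comments, distances, max_hops=2)
print([c["author"] for c in nearby])
```

With `max_hops=2` the view keeps friends and friends of friends and drops everything further out, which is the filter described above.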

There’s lots to try here and while I think it’s great that new platforms for commenting are emerging, it’s time for publishers to think about how to tap into these to improve the user experience either by enabling new ways of seeing discussion or new ways to learn and socialize with others around content.