Category Archives: commenting

Comment Readers Want Relevance!

A couple of years ago I wrote a paper about the quality of comments on online news stories. For the paper I surveyed a number of commenters on sacbee.com about their commenting experience on that site. One aspect of the experience that users complained about was that comments were often off-topic: they weren't germane, or relevant, to the conversation or to the article to which they were attached. This isn't surprising, right? If you've ever read through an online comment thread you know there's a lot of irrelevant material being posted.

It stands to reason, then, that if we can make news comments more relevant, people might come away more satisfied from the online commenting experience; they might be more apt to read, find, and learn new things if the signal-to-noise ratio were a bit higher. The point of this post is to show that there's a straightforward and easy-to-implement way to provide this relevance, and that it coincides with both users' and editors' notions of "quality comments".

I collected data in July via the New York Times API: 370 articles and 76,086 comments oriented around the topic of climate change. More specifically, I searched for articles containing the phrase "climate change" and then kept all of those articles which had comments (since not all NYT articles have comments). For each comment I also collected several pieces of metadata, including: (1) the number of times the comment was "recommended" by someone upvoting it, and (2) whether the comment was an "editor's selection". Both of these ratings indicate "quality", one from the users' point of view and the other from the editors'. And both of them in fact correlate with a simple measure of relevance, as I'll describe next.
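For the curious, here is roughly what the article-collection step might look like in Python. This is a minimal sketch, not my actual collection script: the Article Search endpoint and its q/page/api-key parameters are the documented v2 interface, YOUR_KEY_HERE is a placeholder, and fetching the comments themselves (via the NYT community API) is left out since I don't detail that step here.

```python
# Sketch: pull articles matching "climate change" from the NYT Article Search API.
import requests

SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"
API_KEY = "YOUR_KEY_HERE"  # placeholder; register for a key at developer.nytimes.com

def search_page(query, page):
    """Fetch one page (10 results) of Article Search hits for the query."""
    params = {"q": query, "page": page, "api-key": API_KEY}
    resp = requests.get(SEARCH_URL, params=params)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

articles = []
for page in range(50):                      # paginate as far as needed
    docs = search_page("climate change", page)
    if not docs:
        break
    articles.extend(docs)

# Comments for each article would be fetched separately via the NYT
# community API and attached to their parent articles; that step is omitted here.
```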

In the dataset I collected I also had the full text of both the comments and the articles. Using some basic IR ninjitsu I normalized the text, removed stop words (using NLTK), and stemmed the remaining words with the Porter stemming algorithm. This leaves cleaner, less noisy text to work with. I then computed relevance between each comment and its parent article as the dot product (cosine similarity) of unigram tf-idf feature vectors. For the purposes of the tf-idf scores, each comment was treated as a document, and only unigrams that occurred at least 10 times in the dataset were included in the feature vectors (again, to reduce noise). The outcome of this process is that each comment-article pair got a score between 0 and 1 representing the similarity of the vocabulary used in the comment and in the article: a score of 1 would indicate that the comment and article used identical vocabulary, whereas a score of 0 would indicate that they had no words in common.
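A minimal sketch of this pipeline, using NLTK and scikit-learn, might look like the following. The data structures (an articles dict and a list of (article_id, comment_text) pairs) are assumptions for illustration, and min_df=10 only approximates the "at least 10 occurrences" filter since it counts documents rather than raw occurrences.

```python
# Sketch: comment-to-article similarity via stemming, stop-word removal, and tf-idf.
import re
from nltk.corpus import stopwords            # requires nltk.download('stopwords')
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stemmer = PorterStemmer()
stops = set(stopwords.words("english"))

def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, and Porter-stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stops)

# `articles` maps article_id -> article text; `comments` is a list of
# (article_id, comment_text) pairs. Both are assumed structures.
docs = [preprocess(t) for t in articles.values()] + \
       [preprocess(c) for _, c in comments]

vectorizer = TfidfVectorizer(min_df=10)       # filter out rare unigrams
tfidf = vectorizer.fit_transform(docs)        # rows are l2-normalized by default

article_index = {aid: i for i, aid in enumerate(articles)}
n_articles = len(articles)

# Cosine similarity (equivalently, the dot product of the normalized
# tf-idf vectors) between each comment and its parent article.
scores = []
for j, (aid, _) in enumerate(comments):
    sim = cosine_similarity(tfidf[article_index[aid]], tfidf[n_articles + j])[0, 0]
    scores.append(sim)
```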

What's interesting is that this simple-to-compute relevance metric is strongly correlated with the recommendation score and editors' selection ratings mentioned above. The following graph shows the average comment-to-article similarity score at each recommendation score up to 50 (red dots), together with a moving-average trend line (blue).

As you get into the higher recommendation scores there's more variance because fewer values are being averaged. But you can see a clear trend: as the number of recommendations increases, so does the average comment-to-article similarity. In statistical terms, Pearson's correlation is r=0.58 (p < .001). There's actually a fair amount of variance around each of those means, though, and the next graph shows the distribution of similarity values for each recommendation score. If you turn your head sideways, each column is a histogram of the similarity values.
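As a sketch of how those red dots and the correlation can be computed (assuming the `scores` list from above and a parallel `recommend_counts` list; whether one correlates over the per-score means or over individual comments is a choice, and this sketch uses the per-score means shown in the plot):

```python
# Sketch: mean similarity per recommendation score, and Pearson's r.
from collections import defaultdict
from scipy.stats import pearsonr

by_score = defaultdict(list)
for sim, recs in zip(scores, recommend_counts):
    if recs <= 50:                            # cap at 50, as in the plot
        by_score[recs].append(sim)

means = {recs: sum(vals) / len(vals) for recs, vals in by_score.items()}

r, p = pearsonr(list(means.keys()), list(means.values()))
print(f"Pearson r = {r:.2f} (p = {p:.3g})")
```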

We can also look at comment-to-article similarity in terms of editors' selections, the comments that editors have elevated in the user interface. The average similarity for comments that are not editors' selections is 0.091 (N=73,723), whereas for editors' selections the average is 0.118 (N=2,363). A t-test between these distributions indicates that the difference in means is statistically significant (p < .0001). So the editors' criteria for selecting comments also correlate with the similarity in language between comment and article.
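The comparison itself is a one-liner with SciPy. In this sketch, `is_selection` is an assumed parallel list of booleans, and the unequal-variance (Welch) variant of the t-test is my choice here given the very different group sizes, not necessarily the exact test reported above.

```python
# Sketch: compare similarity for editors' selections vs. all other comments.
from scipy.stats import ttest_ind

selected = [s for s, sel in zip(scores, is_selection) if sel]
others   = [s for s, sel in zip(scores, is_selection) if not sel]

t, p = ttest_ind(selected, others, equal_var=False)   # Welch's t-test (assumption)
print(f"editors' picks: mean={sum(selected)/len(selected):.3f} (N={len(selected)})")
print(f"other comments: mean={sum(others)/len(others):.3f} (N={len(others)}), p={p:.5f}")
```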

The implications of these findings are relatively straightforward. A simple metric of similarity (or relevance) correlates well with notions of "recommendation" and editorial selection. This metric could be surfaced in a commenting-system user interface to let users rank comments by how similar they are to an article, without having to wait for recommendation scores or editorial selections. In the future I'd like to look at how predictive such metrics are of recommendation scores, as well as try out different similarity metrics, like KL divergence.

Google+ and Commenting

Twitter isn't built for conversation; the interface just doesn't support it: snippets of 140 characters largely floating in a groundless ether of chatter. But Google+ does (to some extent), and I've recently begun pondering what this means for the future of commenting online, especially around news media, where I've done research.

One difference I see moving forward is a transition away from commenting being dictated primarily by the content, to a world where online comment threads are heavily influenced by both the content and the person sharing the content. How does the same content posted by different people lead to different conversations evolving around that content? If a conservative blogger and a liberal blogger share the same link to a news article on Google+, how do their circles react differently to that article and how does that affect the conversation? And if we aggregate these conversations back together somehow does this lead to a more interesting, engaging, or insightful experience for users? How can online publishers harness this as an opportunity?

On Google+ people post in all kinds of different ways: status updates, entire blog posts (e.g. Guy Kawasaki), or just shared links to news and rich media. Here I'll focus on commenting around links to media, since that's most relevant to online publishers. The diversion of commenting attention and activity to platforms other than the publisher's (e.g. Google+ or Facebook) could be seen as a threat, but it could also be an opportunity. Through the platforms' APIs, publishers can harvest this activity and aggregate it back on their own sites. The opportunity is in harnessing the activity on social platforms to build new, stickier interfaces that keep users engaged on the publisher's content page. For the designers out there: what are novel ways of organizing and presenting online conversations that are enabled by new features on social networks like Google+?

One idea, for opinion-oriented news articles, would be for a publisher to aggregate threads of Google+ comments from two or more well-known bloggers who have attracted a lot of commentary. These could be selected by users, editors, or, eventually, by algorithms which identify "interesting" Google+ threads. Such algorithms could, for instance, identify threads with people from diverse backgrounds, from particular geographies, with particular relevant occupations, or with a pro/con stance. These threads would help tell the story from different conversational perspectives anchored around particular people sharing the original content. The threads could be embedded directly on the publisher's site as a way to keep users there longer, perhaps getting them more interested in the debates that are happening out on social media.

Another idea would be to organize commentary by network distance, providing a view of the commentary that is personalized to an individual. Let’s say I share a link on Google+ and 20 people comment (generous to myself, I know), but then 2 of those people re-share it to their circles and 50 more people comment, and from those 50, 5 of them share it and 100 people comment, and so on. At each step of re-sharing that’s a bit further away from me (the originator) in the network. Other people in the network can also share the link (as originators) and it will diffuse. All of this activity can be aggregated and presented to me based on how many hops away in the network a comment falls. I may be interested in comments that are 1 hop away, and maybe 2 (friends of friends) but maybe not further than that. Network distance from the user could end up being a powerful social filter.
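To make the idea concrete, grouping comments by re-share distance is just a breadth-first walk over the re-share graph. The sketch below is purely illustrative: `reshares` and `comments_on` are hypothetical data structures standing in for whatever a platform API would actually expose, not real Google+ calls.

```python
# Sketch: group comments by how many re-share "hops" they are from the originator.
from collections import deque

def comments_by_hop(origin_post, reshares, comments_on, max_hops=2):
    """Return {hop_distance: [comments]} for posts within max_hops re-shares.

    reshares:    dict mapping a post id to the ids of posts that re-shared it
    comments_on: dict mapping a post id to the list of comments on that post
    (both are assumed, hypothetical structures)
    """
    grouped = {}
    queue = deque([(origin_post, 0)])
    seen = {origin_post}
    while queue:
        post, hops = queue.popleft()
        if hops > max_hops:
            continue
        grouped.setdefault(hops, []).extend(comments_on.get(post, []))
        for child in reshares.get(post, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, hops + 1))
    return grouped
```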

There's lots to try here. While I think it's great that new platforms for commenting are emerging, it's time for publishers to think about how to tap into them to improve the user experience, whether by enabling new ways of seeing discussion or new ways to learn and socialize with others around content.

Visualization, Data, and Social Media Response

I’ve been looking into how people comment on data and visualization recently and one aspect of that has been studying the Guardian’s Datablog. The Datablog publishes stories of and about data, oftentimes including visualizations such as charts, graphs, or maps. It also has a fairly vibrant commenting community.

So I set out to gather some of my own data. I scraped 803 articles from the Datablog, including all of their comments. From this data I wanted to know whether articles containing embedded data tables or embedded visualizations produced more of a social media response. That is, do people talk more about an article if it contains data and/or visualization? The answer is yes, and the details are below.

While the number of comments could be scraped from the Datablog site itself, I turned to Mechanical Turk to crowdsource the collection of some other metadata: (1) the number of tweets per article, (2) whether the article has an embedded data table, and (3) whether the article has an embedded visualization. I spot-checked 3% of the results from Turk to assess the Turkers' accuracy on these items; it was about 96% overall, which I judged accurate enough to start doing some further analysis.

Next I wanted to look at how the "has visualization" and "has table" features affect (1) tweet volume and (2) comment volume. There are four possibilities: the article has (1) a visualization and a table, (2) a visualization and no table, (3) no visualization and a table, or (4) no visualization and no table. Since neither tweet volume nor comment volume is normally distributed, I log-transformed both so that they better approximate normality (an assumption of the statistical tests below). There were also a few outliers in the data, so anything beyond 3 standard deviations from the mean of the log-transformed variables was excluded.
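In NumPy terms, the transform and outlier filter might look like the sketch below. Using log(1 + x) rather than log(x) is an assumption on my part to handle zero counts; `tweets_per_article` is an assumed list of raw counts, and the same treatment applies to comment counts.

```python
# Sketch: log-transform counts and drop outliers beyond 3 SD of the transformed values.
import numpy as np

tweet_counts = np.asarray(tweets_per_article, dtype=float)
log_tweets = np.log1p(tweet_counts)                # log(1 + x) handles zero counts

mean, sd = log_tweets.mean(), log_tweets.std()
keep = np.abs(log_tweets - mean) <= 3 * sd         # within 3 standard deviations
log_tweets = log_tweets[keep]
```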

For number of tweets per article:

  1. Articles with both a visualization and a table produced the largest response with an average of 46 tweets per article (N=212, SD=103.24);
  2. Articles with a visualization and no table produced an average of 23.6 tweets per article (N=143, SD=85.05);
  3. Articles with no visualization and a table produced an average of 13.82 tweets per article (N=213, SD=42.7);
  4. And finally articles with neither visualization nor table produced an average of 19.56 tweets per article (N=117, SD=86.19).

I ran an ANOVA with post-hoc Bonferroni tests to see whether the differences between these means were significant. Articles with both a visualization and a table (case 1) have a significantly higher number of tweets than cases 3 (p < .01) and 4 (p < .05). Articles with just a visualization and no data table have a higher average number of tweets per article, but the difference was not statistically significant. The takeaway is that the combination of a visualization and a data table drives a significantly higher Twitter response.
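For readers who want to replicate this kind of test, here is one way to sketch it with SciPy. `groups` is an assumed dict mapping a label like "vis+table" to the log-transformed, outlier-filtered counts for that case, and the post-hoc step here is implemented as pairwise t-tests against a Bonferroni-adjusted alpha, which is one common way to run the correction (the exact software I used isn't the point).

```python
# Sketch: one-way ANOVA across the four article groups, then Bonferroni-corrected pairwise tests.
from itertools import combinations
from scipy.stats import f_oneway, ttest_ind

f_stat, p = f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

pairs = list(combinations(groups, 2))
alpha = 0.05 / len(pairs)                          # Bonferroni-adjusted threshold
for a, b in pairs:
    _, p_pair = ttest_ind(groups[a], groups[b])
    marker = " *" if p_pair < alpha else ""
    print(f"{a} vs {b}: p = {p_pair:.4f}{marker}")
```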

Results for number of comments per article are similar:

  1. Articles with both a visualization and a table produced the largest response with an average of 17.40 comments per article (SD=24.10);
  2. Articles with a visualization and no table produced an average of 12.58 comments per article (SD=17.08);
  3. Articles with no visualization and a table produced an average of 13.78 comments per article (SD=26.15);
  4. And finally articles with neither visualization nor table produced an average of 11.62 comments per article (SD=17.52).

Again I ran an ANOVA with post-hoc Bonferroni tests to assess statistically significant differences between means. This time there was only one: articles with both a visualization and a table (case 1) have a higher number of comments than articles with neither (case 4), with p = 0.04. Again, the combination of visualization and data table drove more of an audience response in terms of commenting behavior.

The overall takeaway here is that people like to talk about articles (at least among the audience of the Guardian Datablog) when both data and visualization are used to tell the story. Articles which used both had more than twice the number of tweets and about 1.5 times the number of comments of articles which had neither. If getting people talking about your reporting is your goal, use more data and visualization, which, in retrospect, I probably also should have done for this blog post.

As a final thought, I should note there are potential confounds in these results. For one, articles with data in them may stay "green" for longer, slowly accreting a larger and larger social media response; one thing to look at would be the acceleration of commenting in addition to its volume. Another thing I had no control over is whether some stories were promoted more than others: if the editors at the Guardian had a bias toward promoting articles with both visualizations and data, that would drive the audience response numbers up on those stories too. In any case, it's worthwhile to consider alternative explanations for these results.

Improving Online Commenting

I'm just reading through a blog post on time.com entitled "Commenters: Are You the Problem with Journalism?" and, ironically, I'm finding the comments on the post to have a lot of value. That's not to say the post is bad; the writer gets the ball rolling and sets the tone for the perspectives that arise in the comments. That said, here are some of the observations commenters made which I think ring true:

  • People are more likely to comment when they have something negative to say
  • There’s way too much junk to wade through to get much useful out of comments
  • Comments get repetitive after some cutoff (~20?)
  • User comments don’t often augment a reporter’s story on a factual basis
  • YouTube comments are mostly worthless [Does this have to do with the community there?]
  • It’s up to the journalist to frame the story so that quality comments are contributed
  • Mechanisms for peer-rating should be used more widely (e.g. “report this” button)
  • Commenters have no institutional authority, nor do they attach their names to information as journalists do