Twitter isn’t built for conversation; the interface just doesn’t support it – snippets of 140 characters largely floating in a groundless ether of chatter. But Google+ does (to some extent) and I’ve recently begun pondering what this means for the future of commenting online, especially around news media where I’ve done research.
One difference I see moving forward is a transition away from commenting being dictated primarily by the content, to a world where online comment threads are heavily influenced by both the content and the person sharing the content. How does the same content posted by different people lead to different conversations evolving around that content? If a conservative blogger and a liberal blogger share the same link to a news article on Google+, how do their circles react differently to that article and how does that affect the conversation? And if we aggregate these conversations back together somehow, does this lead to a more interesting, engaging, or insightful experience for users? How can online publishers harness this as an opportunity?
On Google+ people post in all kinds of different ways: status updates, entire blog posts (e.g. Guy Kawasaki), or just sharing news and rich media links. Here I’ll focus on commenting around links to media since that’s most relevant to online publishers. The diversion of commenting attention and activity to platforms other than the publisher’s (e.g. Google+ or Facebook) could be seen as a threat, but it could also be an opportunity. Publishers can use platform APIs to harvest this activity and aggregate it back to their own sites. The opportunity is in harnessing the activity on social platforms to provide new, stickier interfaces for keeping users engaged on the publisher’s content page. For the designers out there: what are novel ways of organizing and presenting online conversations that are enabled by new features on social networks like Google+?
One idea, for opinion-oriented news articles, would be for a publisher to aggregate threads of Google+ comments from two or more well-known bloggers who have attracted a lot of commentary. These could be selected by the users, editors, or, eventually, by algorithms which identify “interesting” Google+ threads. These algorithms could, for instance, identify threads with people from diverse backgrounds, from particular geographies, with particular relevant occupations, or with a pro/con stance. These threads would help tell the story from different conversational perspectives anchored around particular people sharing the original content. The threads could be embedded directly on the publisher’s site as a way to keep users there longer, perhaps getting them more interested in the debates that are happening out on social media.
Another idea would be to organize commentary by network distance, providing a view of the commentary that is personalized to an individual. Let’s say I share a link on Google+ and 20 people comment (generous to myself, I know), but then 2 of those people re-share it to their circles and 50 more people comment, and from those 50, 5 of them share it and 100 people comment, and so on. At each step of re-sharing that’s a bit further away from me (the originator) in the network. Other people in the network can also share the link (as originators) and it will diffuse. All of this activity can be aggregated and presented to me based on how many hops away in the network a comment falls. I may be interested in comments that are 1 hop away, and maybe 2 (friends of friends) but maybe not further than that. Network distance from the user could end up being a powerful social filter.
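The hop-based filter could be sketched as a breadth-first search. This is only an illustration under stated assumptions: `graph` is a hypothetical mapping from each person to the people who picked up the link from them, and `comments` is a hypothetical list of (sharer, text) pairs naming whose share a comment was left on. Neither structure comes from the Google+ API.

```python
from collections import deque

def hop_distances(graph, start):
    """Breadth-first search: hops from `start` to every reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[person] + 1
                queue.append(neighbor)
    return dist

def filter_comments(comments, graph, viewer, max_hops):
    """Keep comments left on shares at most `max_hops` away from `viewer`."""
    dist = hop_distances(graph, viewer)
    return [
        (dist[sharer], sharer, text)
        for sharer, text in comments
        if sharer in dist and dist[sharer] <= max_hops
    ]

# Hypothetical toy network: "me" shared, ann and bob re-shared, and so on.
graph = {"me": ["ann", "bob"], "ann": ["cat"], "cat": ["dan"]}
comments = [
    ("me", "Nice link"),
    ("ann", "I disagree"),
    ("cat", "Interesting"),
    ("dan", "Late to this"),
    ("eve", "Shared it independently"),
]
distances = hop_distances(graph, "me")
visible = filter_comments(comments, graph, "me", max_hops=2)
```

Here comments on shares three hops out (dan) or unconnected to me (eve) are dropped. A real interface might sort or fade comments by hop count rather than cut them off entirely.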
There’s lots to try here and while I think it’s great that new platforms for commenting are emerging, it’s time for publishers to think about how to tap into these to improve the user experience either by enabling new ways of seeing discussion or new ways to learn and socialize with others around content.
Comment Readers Want Relevance!
A couple years ago now I wrote a paper about the quality of comments on online news stories. For the paper I surveyed a number of commenters on sacbee.com about their commenting experience on that site. One of the aspects of the experience that users complained about was that comments were often off-topic: that comments weren’t germane, or relevant, to the conversation or to the article to which they were attached. This isn’t surprising, right? If you’ve ever read into an online comment thread you know there’s a lot of irrelevant things that people are posting.
It stands to reason then that if we can make news comments more relevant then people might come away more satisfied from the online commenting experience; that they might be more apt to read and find and learn new things if the signal-to-noise ratio was a bit higher. The point of my post here is to show you that there’s a straightforward and easy-to-implement way to provide this relevance that coincides with both users’ and editors’ notions of “quality comments”.
I collected data in July via the New York Times API, including 370 articles and 76,086 comments oriented around the topic of climate change. More specifically I searched for articles containing the phrase “climate change” and then collected all articles which had comments (since not all NYT articles have comments). For each comment I also had a number of pieces of metadata, including: (1) the number of times the comment was “recommended” by someone upvoting it, and (2) whether the comment was an “editor’s selection”. Both of these ratings indicate “quality”; one from the users’ point of view and the other from the editors’. And both of these ratings in fact correlate with a simple measure of relevance as I’ll describe next.
In the dataset I collected I also had the full text of both the comments and the articles. Using some basic IR ninjitsu I then normalized the text, stop-worded it (using NLTK), and stemmed the words using the Porter stemming algorithm. This leaves us with cleaner, less noisy text to work with. I then computed relevance between each comment and its parent article by taking the cosine similarity (the normalized dot product) of unigram feature vectors of tf-idf scores. For the sake of the tf-idf scores, each comment was considered a document, and only unigrams that occurred at least 10 times in the dataset were considered in the feature vectors (again to reduce noise). The outcome of this process is that for each comment-article pair I now had a score (between 0 and 1) representing similarity in the words used in the comment and those used in the article. So a score of 1 would indicate that the comment and article were using identical vocabulary whereas a score of 0 would indicate that the comment and article used no words in common.
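A rough sketch of that pipeline, using only the Python standard library, might look like the following. Several things here are assumptions rather than what I actually ran: the stop list is a tiny stand-in for NLTK’s, Porter stemming is left out, the idf formula is one common smoothed variant, the article is simply included as one more document, and `min_count` defaults to 1 so the toy example works (the real threshold was 10). All function names are illustrative.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; the post used NLTK's list).
STOP_WORDS = {"a", "an", "and", "are", "is", "it", "of", "the", "to"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(docs, min_count=1):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(w for toks in tokenized for w in toks)
    vocab = {w for w, c in totals.items() if c >= min_count}
    doc_freq = Counter(w for toks in tokenized for w in set(toks) if w in vocab)
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(w for w in toks if w in vocab)
        # Smoothed idf (assumed variant): log((1 + N) / (1 + df)) + 1
        vectors.append({
            w: count * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0)
            for w, count in tf.items()
        })
    return vectors

def cosine_similarity(u, v):
    """Dot product of two sparse vectors divided by the product of their norms."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up example text, not from the NYT dataset.
article = "Climate change is warming the oceans and melting the ice."
comments = ["Melting ice and warming oceans worry me.", "My cat likes tuna."]
vectors = tfidf_vectors([article] + comments)
article_vec, comment_vecs = vectors[0], vectors[1:]
scores = [cosine_similarity(article_vec, c) for c in comment_vecs]
```

The first comment shares four terms with the article and gets a positive score; the second shares none and scores 0.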
So, what’s interesting is that this simple-to-compute metric for relevance is highly correlated with the recommendation score and editor’s selection ratings mentioned above. The following graph shows the average comment to article similarity score over each recommendation score up to 50 (red dots), and a moving average trend line (blue).
As you get into the higher recommendation scores there’s more variance because it’s averaging fewer values. But you can see a clear trend that as the number of recommendation ratings increases so too does the average comment to article similarity. In statistical terms, Pearson’s correlation is r=0.58 (p < .001). There’s actually a fair amount of variance around each of those means though, and the next graph shows the distribution of similarity values for each recommendation score. If you turn your head sideways each column is a histogram of the similarity values.
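For the curious, here is a minimal sketch of how per-score averages and a Pearson correlation can be computed. The numbers are made up purely to exercise the code and are not from the NYT data, and the helper names are illustrative. Correlating the per-score means is just one way to set this up; the sketch doesn’t claim to reproduce the r reported above.

```python
import math
from collections import defaultdict

def mean_by_score(pairs, max_score=50):
    """Average similarity for each recommendation count up to `max_score`."""
    buckets = defaultdict(list)
    for recommendations, similarity in pairs:
        if recommendations <= max_score:
            buckets[recommendations].append(similarity)
    return {score: sum(vals) / len(vals) for score, vals in sorted(buckets.items())}

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up (recommendation count, similarity) pairs for illustration only.
pairs = [(0, 0.05), (0, 0.07), (1, 0.08), (1, 0.10), (2, 0.12), (3, 0.11)]
means = mean_by_score(pairs)
r = pearson_r(list(means.keys()), list(means.values()))
```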
We can also look at the relationship between comment to article similarity in terms of editors’ selections, which are comments that have been elevated in the user interface by editors. The average similarity for comments that are not editors’ selections is 0.091 (N=73,723) whereas for comments that are editors’ selections the average is 0.118 (N=2,363). A t-test between these distributions indicates that the difference in means is statistically significant (p < .0001). So what we learn from this is that editors’ criteria for selecting comments also correlate with the similarity in language used between the comment and article.
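As a sketch of the kind of comparison involved, the snippet below computes Welch’s t statistic and its approximate degrees of freedom for two groups. Using the Welch (unequal variance) variant is an assumption made for the example, the function name is illustrative, and the sample values are made up rather than taken from the dataset.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each group's sample variance divided by its size.
    var_a, var_b = variance(sample_a) / n_a, variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Made-up similarity scores for illustration only.
selected = [0.10, 0.12, 0.11, 0.13, 0.12]
not_selected = [0.08, 0.09, 0.10, 0.09, 0.08]
t_stat, dof = welch_t(selected, not_selected)
```

Turning `t_stat` and `dof` into a p-value needs the t distribution; if SciPy is available, `scipy.stats.ttest_ind(selected, not_selected, equal_var=False)` does the whole test in one call.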
The implications of these findings are relatively straightforward. A simple metric of similarity (or relevance) correlates well with notions of “recommendation” and editorial selection. This metric could be surfaced in a commenting system user interface to allow users to rank comments based on how similar they are to an article, without having to wait for recommendation scores or editorial selections. In the future I’d like to look into ways to assess how predictive such metrics are in terms of recommendation scores, as well as try out different metrics of similarity, like KL divergence.
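To give a flavor of what a KL-divergence variant might look like, here is a small sketch that compares a comment’s unigram distribution to the article’s. This is untested as a relevance measure; the add-one smoothing, the direction of the divergence (comment relative to article), and the function names are all assumptions chosen for illustration. Note that lower values mean more similar, the reverse of the cosine score.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=1.0):
    """Smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

# Made-up, already-tokenized text for illustration only.
article = "climate change warming oceans melting ice".split()
on_topic = "melting ice warming oceans".split()
off_topic = "my cat likes tuna".split()

vocab = set(article) | set(on_topic) | set(off_topic)
article_dist = unigram_dist(article, vocab)
kl_on = kl_divergence(unigram_dist(on_topic, vocab), article_dist)
kl_off = kl_divergence(unigram_dist(off_topic, vocab), article_dist)

# Rank comments from most to least similar (smallest divergence first).
ranked = sorted([("on_topic", kl_on), ("off_topic", kl_off)], key=lambda item: item[1])
```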