People Scopes, Platforms, and Research

I recently finished reading Pasteur’s Quadrant. The gist of the book is that the author, Donald Stokes, argues that the traditional (back to the Greeks) distinction between basic research and applied research is misguided.  Louis Pasteur didn’t make that distinction. He in fact was very much driven to solve real-world problems whilst also pursuing basic scientific understanding of the phenomena that he observed. The book does a great job of explaining the historical antecedents of the basic-applied distinction in the modern research-industrial complex, and I would highly recommend it to other researchers.

Stokes defines basic research as “experimental or theoretical work undertaken primarily to acquire new knowledge of the underlying foundation of phenomena and observable facts” whereas applied research is concerned with “the elaboration and application of the known … to convert the possible into the actual, to demonstrate the feasibility of scientific or engineering development, to explore alternative routes and methods for achieving practical ends.” But he argues that this one-dimensional dichotomy is too simple and that it should be expanded to a two-dimensional typology with consideration of use on one axis and fundamental understanding on the other. The quadrant of this typology that is concerned with fundamental understanding AND considerations of use is termed Pasteur’s quadrant, or alternatively “use-inspired basic research.”

I reproduced a diagram from the book that illustrates the typology:

Use-inspired basic research can advance both fundamental knowledge and technology. That is a good thing, because new (or better) technology enables new scientific questions to be asked, and the answers to those questions can often lead to better technology designs. The scanning electron microscope (SEM) is a good example.

I think it’s likely that many interdisciplinary fields thrive at this intersection of applied and basic research and Human Computer Interaction (HCI) is no exception. A lot of HCI research seeks to harness fundamental knowledge for the design of interactive systems but at the same time use new technologies and interfaces to ask fundamental questions about people and interfaces (though these two phases do not always occur simultaneously, nor must they). Basic findings can trickle back to core disciplines (e.g. psychology, sociology), and other findings from other core disciplines can inform the designs and the engineering that goes into building the next generation of interactive systems.

Take a simple new technology that has had a huge impact on computational social science research: Twitter. Twitter is the computational social science “scope” that lets researchers ask all kinds of interesting questions about social psychology. Refining such knowledge could lead to a newer social scope (Twitter 2.0?) that is even better. Another example is Digg.com, which a few years ago was a technology that helped advance our understanding of information novelty and decay.

Real people-scopes working in naturalistic settings are essential for basic research as well as for driving technology forward. Academia (not just in HCI, but in other interdisciplinary social sciences) needs to get more strategic about building people-scopes, which are basically platforms that enable new human-centered questions to be asked at scale. Unfortunately academia is not traditionally good at platforms. Right now I can only think of a few academic projects that have done this successfully: MovieLens at the University of Minnesota, Scratch at MIT, and maybe IBM has also had some semi-successful ones.

There are likely a number of reasons why academia is not that good at platforms: (1) grad students may not be around long enough to grow and maintain the system, (2) the risk of failure is immense and too high for a pre-tenure faculty member to bear, (3) there are not enough sustained resources to maintain the systems, and (4) there are few to no marketing resources to support the acquisition of users. So there are incentive as well as resource issues here.

It may be that start-ups are simply a better source of new social platforms, since the market can quickly winnow out the unsuccessful ones, and the risk is externalized. But I think it may also warrant thinking about how funding agencies like the NSF might better support (e.g. through sustained resources and incentives) the construction of the next generation of people-scopes.

What a News Consumer Wants

What exactly is it that drives people to consume news information? If we can answer that, I would argue, then we open a new space of possibility for creating new media products, and for optimizing existing ones. As Google’s first commandment states: “Focus on the user and all else will follow.” I adopt this point of view here and will consider other perspectives (e.g. business or content producers) in future posts. In this post I really want to get at the underlying needs, motivations, or habits that drive news consumption.

First, I think it’s important to draw a distinction between the “How” of news consumption and the “Why” of news consumption. How news is consumed is largely attributable to the medium and technology of presentation (e.g. paper, radio, TV, internet). The context and form-factor of the technology also matters: the way that people consume news across different devices has been shown to vary over the course of the day, and consumption of news on tablets exhibits different patterns than consumption on other devices. Certainly online social networks such as Twitter and Facebook have changed how people are exposed to and consume news. These are all technologies that facilitate news consumption, and bias it in their own ways as their unique affordances differentially enable, place constraints on, and influence behavior. The why of news consumption is more fundamental though, since understanding the underlying needs and motivations for consuming news can drive new mechanisms for the how of consumption. Going back to a user-centered design philosophy, ideally, the how amplifies the why, and the why informs the how.

Of course, why people consume news or media is not invariant across people or contexts, so there is unlikely to be a single user model that describes all people at once. For starters, demographic factors such as age and gender have been linked to different patterns of consumption (e.g. younger people tend to consume news more for the sake of escapism or passing time, women tend to be less interested in news on science and technology). This necessitates thinking about information niches, and about how needs and motives may vary over time and context. For instance, social context (e.g. co-viewing) can influence people to watch television news for longer. Individual differences also exist between people: personality traits such as extraversion and openness have been linked to both interest in politics and public affairs, as well as exposure to such related news. Considering all of the moderating factors that influence why someone might consume news (i.e. demographics, context, personality, …) how could products be designed to appeal to any of these niches? What does a news product for introverts look like? How should it work differently?

Since the 1940s communications and journalism scholars have been developing a theoretical framework that came to be known as Uses and Gratifications (U&G), which attempts to explain why people seek out and consume media. What are the gratifications that people receive from various kinds of media or types of content which help to satisfy their underlying social and psychological needs? Some of the earliest studies looked at why people consumed radio news, and some of the most recent look at internet technologies (e.g. I have looked at news commenting through this lens). U&G theory attempts to explain how and why people select their media, as well as how concentrated their attention is (e.g. casually attending to a report for entertainment or to pass time is different from goal-oriented information seeking). Some limitations of the theory are (1) that it assumes an active user who is making selection decisions (though sometimes these calcify into habits), and (2) that the typologies of needs and motivations are built on self-reported information instead of observational data. This second limitation is perhaps quite important, as research has shown that people over-report their interest in international news by a factor of 3 compared to their actual news browsing behavior. So, just a quick caveat that, ideally, user needs and motivations should be triangulated and validated based on observations of behavior in addition to self-reports.

U&G proffers a typology of gratifications which help explain why people consume news. Those listed below are taken from Miller and Ruggiero and include:

  • Informational/Surveillance: finding out about relevant events and conditions in immediate surrounding, society, and the world; seeking advice on practical matters, or opinion and decision choices; satisfying curiosity and general interest; learning, self-education
  • Personal Identity: finding reinforcement for personal values; finding models of behavior; identifying with media actors; gaining insight into one’s self
  • Integration and Social Interaction: insight into circumstances of others including social empathy; identifying with others and gaining a sense of belonging; finding a basis for conversation and social interaction; enabling connection with family, friends, society
  • Entertainment/Diversion: escaping, relaxing, cultural or aesthetic enjoyment, filling time, emotional release, sexual arousal

You might ask yourself which news products address any of these motives better or worse? For instance, getting news on Facebook makes integration and social interaction motives very salient and easy for the user; watching Jon Stewart ties together entertainment and news effectively.

But still there is the underlying question of which psychological needs lead people to seek these categories of gratifications through the media. For this we can turn to a theory of motivation developed over the last 40 years called Self-Determination Theory; here’s a nice book on the subject. The theory postulates that there are three main drivers of intrinsic motivation: (1) autonomy, (2) competence, and (3) social-relatedness. Autonomy is about providing people with choices – the more choices people have the more in control they feel. Competence is about helping people to see the relationship between their behavior and some desired outcome; feeling competent is about taking on a challenge and meeting it. How could news products better help people feel autonomous or competent? Those products would be hits. The last driver is social-relatedness which is about people feeling connected to other people; social networks are already doing a pretty good job of satisfying that underlying psychological need.

Beyond psychological needs though, there may even be a biological driver for news consumption. In 1996 Pamela Shoemaker argued in a Journal of Communication paper that the human desire to surveil is evolutionarily adapted to help detect deviances or threats in the environment; humans that could surveil better were more likely to survive because they could avoid threats and thus reproduce. However, this hypothesis still needs to be tested empirically to see if people attend more to news that is more deviant (though it does seem plausible). What has been tested empirically, via a big-data analysis by information scientists Fang Wu and Bernardo Huberman, is how human attention orients to novel information and that that attention naturally decays over time according to a mathematical function. Indeed, for the digg.com site they found that the half-life for an item was, on average, 69 minutes, which suggests a natural time-scale (though site dependent) at which human attention fades.
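
To make the half-life figure concrete, here is a minimal sketch that assumes simple exponential decay. That functional form is an assumption made purely for illustration: the text above only says that attention decays according to a mathematical function, and the function actually fitted by Wu and Huberman may differ. The function name and the sample times are hypothetical.

```python
HALF_LIFE_MINUTES = 69.0  # average half-life reported for items on digg.com


def attention_remaining(minutes_elapsed: float,
                        half_life: float = HALF_LIFE_MINUTES) -> float:
    """Fraction of initial attention left after some time.

    ASSUMPTION: plain exponential decay, used here only because it is the
    simplest curve with a well-defined half-life.
    """
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    return 0.5 ** (minutes_elapsed / half_life)


# Hypothetical sample times: at posting, one and two half-lives, one day.
for minutes in (0, 69, 138, 24 * 60):
    share = attention_remaining(minutes)
    print(f"{minutes:5d} min -> {share:.4%} of initial attention")
```

Under this assumption an item keeps half of its attention after 69 minutes and a quarter after 138, which is why a story that is a day old is effectively invisible on such a site.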

There is a wide palette of options for thinking about new ways of engaging people in news information: context, demographics, personality, uses & gratifications, psychological needs, and biological drivers for novel information. There are likely many new (or existing) news products that can leverage this typology to personalize and make sure people are getting what they came for out of their media experience. And, to make the job even easier, research has also shown that people enjoy incidental exposure to news information. So, even if you initially motivate people to engage the media in one way (e.g. social relatedness), they will likely still enjoy incidental exposure to other news information.

Authoring Data-Driven Documents

Over the last few months I’ve been learning D3 (Data-Driven Documents), which is a really powerful data visualization library built for JavaScript. The InfoVis paper gets into the gritty details of how it supports data transformations, immediate evaluation of attributes, and a native SVG representation. These features can be more or less helpful depending on what kind of visualization you’re working on. For instance, transformations don’t really matter if you’re just building static graphs. But being able to inspect the SVG representation of your visualization (and edit it in the console) is really quite helpful and powerful.

But for all the power that D3 affords, is programming really how we should be (want to be?) authoring visualizations?

Here’s something that I recently made with D3. It’s a story about U.S. manufacturing productivity, employment, and automation told across a series of panels programmed using D3.

Now, of course, the exploratory data analysis, storyboarding, and research needed to tell this story were time-consuming. But after all that, using D3 to render the graphs I wanted was substantially more tedious and time-consuming than I would have liked. I think this was because (1) my knowledge of SVG is not fantastic and I’m still learning that, but more importantly (2) D3 supports very low-level operations that make high level activities for basic data storytelling time-consuming to implement. And yes, D3 does provide a number of helper modules and layouts, but these aren’t documented with clear examples using concrete data that would make it obvious how to easily utilize them. Having support for the library on jsFiddle, together with some very simple examples would go a long way towards helping noobs (like me!) ramp up.

But, really, where’s the Flash-like authoring tool for data visualization? Such a tool could be used to interactively manipulate a D3 visualization and, when you’re done, output HTML + CSS + D3 code to generate your graphs (including animation, transitions, etc.). The tool would also include basic graph templates that could be populated with your data and customized. Basic storytelling functions for highlighting important aspects or comparisons of the data (e.g. through animation, color, juxtaposition, etc.), or for using text to annotate and explain the data, could also be supported. D3 suffers from a bit of a usability problem right now, and powerful as it is, authoring stories with visualization doesn’t need to be, nor should it be, bound up in programming.
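
As a rough illustration of the “basic graph template” idea, here is a minimal sketch of a template that takes data and emits SVG markup. It is written in Python rather than D3, and it is not how any existing tool works: the function name, the dimensions, and the sample numbers are all hypothetical, and a real authoring tool would presumably emit D3 code rather than static SVG.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, width=300, bar_height=20, gap=5, label_width=100):
    """Render (label, value) pairs as a horizontal bar chart in SVG markup."""
    if not data:
        raise ValueError("data must contain at least one (label, value) pair")
    if any(value < 0 for _, value in data):
        raise ValueError("values must be non-negative")
    max_value = max(value for _, value in data)
    if max_value == 0:
        raise ValueError("at least one value must be positive")

    height = len(data) * (bar_height + gap)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for i, (label, value) in enumerate(data):
        y = i * (bar_height + gap)
        # Scale the longest bar to fill the space right of the labels.
        bar_width = (width - label_width) * value / max_value
        parts.append(f'<text x="0" y="{y + bar_height * 0.75:.1f}">'
                     f'{escape(label)}</text>')
        parts.append(f'<rect x="{label_width}" y="{y}" '
                     f'width="{bar_width:.1f}" height="{bar_height}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up numbers, only to show the template being populated with data.
svg = bar_chart_svg([("Output", 80), ("Employment", 40)])
print(svg)
```

The point of the sketch is the division of labor: the author supplies labels and values, and the template owns every low-level decision about coordinates and scaling.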

Modeling Computing and Journalism (Part I)

Recently I’ve been thinking more about modeling the intersection of computing and journalism, and in particular thinking about ways that aspects of computing might impact or allow for innovation in journalism. It struck me that I needed a more precise definition of computing and its purview (I’ll come back to the journalism side of the equation in a later post). What, exactly, is computing? I’ll try to answer that in this post…

Definitions of computing and computer science abound online, but the most canonical comes perhaps from Peter Denning, an elder in the field of Computer Science. In a CACM article from 2005 he writes, “Computing is the systematic study of algorithmic processes that describe and transform information”. Two key words there: “algorithmic” and “information”. Computing is about information, about describing and transforming it, but also about acquiring, representing, structuring, storing, accessing, managing, processing, manipulating, communicating, and presenting it. And computing is about algorithms: their theory, feasibility, analysis, structure, expression, and implementation. The fundamental question of computing concerns what information processes can be effectively automated.

In modern CS there is a huge body of knowledge that stems from this core notion of computing. For instance, the 2008 Computer Science Curriculum defines 14 different areas of knowledge (see list below). The Georgia Tech College of Computing delineates some of these areas as belonging to core computer science, and others as belonging to interactive computing. Roughly, core computer science deals with the conceptual (i.e. mathematical) and operational (i.e. nuts and bolts of how a modern computer works) aspects of computing. Interactive computing on the other hand mostly deals with information input, modeling, and output. There are aspects of professional practice, engineering, and design that apply in both.

Core Computer Science

  • Discrete Structures, Programming Fundamentals, Software Engineering, Algorithms and Complexity, Architecture and Organization, Operating Systems, Programming Languages, Net Centric Computing, Information Management, Computational Science

Interactive Computing

  • Human Computer Interaction, Graphics and Visual Computing, Intelligent Systems

In terms of modeling the intersection of computing and journalism it’s the interactive side of things that’s most interesting. How information is moved around inside a computer is less important for journalists to understand than the interactive capabilities of information input, modeling, and output afforded by computing.  That is, how does computing interface with the rest of the world? Of course many of the capabilities of computers studied in interactive computing rest on solid foundations of core computer science (e.g. you couldn’t get much done without an operating system to schedule processes and manage data). Core areas with particular relevance to interactive computing are technologies in networking/communications, information management, and to a lesser extent computational science. Below I list more detailed sub-areas for each of the interactive computing and related core areas.

  • Human Computer Interaction (HCI) includes sub-areas such as interaction design, user-centered design, multimedia systems, collaboration, online communities, human-robot interaction, natural interaction, tangible interaction, mobile and ubiquitous computing, wearable computing, and information visualization
  • Graphics and Visual Computing includes sub-areas such as geometric modeling, materials modeling and simulation, rendering, image synthesis, non-photorealistic rendering, volumetric rendering, animation, motion capture, scientific visualization, virtual environments, computer vision, image processing and editing, game engines, and computational photography
  • Intelligent Systems includes sub-areas such as general AI including search and planning, cognitive science, knowledge-based reasoning, agents, autonomous robotics, computational perception, machine learning, natural language processing and understanding, machine translation, speech recognition, and activity recognition
  • Net Centric Computing includes aspects of networking, web architecture, compression, and mobile computing.
  • Information Management includes aspects of database systems, information architecture, query languages, distributed data, data mining, information storage and retrieval, hypermedia, and multimedia databases.
  • Computational Science includes aspects of modeling, simulation, optimization, and parallel computing often oriented towards big data sets.

So what can we do with this detailed typology of interactive computing technology?

In a 2004 CACM article Paul Rosenbloom developed a notation for describing how computing interacts with other fields. In his typology, he articulated ways in which computing could implement, interact with, and embed with other disciplines, namely with physical, life, and social sciences. These different relationships between fields lead to different kinds of ideas for technology (e.g. an embedding relationship of computing in life sciences would be the notion of cyborgs, an interaction between computing and physical sciences would be robotics). In this spirit, later on in this blog series I’ll look more specifically at how some of the computing technologies articulated above can map to aspects of journalism practice, with an eye toward innovation in journalism by applying computing in new or under-explored ways.

Google+ and Commenting

Twitter isn’t built for conversation. The interface just doesn’t support it – snippets of 140 characters largely floating in a groundless ether of chatter. But Google+ does (to some extent) and I’ve recently begun pondering what this means for the future of commenting online, especially around news media where I’ve done research.

One difference I see moving forward is a transition away from commenting being dictated primarily by the content, to a world where online comment threads are heavily influenced by both the content and the person sharing the content. How does the same content posted by different people lead to different conversations evolving around that content? If a conservative blogger and a liberal blogger share the same link to a news article on Google+, how do their circles react differently to that article and how does that affect the conversation? And if we aggregate these conversations back together somehow does this lead to a more interesting, engaging, or insightful experience for users? How can online publishers harness this as an opportunity?

On Google+ people post in all kinds of different ways: status updates, entire blog posts (e.g. Guy Kawasaki), or just sharing news and rich media links. Here I’ll focus on commenting around links to media since that’s most relevant to online publishers. The diversion of commenting attention and activity to platforms other than the publisher’s (e.g. Google+ or Facebook) could be seen as a threat, but it could also be an opportunity. Platform APIs can harvest this activity and aggregate it back to the publisher’s site. The opportunity is in harnessing the activity on social platforms to provide new, more sticky interfaces for keeping users engaged on the publisher’s content page. For the designers out there: what are novel ways of organizing and presenting online conversations that are enabled by new features on social networks like Google+?

One idea, for opinion oriented news articles, would be for a publisher to aggregate threads of Google+ comments from two or more well-known bloggers who have attracted a lot of commentary. These could be selected by the users, editors, or, eventually by algorithms which identify “interesting” Google+ threads. These algorithms could, for instance, identify threads with people from diverse backgrounds, from particular geographies, with particular relevant occupations, or with a pro/con stance. These threads would help tell the story from different conversational perspectives anchored around particular people sharing the original content. The threads could be embedded directly on the publisher’s site as a way to keep users there longer, perhaps getting them more interested in the debates that are happening out on social media.

Another idea would be to organize commentary by network distance, providing a view of the commentary that is personalized to an individual. Let’s say I share a link on Google+ and 20 people comment (generous to myself, I know), but then 2 of those people re-share it to their circles and 50 more people comment, and from those 50, 5 of them share it and 100 people comment, and so on. At each step of re-sharing that’s a bit further away from me (the originator) in the network. Other people in the network can also share the link (as originators) and it will diffuse. All of this activity can be aggregated and presented to me based on how many hops away in the network a comment falls. I may be interested in comments that are 1 hop away, and maybe 2 (friends of friends) but maybe not further than that. Network distance from the user could end up being a powerful social filter.
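
The hop-based filter described above can be sketched with a breadth-first search. This is a minimal sketch under stated assumptions, not a real Google+ integration: the graph is a hypothetical dictionary of who is connected to whom, the names and comments are made up, and the function names are my own.

```python
from collections import deque


def hop_distances(graph, origin):
    """Breadth-first search: number of hops from origin to each reachable person."""
    distances = {origin: 0}
    queue = deque([origin])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[person] + 1
                queue.append(neighbor)
    return distances


def filter_comments(comments, graph, viewer, max_hops=2):
    """Keep (author, text) comments whose author is within max_hops of the viewer."""
    distances = hop_distances(graph, viewer)
    return [(author, text) for author, text in comments
            if author in distances and distances[author] <= max_hops]


# Hypothetical network: "me" knows ann and bob, ann knows cat, cat knows dan.
graph = {
    "me": ["ann", "bob"],
    "ann": ["me", "cat"],
    "bob": ["me"],
    "cat": ["ann", "dan"],
    "dan": ["cat"],
}
comments = [
    ("ann", "Nice link."),
    ("cat", "I disagree with the premise."),
    ("dan", "Late to this thread."),
    ("eve", "Found this via search."),
]
visible = filter_comments(comments, graph, "me", max_hops=2)
print(visible)
```

With max_hops=2 the viewer sees friends and friends of friends; dan (three hops away) and eve (not connected at all) are filtered out. Whether hops should follow friendship edges or the actual chain of re-shares is a design choice the sketch leaves open.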

There’s lots to try here and while I think it’s great that new platforms for commenting are emerging, it’s time for publishers to think about how to tap into these to improve the user experience either by enabling new ways of seeing discussion or new ways to learn and socialize with others around content.

Unpacking Visualization Rhetoric

Note: An edited version of the following also appears on the Chart.io blog. 

Visualization can be useful for both more exploratory purposes (e.g. generating analyses and insights based on data) as well as more communicative ends (e.g. helping other people understand and be persuaded or informed by the insights that you’ve uncovered). Oftentimes more general visualization techniques are used in the exploratory phase, whereas more specific, tailored, and hand-crafted techniques (like infographics) tend to be preferred for maximal persuasive potential in the communicative phase.

In the middle ground is a class of visualizations termed “narrative visualization” – often used in journalism contexts – which tend to include aspects of both exploratory and communicative visualization. This blending of techniques makes for an interesting domain of study and it’s here where Jessica Hullman and I began investigating how different rhetorical (persuasive) techniques are employed in visualization. We were particularly interested in how different rhetorical techniques can be used to affect the interpretation of a visualization – valuable knowledge for visualization designers hoping to influence and mold the interpretation of their audience. (Here we defer the sticky ethical question of whether someone should use these techniques since in general they can be used for both good and ill).

We carefully analyzed 51 narrative visualizations and constructed a taxonomy of rhetorical techniques we found being used. We observed rhetorical techniques being employed at four different editorial layers of a visualization: data, visual representation, annotations, and interactivity. Choices at any of these layers can have important implications for the ultimate interpretation of a visualization (e.g. the design of available interactivity can direct or divert attention). The five main classes of rhetoric we found being used include: information access (e.g. how data is omitted or aggregated), provenance (e.g. how data sources are explained and how uncertainty is shown), mapping (e.g. the use of visual metaphor), linguistic techniques (e.g. irony or apostrophe), and procedural rhetoric (e.g. how default views anchor interpretation).

The maxim “know thy audience” points to another dimension by which a visualization creator can influence the interpretation of a visualization. While most visualizations concentrate on the denotative level of communication, the most effective visualization communicators also make use of the connotative level of communication to unlock a whole other plane of interpretation. For instance, various cultural codes (e.g. what colors mean), or conventions (e.g. line graphs suggest you’re looking at temporal data even if you’re not) can suggest alternate or preferred interpretations.

While the full explanation of the taxonomy and use of codes and connotation for communication in visualization is beyond this blog post, you can see a more complete discussion in a pre-print of our forthcoming InfoVis paper.  At the very least though I’ll leave you with an example which illustrates some of these concepts.

Take the following recent example from the New York Times where various aspects of the visualization rhetoric framework apply.

The choice of labeling on the dimensions of the chart “reduce spending” vs. “don’t reduce spending” leaves out another option, “increase spending”. The choice of the color green for “willing to compromise” connotes a certain value judgement (i.e. “go, or move ahead”) as read from an American perspective. The way individual squares are aggregated to arrive at an overall color is unclear, leading to questions that could be clarified through better use of provenance rhetoric. Moreover, squares cannot be disaggregated or understood as individual data, making it difficult for users to interpret either the magnitude of the response or the specific data reported in any one square. The visualization is compelling, but applying the visualization rhetoric framework during its design could have suggested other ways to make its interpretation clearer.

Ultimately visualization rhetoric is a framework that can be useful for designers hoping to maximize the communicative potential of a visualization. Exploratory visualization platforms (like Tableau or Chart.io) could also be enhanced with an awareness of visualization rhetoric, by, for instance, allowing users to make salient use of certain rhetorical techniques when the time comes to share a visualization.

Those particularly interested in this space should consider participating in an upcoming workshop I am co-organizing on “Telling Stories with Data” at InfoVis 2011 in Providence, RI in late October.

A few thoughts on Google News badges

Yesterday Google announced they were adding “badges” to the Google News reading experience.  The basic gist is that Google will give you private badges (which can also be shared – this is key) based on which topics you read most.

There’s been some comment / criticism from the gamification folks, like BunchBall, that this is an inane thing to do.  And from a user’s point of view I tend to agree. Simply awarding a badge, which may or may not be visible to my network, for reading isn’t much of a motivator.  The system doesn’t offer a sense of accomplishment to the user, or even offer a sense of what they need to do to advance or “achieve” the next badge. The only real positive aspect of this for the user is better feedback and visibility of their reading habits.

But, if we think about this from Google’s point of view – collecting validated information about reading habits is GOLD. By validated, what I mean is that if a user “shares” a reading badge, say it’s for “Mobile Industry”, they’re going on record as someone with some sense of expertise on the topic. “I’m so-and-so and I’m an expert on the mobile industry” is what the shared badge means. As Google accounts get linked to content creation via the author tag all of this can feed into the quality / credibility calculator that drives Google search. Moreover, a validated badge helps Google target ads more effectively. Better targeting = more clickthroughs = more money.

An interesting issue here is the granularity of badges in Google News. Can I get a badge in “Visual Analytics Research” or is that too specific? It would be helpful to be able to see the badge landscape of what’s available.

There’s no doubt that Google News badges are immature as a product offering. The user experience needs to get better and offer more incentives and motivation. Klout perks could be an interesting direction to take this. Ultimately though I think this is less of a UX play and more of an opportunity for Google to collect more data.

Newsgames “Interview”

This past week I was asked by a reporter via email to respond to some questions for a story on newsgames published today in the Sydney Morning Herald. Here’s the story, which was a little disappointing since it’s really just a blurb and barely mentions any of the great stuff I had sent the reporter.

In the interest of transparency, and more importantly, since I had already spent some time writing out my responses, I’m publishing them here.

Reporter: What impact do you think newsgames like Gaza Shield can have?
At their best newsgames can contribute to the discourse around complex issues in a number of ways. For instance, they can put the player in a position to make decisions that have immediate impact within the game environment, or they can help the player see how individual factors contribute to a complex system or process being simulated in the game. Games can certainly serve to raise awareness for issues and to get players thinking about those issues in different ways. The communicative uniqueness of games comes through a combination of goal-orientation and interactivity, with the message often arising out of the game as a result of interaction and play over time.

Reporter: Are they merely fun or can they have an educative, informative or propaganda purpose?
All of the above. Newsgames can certainly be educational and informative and lead to a deeper understanding of an issue, or help the player to experience the issue by both being a part of it and maintaining agency in some outcome. As with any communicative medium there is also the potential for manipulation, such as through the complexity of a game system (or the liberties taken in simplifying it), the types of actions enabled, and of course the visual representation and way that elements of the game are depicted.

Reporter: Do they have potential to change the way a player thinks about a particular issue? Could they be utilised by media organisations as another way of disseminating news?
Every medium utilized by the news media has its strengths and weaknesses for telling different kinds of stories. Photos are great for telling compelling visual stories easily captured in the blink of an eye; maps are needed to tell the story of the weather; video is useful for showing unfolding events in real-time; text is good for unpacking complexity and presenting rational arguments. Games are yet another implement on the storyteller’s tool belt, with their own unique set of strengths and weaknesses. The medium is still being explored and researched to better understand the range of storytelling capabilities that newsgames enable, as well as their potential to change opinions.

Reporter: What are some of the recent examples of newsgames that have formed/changed opinions?
Not sure I have any recent examples of games where I could unequivocally say they had changed opinions.

Reporter: Is timeliness still a hurdle to newsgames being an effective way to disseminate news?
Yes. One of the largest challenges facing newsgames is the timeliness issue. But only if you constrain yourself to thinking of news as hard, breaking news. Breaking news is really only a fraction of the news content we consume now, with other softer variations being the bulk of news content produced and consumed. Softer news has different constraints but can still expose important newsworthy issues. “Slow burn” or “evergreen” issues afford game designers more time to refine their message, their simulation, and their interfaces to create a more compelling newsgame experience. At the same time, I think that as newsgame authoring literacy grows you’ll see the turn-around time on newsgames shrink, perhaps to the point where you could routinely see games published within 24-72 hours of an event. The other approach news organizations can take, for certain types of planned events, is to invest the time in newsgame development in anticipation of the event, so that it can be rapidly published once the event happens. With a large enough palette of newsgame scaffolds, news organizations could more rapidly publish adapted versions. Some of my own research has looked at addressing the timeliness issue by creating game-y data-visualizations where the data itself can be quickly swapped out for new or different data on a different issue.

Reporter: By logging the interactions players have with a game, could newsgames provide information on the public’s knowledge and opinions on issues like the Middle East conflict?
Yes. Designers construe games as complex systems composed of objects, relationships, and decisions that players must take on those objects and relationships. The implicit value of logged player interactions with games is to understand the nature of players’ decisions. While these decisions may indicate some stance, position, or opinion by the player, and in aggregate, about the public, it’s important to remember that decisions in games have little risk associated with them. I may choose to make a certain decision in a game simply to see how the game will react, not because I would make an analogous decision in my own real life. So, while such logged interactions will certainly provide information about the public’s knowledge and opinions on issues, such information will still need to be supplemented by traditional means of polling.

Reporter: Are there any issues that are not suited to newsgames? Can you think of any recent examples of tasteless/offensive newsgames? Have you received any complaints about Salubrious Nation?
Not all issues are well-received by all players as presentations in a game format. The aura of playfulness surrounding games can, at times, conflict with issues that demand sobriety. But this is largely a matter of taste and I suspect that as newsgames become more commonplace and mainstream this could be less of an issue. People will simply get more used to seeing complex or difficult issues presented interactively. In one of my own newsgames, Salubrious Nation, we did receive a few complaints about presenting public health as a guessing game. But these complaints were in the minority, and, in my opinion, outweighed by the benefits of exploration, engagement, and insight which the experience offers.

Visualization, Data, and Social Media Response

I’ve been looking into how people comment on data and visualization recently and one aspect of that has been studying the Guardian’s Datablog. The Datablog publishes stories of and about data, oftentimes including visualizations such as charts, graphs, or maps. It also has a fairly vibrant commenting community.

So I set out to gather some of my own data. I scraped 803 articles from the Datablog including all of their comments. Of this data I wanted to know if articles which contained embedded data tables or embedded visualizations produced more of a social media response. That is, do people talk more about the article if it contains data and/or visualization? The answer is yes, and the details are below.

While the number of comments could be scraped off of the Datablog site itself I turned to Mechanical Turk to crowdsource some other elements of metadata collection: (1) the number of tweets per article, (2) whether the article has an embedded data table, and (3) whether the article has an embedded visualization. I did a spot check on 3% of the results from Turk in order to assess the Turkers’ accuracy on collecting these other pieces of metadata: it was about 96% overall, which I thought was clean enough to start doing some further analysis.

So next I wanted to look at how the “has visualization” and “has table” features affect (1) tweet volume, and (2) comment volume. There are four possibilities: the article has (1) a visualization and a table, (2) a visualization and no table, (3) no visualization and a table, (4) no visualization and no table. Since neither tweet volume nor comment volume is normally distributed, I log transformed both to bring them closer to a normal distribution (normality is an assumption of the statistical tests that follow). Moreover, there were a few outliers in the data, so anything beyond 3 standard deviations from the mean of the log transformed variables was excluded.
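As a rough illustration of the preprocessing described above, here is a minimal Python sketch. The function name and the use of log1p (so that articles with zero tweets or comments don't break the logarithm) are assumptions made for illustration; this is not the code that was used for the analysis.

```python
import math
import statistics


def log_transform_and_trim(counts, num_sd=3.0):
    """Log-transform counts, then drop values more than num_sd
    standard deviations from the mean of the logged values."""
    # Assumption: log1p(x) = log(1 + x) is used so that zero counts are defined.
    logged = [math.log1p(c) for c in counts]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)  # sample standard deviation
    if sd == 0:
        return logged  # all values identical, nothing to trim
    return [x for x in logged if abs(x - mean) <= num_sd * sd]
```

Note that with very small samples a single point can never sit 3 standard deviations from the mean, so a filter like this only has an effect on reasonably large groups.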

For number of tweets per article:

  1. Articles with both a visualization and a table produced the largest response with an average of 46 tweets per article (N=212, SD=103.24);
  2. Articles with a visualization and no table produced an average of 23.6 tweets per article (N=143, SD=85.05);
  3. Articles with no visualization and a table produced an average of 13.82 tweets per article (N=213, SD=42.7);
  4. And finally articles with neither visualization nor table produced an average of 19.56 tweets per article (N=117, SD=86.19).

I ran an ANOVA with post-hoc Bonferroni tests to see if the differences between these means were statistically significant. Articles with both a visualization and a table (case 1) have a significantly higher number of tweets than cases 3 (p < .01) and 4 (p < .05). Articles with just a visualization and no data table (case 2) also have a higher average number of tweets per article than cases 3 and 4, but this difference was not statistically significant. The takeaway is that the combination of a visualization and a data table seems to drive a significantly higher Twitter response.
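For readers unfamiliar with the procedure, the sketch below shows the two ingredients in plain Python: the one-way ANOVA F statistic and the Bonferroni adjustment for multiple pairwise comparisons. The function names are hypothetical and this is not the software used for the analysis above. Turning F into a p-value additionally requires the F distribution, which a statistics package would normally supply.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups is a list of lists, one list of observations per condition.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With four article types there are six pairwise comparisons, so under this adjustment each raw pairwise p-value is multiplied by six before being compared against the usual .05 threshold.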

Results for number of comments per article are similar:

  1. Articles with both a visualization and a table produced the largest response with an average of 17.40 comments per article (SD=24.10);
  2. Articles with a visualization and no table produced an average of 12.58 comments per article (SD=17.08);
  3. Articles with no visualization and a table produced an average of 13.78 comments per article (SD=26.15);
  4. And finally articles with neither visualization nor table produced an average of 11.62 comments per article (SD=17.52).

Again I ran an ANOVA with post-hoc Bonferroni tests to assess statistically significant differences between means. This time there was only one statistically significant difference: Articles with both a visualization and a table (case 1) have a higher number of comments than articles with neither a visualization nor a table (case 4). The p value was 0.04. Again, the combination of visualization and data table drove more of an audience response in terms of commenting behavior.

The overall take-away here is that people like to talk about articles (at least in the context of the audience of the Guardian Datablog) when both data and visualization are used to tell the story. Articles which used both had more than twice the number of tweets and about 1.5 times the number of comments versus articles which had neither. If getting people talking about your reporting is your goal, use more data and visualization, which, in retrospect, I probably also should have done for this blog post.

As a final thought I should note there are potential confounds in these results. For one, articles with data in them may stay “green” for longer thus slowly accreting a larger and larger social media response. One area to look at would be the acceleration of commenting in addition to volume. Another thing that I had no control over is whether some stories are promoted more than others: if the editors at the Guardian had a bias to promote articles with both visualizations and data then this would drive the audience response numbers up on those stories too. In other words, it’s still interesting and worthwhile to consider various explanations for these results.

A Functional Roadmap for Innovation in Computational Journalism

By: Nicholas Diakopoulos, Ph.D.
School of Communication and Information, Rutgers University
Original Version January, 2010; Updated April 2011. A PDF is also available.

Overview

Journalism in all of its senses spans a spectrum of meaning ranging from social purpose (e.g. watchdogging), to professionalized practice (e.g. ethics and professional standards), to the functional processes that journalists employ. Innovation in journalism can happen within or across this hierarchy of meanings, but in this paper, in particular, I will explore the role that computing can play in the process aspects of journalism. My intent is to lay a foundation of computational thinking for journalistic processes upon which updated journalistic practices and reinvigorated journalistic purposes can be built.

From a process perspective, Computational Journalism is the application of computing and computational thinking to the activities of journalism including information gathering, organization and sensemaking, communication and presentation, and dissemination and public response to news information, all while upholding core values of journalism such as accuracy and verifiability. It is inclusive of CAR (Computer-Assisted Reporting) but distinctive in its focus on the processing capabilities (e.g. aggregating, relating, correlating, abstracting) of the computer in comparison to mundane aspects of storage or access. The field draws on technical sub-fields of computer science including information retrieval, artificial intelligence, content analysis, visualization, personalization, and recommender systems as well as aspects of social computing and information science.

While Computational Journalism is unlikely to ever replace journalists with computers it does promise a future where the goals of human journalists are greatly enabled and augmented through computing. Moreover, its pursuit may also inform developments in Computer Science, by, for example, driving research in visual analytics and visualization, time-critical information processing, trustworthy computing, and user interfaces.

In the remainder of this paper I will discuss opportunities for innovation along the lines of the process aspects of journalism identified above. My goal is to stimulate new research and applications of these processes in the context of journalism and explore the challenges and opportunities in this space.

Information Gathering

The adoption of cheap and ubiquitous devices with photo and video capability has already had a substantial impact on how stories are reported, both in the mainstream media and through citizen journalism. While sensing hardware has gotten cheaper and more pervasive, social networking systems (e.g. Facebook) and social awareness streams (e.g. Twitter) have explicitly connected the what of sensing with the who is sensing or reporting.

The process of information gathering and reporting largely hinges on finding and verifying sources of information. Some of the best (and most difficult) journalism hinges on cultivating relationships over time with a personal network of sources. What’s different about the sources that are available from social networks is that, although they are by and large public, they may not be familiar to the journalist. Finding the desired sources while characterizing the expertise and veracity of those sources represents a barrier to fully realizing the journalistic value from these networks.

There are at least four aspects of information gathering from social networks and awareness streams that can be enhanced computationally: (1) source expertise finding, (2) source characterization (e.g. historical biases), (3) cross-referencing and independence of breaking eye-witness reports, and (4) originating source of information determination. For instance, a computational process could automatically compute the sentiment (i.e. pro / con) of a source with respect to a range of topics or issues based on their history of Twitter messages. Such sentiment scores could then be used to inform journalists about the background of a potential interviewee. Or, consider a breaking news scenario where a journalist is attempting to cross-reference messages for validity. Algorithms can be developed to estimate the independence of those sources or to trace information back to a likely originating source. These are just a few examples of the potential areas for technical innovation in the area of information gathering.
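To make the source characterization idea concrete, here is a deliberately simple Python sketch that scores a source's stance toward a set of topics from a history of messages. Everything in it is an assumption for illustration: the word lists are a hypothetical toy lexicon, the function name is invented, and a real system would rely on a trained sentiment model rather than word counting.

```python
from collections import defaultdict

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"support", "great", "good", "love"}
NEGATIVE = {"oppose", "bad", "terrible", "hate"}


def topic_sentiment(messages, topics):
    """For each topic, average (positive - negative) word counts
    over the messages that mention that topic."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text in messages:
        # Crude tokenization: split on whitespace, strip common punctuation.
        words = [w.strip(".,!?#@").lower() for w in text.split()]
        score = (sum(w in POSITIVE for w in words)
                 - sum(w in NEGATIVE for w in words))
        for topic in topics:
            if topic in words:
                totals[topic] += score
                counts[topic] += 1
    # Topics the source never mentioned are omitted from the result.
    return {t: totals[t] / counts[t] for t in counts}
```

A positive score suggests the source tends to write favorably about a topic and a negative score the opposite; the output is meant as background for a journalist, not as a verdict on the source.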

Organization and Sensemaking

With a growth in information gathering capabilities comes the difficulty of organizing and making sense of all of that information by journalists. This is a process where computers have already had a significant impact, namely through Computer-Assisted Reporting (CAR). CAR tools are usually generic in the sense that they are widely applicable to different stories, though many tools are designed for specific data types such as geographic, temporal, or network.

While many CAR tools succeed in enabling journalists to organize their information there is still considerable room for improvement in the area of sensemaking. In particular, computational perception and content analysis enable computers to convert signals about the world (including everything from sensor values to Twitter messages) into semantically and contextually laden symbols (e.g. names of people or places) or aggregate and derivative values (e.g. the sentiment or emotion of a message, the novelty or unusualness of a message with respect to an event).

Together with interactive and visual ways of presenting these computed, “semantic” facets of information there is a huge potential space for innovation in journalism tools. Some of this innovation is happening in other domains that draw on a similar process of sensemaking, such as intelligence analysis. These tools can be evaluated to better understand how they do or do not work in the context of journalism, and, in general, computational tools developed to enable sensemaking will need rigorous attention to the evaluation of their utility in real situations. Finally, sensemaking tools not only have potential for helping journalists but also for helping “readers” make sense of growing online repositories of newsworthy content and data.

Communication and Presentation

Once a story has been organized and made sense of, the next process entails communicating and presenting it in a relevant and interesting way. And while I won’t argue that every story demands it, there will be some stories that benefit from computationally infused presentations of content. A journalist might use computation in such a story by making models or data interactive in a way that informs the user more so than reading a static story.

User interfaces need to innovate more generic paradigms to compellingly communicate complex stories via models, data, simulation, and games. For instance, recent research into playable data graphics has looked into how to add game elements such as goals, scores, and advancement into how users interact with online visualized data. Other types of newsgames explore editorial simulation or decision making processes. One thing to consider as we invent these new experiences is how journalistic norms and values play out in interactive media. There are certain notions of interactive rhetoric and literacy that need to be taken into account when training computational journalists.

As governmental data becomes emancipated from closed databases (as is the current executive order in the U.S.) the opportunities for telling stories through models, data, simulation, and games will only grow. There is a range of potential new (and not yet invented) storytelling forms that combines both elements of interactivity and computing with games, data, and news content. This will be an area ripe for alternative methods of communicating complex information in engaging and interactive formats.

Dissemination and Public Response

From a business perspective, one of the most disruptive shifts in journalism has been the process of digitization and dissemination of content online. This transition took content that was once constrained by a fixed medium and brought the variable costs of publishing space close to zero. The implication of this shift is that there is much more content out there and, practically speaking, many more ways to compete for attention for content. With unlimited space come the issues of information overload and scale.

Computation can improve the process of dissemination by addressing information overload and scale issues through, for instance, personalization and content adaptation systems as well as recommender systems. Many of the methods developed will also be applicable to monetization strategies since the fundamental scale issue revolves around matching a paucity of attention with the right content in order to drive higher advertising revenue.

Another implication of unlimited publishing space is that instead of being constrained to a narrow “letters to the editor” page, public response can instead expand to whatever the community needs dictate. In managing the process of interaction with the public response, journalists are encountering this scale problem in terms of interacting with and moderating users’ content in online commenting systems.

In particular there is a lot that computation can offer to improve online commenting systems, both from the perspective of a journalist dealing with moderation as well as for users of the commenting system. Content analysis, such as natural language processing, computational linguistics, and standard information retrieval techniques can help with both the scale as well as the quality of the discourse by introducing new ways for filtering and organizing comments. For instance, content analysis could be used to rank comments by (1) relevance to the story, (2) subjectivity or objectivity, or (3) degree of politeness. This could aid the process of journalists interacting with readers as well as readers interacting with readers by making it easier to find high quality contributions.
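As one hedged illustration of the first ranking criterion, the Python sketch below orders comments by their relevance to a story using cosine similarity between raw term counts, a standard information retrieval technique. The function names are hypothetical, and a production system would at least add stop-word removal and term weighting (e.g. TF-IDF).

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_by_relevance(story, comments):
    """Return (score, comment) pairs, most similar to the story first."""
    story_vec = term_counts(story)
    scored = [(cosine(story_vec, term_counts(c)), c) for c in comments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The same structure could accommodate the other two criteria by swapping the scoring function for, say, a subjectivity or politeness classifier.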

Looking Ahead

Technology is rapidly changing the landscape of how news information is gathered, made sense of, communicated, and disseminated. To pave the way to the future, journalism schools need to train more computationally literate journalists who develop a deep understanding of notions of abstraction, modeling, parameterization, aggregation, scalability, and programming. And while industry grapples with the culture clash between engineers and journalists as well as the classic innovator’s dilemma, there will be plenty of opportunities for the new computational journalists to reinvent the way news information is gathered, organized, presented, and disseminated.