
Computational Journalism and The Reporting of Algorithms

Note: A version of the following also appears on the Tow Center blog.

Software and algorithms have come to adjudicate an ever broader swath of our lives, including everything from search engine personalization and advertising systems, to teacher evaluation, banking and finance, political campaigns, and police surveillance. But these algorithms can make mistakes. They have biases. Yet they sit in opaque black boxes, their inner workings, their inner “thoughts” hidden behind layers of complexity.

We need to get inside that black box, to understand how they may be exerting power on us, and to understand where they might be making unjust mistakes. Traditionally, investigative journalists have helped hold powerful actors in business or government accountable. But today, algorithms, driven by vast troves of data, have become the new power brokers in society. And the automated decisions of algorithms deserve every bit as much scrutiny as other powerful and influential actors.

Today the Tow Center publishes a new Tow/Knight Brief, “Algorithmic Accountability Reporting: On the Investigation of Black Boxes” to start tackling this issue. The Tow/Knight Brief presents motivating questions for why algorithms are worthy of our investigations, and develops a theory and method based on the idea of reverse engineering that can help parse how algorithms work. While reverse engineering shows promise as a method, it will also require the dedicated investigative talents of journalists interviewing algorithms’ creators as well. Algorithms are, after all, manifestations of human design.

If you’re in NYC next week, folks from the New York Times R&D lab are pushing the idea forward in their Impulse Response Workshop. And if you’re at IRE and NICAR’s 2014 CAR Conference in Baltimore on Feb 28th, I’ll be joined by Chase Davis, Frank Pasquale, and Jeremy Singer-Vine for an in-depth discussion on holding algorithms accountable. In the meantime, have a read of the paper, and let me know your thoughts, comments, and critiques.

Understanding bias in computational news media

Just a quick pointer to an article I wrote for Nieman Lab exploring some of the ways in which algorithms serve to introduce bias into news media. It’s a different kind of writing from my typical academic-ese, but fun.

Cultivating the Landscape of Innovation in Computational Journalism

For the last several months I’ve been working on a whitepaper for the CUNY Tow-Knight Center for Entrepreneurial Journalism. It’s about cultivating more technical innovation in journalism and involves systematically mapping out what’s been done (in terms of research) as well as outlining a method for people to generate new ideas in computational journalism. I’m happy to say that the paper was published by the Tow-Knight Center today. You can get Jeff Jarvis’ take on it on the Tow-Knight blog, or for more coverage you can see the Nieman Lab write-up. Or go straight for the paper itself.

Systematic Innovation in Computational Journalism

This work takes as its premise that there may be opportunities for computational innovation in journalism that are overlooked or underexplored. What are some of the technologies that can be used to help fulfill news consumers’ needs, to advance the goals of journalism, or to enhance the production and dissemination of knowledge for the public? The paper below presents a method for systematically exploring and generating such innovation opportunities.

Part of the paper also develops a structured brainstorming activity called “Aha!” to help students and news industry professionals think about ways to combine ideas from technology, information science, user needs, and journalistic goals into useful new news products and services. We produced a printed deck of cards with different concepts that people could re-combine, and you can still get these cards from CUNY.

The Aha! brainstorming activity was also made into an app, which is now available on the Apple App Store. The app has the advantage that you can augment the re-combinable concepts, record audio of your brainstorming sessions, take and store photos of any notes you scribble down about your ideas, and share the whole thing with colleagues via email.

References
N. Diakopoulos. Cultivating the Landscape of Innovation in Computational Journalism. Tow-Knight Center for Entrepreneurial Journalism. April 2012. [PDF]

A Functional Roadmap for Innovation in Computational Journalism

By: Nicholas Diakopoulos, Ph.D.
School of Communication and Information, Rutgers University
Original Version January, 2010; Updated April 2011. A PDF is also available.

Overview

Journalism in all of its senses spans a spectrum of meaning ranging from social purpose (e.g. watchdogging), to professionalized practice (e.g. ethics and professional standards), to the functional processes that journalists employ. Innovation in journalism can happen within or across this hierarchy of meanings, but in this paper, in particular, I will explore the role that computing can play in the process aspects of journalism. My intent is to lay a foundation of computational thinking for journalistic processes upon which updated journalistic practices and reinvigorated journalistic purposes can be built.

From a process perspective, Computational Journalism is the application of computing and computational thinking to the activities of journalism including information gathering, organization and sensemaking, communication and presentation, and dissemination and public response to news information, all while upholding core values of journalism such as accuracy and verifiability. It is inclusive of CAR (Computer-Assisted Reporting) but distinctive in its focus on the processing capabilities (e.g. aggregating, relating, correlating, abstracting) of the computer in comparison to mundane aspects of storage or access. The field draws on technical sub-fields of computer science including information retrieval, artificial intelligence, content analysis, visualization, personalization, and recommender systems as well as aspects of social computing and information science.

While Computational Journalism is unlikely to ever replace journalists with computers, it does promise a future where the goals of human journalists are greatly enabled and augmented through computing. Moreover, its pursuit may also inform developments in Computer Science by, for example, driving research in visual analytics and visualization, time-critical information processing, trustworthy computing, and user interfaces.

In the remainder of this paper I will discuss opportunities for innovation along the lines of the process aspects of journalism identified above. My goal is to stimulate new research and applications of these processes in the context of journalism and explore the challenges and opportunities in this space.

Information Gathering

The adoption of cheap and ubiquitous devices with photo and video capability has already had a substantial impact on how stories are reported, both in the mainstream media and through citizen journalism. While sensing hardware has gotten cheaper and more pervasive, social networking systems (e.g. Facebook) and social awareness streams (e.g. Twitter) have explicitly connected the what of sensing with the who of sensing or reporting.

The process of information gathering and reporting largely hinges on finding and verifying sources of information. Some of the best (and most difficult) journalism hinges on cultivating relationships over time with a personal network of sources. What’s different about the sources that are available from social networks is that, although they are by and large public, they may not be familiar to the journalist. Finding the desired sources while characterizing the expertise and veracity of those sources represents a barrier to fully realizing the journalistic value from these networks.

There are at least four aspects of information gathering from social networks and awareness streams that can be enhanced computationally: (1) source expertise finding, (2) source characterization (e.g. historical biases), (3) cross-referencing and independence of breaking eye-witness reports, and (4) determining the originating source of a piece of information. For instance, a computational process could automatically compute the sentiment (i.e. pro / con) of a source with respect to a range of topics or issues based on their history of Twitter messages. Such rankings could then be used to inform journalists about the background of a potential interviewee. Or, consider a breaking news scenario where a journalist is attempting to cross-reference messages for validity. Algorithms can be developed to estimate the independence of those sources or to trace information back to a likely originating source. These are just a few examples of potential areas for technical innovation in information gathering.
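To make the source characterization idea concrete, here is a minimal sketch of scoring a source’s stance on a topic from their message history. The tiny pro/con lexicon and the topic keywords are placeholders made up for illustration; a real system would apply a trained sentiment model to the full Twitter history.

```python
# Minimal sketch: characterize a potential source's stance on a topic
# from their message history. The tiny pro/con lexicon and the topic
# keywords are illustrative placeholders, not a real sentiment model.

PRO_WORDS = {"support", "benefit", "good", "agree", "favor"}
CON_WORDS = {"oppose", "harm", "bad", "disagree", "against"}

def stance_score(messages, topic_keywords):
    """Return a score in [-1, 1]: positive leans pro, negative leans con."""
    pro = con = 0
    for msg in messages:
        words = set(msg.lower().split())
        if not words & topic_keywords:
            continue  # only score messages that mention the topic
        pro += len(words & PRO_WORDS)
        con += len(words & CON_WORDS)
    total = pro + con
    return 0.0 if total == 0 else (pro - con) / total

history = [
    "I support the new transit plan and it will be good for the city",
    "The transit tax will harm small businesses",
]
print(stance_score(history, {"transit"}))  # 0.33 -> leans pro on transit
```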

Organization and Sensemaking

With a growth in information gathering capabilities comes the difficulty of organizing and making sense of all of that information by journalists. This is a process where computers have already had a significant impact, namely through Computer-Assisted Reporting (CAR). CAR tools are usually generic in the sense that they are widely applicable to different stories, though many tools are designed for specific data types such as geographic, temporal, or network data.

While many CAR tools succeed in enabling journalists to organize their information, there is still considerable room for improvement in the area of sensemaking. In particular, computational perception and content analysis enable computers to convert signals about the world (including everything from sensor values to Twitter messages) into semantically and contextually laden symbols (e.g. names of people or places) or aggregate and derivative values (e.g. the sentiment or emotion of a message, the novelty or unusualness of a message with respect to an event).
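As a toy illustration of such content analysis, the sketch below derives two “semantic” facets from raw messages: a very naive stand-in for named-entity extraction, and a novelty score relative to earlier messages. Real systems would use trained NER models and richer similarity measures.

```python
import re

def naive_entities(text):
    """Very naive stand-in for named-entity extraction: runs of
    capitalized words (a real system would use a trained NER model)."""
    return [m.strip() for m in re.findall(r"(?:[A-Z][a-z]+ ?)+", text)]

def novelty(message, prior_messages):
    """Fraction of a message's vocabulary unseen in earlier messages;
    1.0 means entirely new relative to what came before."""
    words = set(message.lower().split())
    if not words:
        return 0.0
    seen = set()
    for prior in prior_messages:
        seen |= set(prior.lower().split())
    return 1 - len(words & seen) / len(words)

stream = ["Power outage reported downtown", "Power still out downtown"]
update = "Fire crews respond near City Hall in Sacramento"
print(naive_entities(update))   # ['Fire', 'City Hall', 'Sacramento']
print(novelty(update, stream))  # 1.0 -> potentially a new development
```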

Together with interactive and visual ways of presenting these computed, “semantic” facets of information, there is a huge potential space for innovation in journalism tools. Some of this innovation is happening in other domains that draw on a similar process of sensemaking, such as intelligence analysis. These tools can be evaluated to better understand how they do or do not work in the context of journalism, and, in general, computational tools developed to enable sensemaking will need rigorous attention to the evaluation of their utility in real situations. Finally, sensemaking tools not only have potential for helping journalists but also for helping “readers” make sense of growing online repositories of newsworthy content and data.

Communication and Presentation

Once a story has been organized and made sense of, the next process entails communicating and presenting it in a relevant and interesting way. And while I won’t argue that every story demands it, some stories will benefit from computationally infused presentations of content. A journalist might use computation in such a story by making models or data interactive in a way that informs the user more than a static story would.

User interfaces need more generic paradigms for compellingly communicating complex stories via models, data, simulation, and games. For instance, recent research into playable data graphics has looked at how to add game elements such as goals, scores, and advancement into how users interact with online visualized data. Other types of newsgames explore editorial simulations or decision-making processes. One thing to consider as we invent these new experiences is how journalistic norms and values play out in interactive media. There are certain notions of interactive rhetoric and literacy that need to be taken into account when training computational journalists.

As governmental data becomes emancipated from closed databases (as is the current executive order in the U.S.), the opportunities for telling stories through models, data, simulation, and games will only grow. There is a range of potential new (and not yet invented) storytelling forms that combine elements of interactivity and computing with games, data, and news content. This will be an area ripe for alternative methods of communicating complex information in engaging and interactive formats.

Dissemination and Public Response

From a business perspective, one of the most disruptive shifts in journalism has been the process of digitization and dissemination of content online. This transition took content that was once constrained by a fixed medium and brought the variable costs of publishing space close to zero. The implication of this shift is that there is much more content out there and, practically speaking, many more ways to compete for attention for content. With unlimited space come the issues of information overload and scale.

Computation can improve the process of dissemination by addressing information overload and scale issues through, for instance, personalization and content adaptation systems as well as recommender systems. Many of the methods developed will also be applicable to monetization strategies since the fundamental scale issue revolves around matching a paucity of attention with the right content in order to drive higher advertising revenue.
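As a minimal sketch of the content-adaptation idea, the following ranks unread stories by word overlap with a reader’s recent reading history. It is illustrative only; a production recommender would use TF-IDF or embeddings along with collaborative filtering signals.

```python
# Minimal content-based recommendation sketch: rank unread stories by
# word overlap with a reader's recently read stories. A production
# system would use TF-IDF or embeddings plus collaborative signals.

def jaccard(a, b):
    """Jaccard similarity between two texts' word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def recommend(read_stories, candidates, k=3):
    """Return the top-k candidates most similar to the reading history."""
    profile = " ".join(read_stories)
    return sorted(candidates, key=lambda c: jaccard(profile, c), reverse=True)[:k]

history = ["city council votes on transit budget"]
candidates = ["transit budget passes council vote",
              "local bakery wins pastry award"]
print(recommend(history, candidates, k=1))  # the transit story ranks first
```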

Another implication of unlimited publishing space is that instead of being constrained to a narrow “letters to the editor” page, public response can instead expand to whatever the community needs dictate. In managing the process of interaction with the public response, journalists are encountering this scale problem in terms of interacting with and moderating users’ content in online commenting systems.

In particular, there is a lot that computation can offer to improve online commenting systems, both from the perspective of a journalist dealing with moderation and for users of the commenting system. Content analysis techniques, such as natural language processing, computational linguistics, and standard information retrieval, can help with both the scale and the quality of the discourse by introducing new ways of filtering and organizing comments. For instance, content analysis could be used to rank comments by (1) relevance to the story, (2) subjectivity or objectivity, or (3) degree of politeness. This could aid journalists interacting with readers, as well as readers interacting with readers, by making it easier to find high quality contributions.
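Here is a small sketch of the first of those criteria, ranking comments by lexical relevance to the story text. Subjectivity and politeness scoring would require trained classifiers, so they are omitted here.

```python
# Sketch: rank comments by lexical relevance to the story. Subjectivity
# and politeness scoring would need trained classifiers, so only the
# relevance criterion is illustrated.

def relevance(comment, story_text):
    """Fraction of a comment's words that also appear in the story."""
    c_words = set(comment.lower().split())
    s_words = set(story_text.lower().split())
    return len(c_words & s_words) / len(c_words) if c_words else 0.0

def rank_comments(comments, story_text):
    """Order comments so on-topic contributions surface first."""
    return sorted(comments, key=lambda c: relevance(c, story_text), reverse=True)

story = "the council approved a new transit budget for the city"
comments = ["the transit budget is a big deal for the city", "first!!!"]
print(rank_comments(comments, story))  # on-topic comment ranks above "first!!!"
```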

Looking Ahead

Technology is rapidly changing the landscape of how news information is gathered, made sense of, communicated, and disseminated. To pave the way to the future, journalism schools need to train more computationally literate journalists who develop a deep understanding of notions of abstraction, modeling, parameterization, aggregation, scalability, and programming. And while industry grapples with the culture clash between engineers and journalists as well as the classic innovator’s dilemma, there will be plenty of opportunities for the new computational journalists to reinvent the way news information is gathered, organized, presented, and disseminated.

Content Specific Computational Journalism

Much of my prior work in the field of computational journalism has focused on building tools that could be used either by journalists or by readers in their respective capacities as information producers or consumers. And the recent Duke CJ Report heavily emphasized the role of computation in informing discovery tools to help journalists uncover new stories in vast corpora of data. With the recent push toward civic data transparency by the US Government, computational accountability tools will be essential to uncovering malfeasance.

But here I’m going to suggest something a bit different by setting up a spectrum of computational journalism artifacts along the dimension of content specificity. On one end you have the things I just talked about: tools that help journalists uncover stories and make sense of information. These tools are practically independent of any semantics associated with the information but can be customized for different data types (e.g. geographic, temporal, network). They’re also geared toward insight generation and designed for the kinds of work processes and tasks that journalists engage in on a daily basis.

On the other end of the spectrum there are computationally infused presentations of stories. A computational journalist might use computation in such a story by making models or data interactive. For example, one interactive graphic I worked on for SacBee.com is based on an evaporative water model together with scraped hourly Sacramento weather conditions. The goal was to paint a picture of the model and help people understand when best to water their lawns.
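To give a flavor of what such a graphic computes under the hood, here is a toy model (emphatically not the actual model behind the SacBee graphic) capturing the basic relationships: evaporative loss rises with temperature and wind and falls with humidity.

```python
# Toy illustration only, not the model behind the SacBee graphic:
# evaporative demand rises with temperature and wind, falls with humidity.

def evaporation_index(temp_f, humidity_pct, wind_mph):
    """Unitless index of evaporative water loss for a given hour."""
    temp_term = max(temp_f - 50, 0) / 10      # hotter -> more evaporation
    wind_term = 1 + wind_mph / 20             # wind carries moisture away
    humidity_term = 1 - humidity_pct / 100    # dry air absorbs more
    return temp_term * wind_term * humidity_term

def best_watering_hour(hourly):
    """Pick the hour with the lowest evaporative loss from scraped conditions."""
    return min(hourly, key=lambda h: evaporation_index(h["temp"], h["rh"], h["wind"]))

hours = [
    {"hour": 5, "temp": 58, "rh": 85, "wind": 3},
    {"hour": 14, "temp": 95, "rh": 20, "wind": 8},
]
print(best_watering_hour(hours)["hour"])  # 5 -> water in the early morning
```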

Another example comes from editorial simulations such as September 12th. In that interactive, an editorial model describes the relationship between terrorism and anti-terrorist bombing in the Middle East. But while the model and mechanics are, of course, described abstractly, the semantics of the graphics and interactions are what is essential to the presentation.

Content specific presentations rely heavily on the semantics of the information to convey meaning. Rather than being generic information tools, they intertwine computation with the story itself. Interaction, information, and visual design become essential to communicating a semantically laden model. And in comparison to generic tools, content specific CJ needs to be designed with a “reader” in mind: to disseminate insights (or opinions) to the public.

There’s value to both kinds of computational journalism: tools to help uncover stories and develop models, and specific presentations to effectively communicate those models.

Computational Photo-Journalism

It’s good to see people working on automatically detecting alterations of news photos. A student recently posted a link to an article and code which can highlight areas of an image that are suspected of being cloned.
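For a sense of how the simplest form of copy-move (clone) detection works, here is a minimal sketch that flags exactly duplicated pixel blocks. It assumes Pillow and numpy are installed; real forensic methods match blocks far more robustly.

```python
# Minimal copy-move (clone) detection sketch: hash small pixel blocks
# and flag exact duplicates. Real forensic methods match blocks robustly
# (e.g. on DCT features, tolerant of recompression); exact hashing only
# catches pristine clones and will also flag flat regions like clear sky.

import numpy as np
from PIL import Image

def find_cloned_blocks(path, block=16):
    """Return pairs of (y, x) block positions with identical pixels."""
    img = np.asarray(Image.open(path).convert("L"))  # grayscale
    h, w = img.shape
    seen, clones = {}, []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            key = img[y:y + block, x:x + block].tobytes()
            if key in seen:
                clones.append((seen[key], (y, x)))
            else:
                seen[key] = (y, x)
    return clones
```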

Some of the ethical considerations that come up in manipulation of images for journalism are discussed in the paper, “Manipulation in Photojournalism: Is it ethical? Is it corrupt?”

The best technical work in the area of image forensics has been done by Hany Farid’s group at Dartmouth, which continues to publish high quality research papers on the topic.

It seems it should only be a matter of time before browsers are enhanced with the ability to run a series of algorithms that give metrics about the authenticity of an image. In fact, I will pitch this to my class as a final project.

Teaching

The following are courses I teach or have taught in the past:

Algorithmic News Media
The increasing role that algorithms and automation are playing in the production of information is rapidly changing the ways in which the news media is authored, curated, disseminated, and consumed. This graduate seminar provides an overview of the latest developments in algorithmic news media on topics including journalistic data mining, automated content production, news bots, platform dissemination, and algorithmic accountability and transparency. Themes of value-sensitive design, labor, and sustainability are discussed with respect to how algorithms impact the public sphere. Here’s the Syllabus.

Computational Journalism
This course explores the conceptualization and application of computational and data-driven approaches to journalism practice. Students examine how computational techniques are changing journalistic data gathering, curation, sensemaking, presentation, dissemination, and analytics of content. Here’s the Syllabus.

Storytelling with Data Visualization
This course covers the use of data visualization as a method to communicate news stories (narrative visualization) and as a way to explore and analyze data to find new news stories (visual analytics) in a journalism context. Here’s the Syllabus.

Diversity in the Robot Reporter Newsroom


The Associated Press recently announced a big new hire: A robot reporter from Automated Insights (AI) would be employed to write up to 4,400 earnings report stories per quarter. Last year, that same automated writing software produced over 300 million stories — that’s some serious scale from a single algorithmic entity.

So what happens to media diversity in the face of massive automated content production platforms like the one Automated Insights created? Despite the fact that we’ve done pretty abysmally at incorporating a balance of minority and gender perspectives in the news media, I think we’d all like to believe that by including diverse perspectives in the reporting and editing of news we fly closer to the truth. A silver lining to the newspaper industry crash has been a profusion of smaller, more nimble media outlets, allowing for far more variability and diversity in the ideas that we’re exposed to.

Of course software has biases, and although the basic anatomy of robot journalists is comparable, there are variations within and amongst different systems, such as the style and tone they produce as well as the editorial criteria coded into them. Algorithms are the product of a range of human choices, including the criteria, parameters, or training data that can pass along inherited, systematic biases. So while a robot reporter offers the promise of scale (and of reducing costs), we need to be wary of over-reliance on any single automated system. For the sake of media diversity the one bot needs to fork itself and become 100,000.
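To see where those human choices live in the code, consider this illustrative sketch of template-driven story generation (not Automated Insights’ actual system). The thresholds and verbs are editorial decisions baked into the software.

```python
# Illustrative sketch of template-driven earnings-story generation (not
# Automated Insights' actual system). The thresholds and verbs below are
# editorial decisions baked into the software: this is where a robot
# reporter's style, tone, and bias live.

def earnings_lede(company, eps_actual, eps_expected):
    surprise = (eps_actual - eps_expected) / abs(eps_expected)
    if surprise > 0.10:
        verb = "crushed"            # a vivid framing someone chose
    elif surprise > 0:
        verb = "edged past"
    elif surprise > -0.10:
        verb = "fell short of"
    else:
        verb = "badly missed"
    return (f"{company} {verb} analyst expectations, reporting earnings of "
            f"${eps_actual:.2f} per share against a forecast of ${eps_expected:.2f}.")

print(earnings_lede("Acme Corp", 1.25, 1.00))
# Acme Corp crushed analyst expectations, reporting earnings of $1.25
# per share against a forecast of $1.00.
```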

We saw this unfold in microcosm over the last week. The @wikiparliament bot was launched in the UK to monitor edits to Wikipedia from IP addresses within parliament (a form of transparency and accountability for who was editing what). Within days it had been mimicked by the @congressedits bot, which was set up to monitor the U.S. Congress. What was particularly interesting about @congressedits, though, is that it was open sourced by creator Ed Summers. And that allowed the bot to quickly spread and be adapted for different jurisdictions like Australia, Canada, France, Sweden, Chile, Germany, and even Russia.
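The core check behind these bots is simple, which is part of why they spread so quickly. Here is a sketch of that check; the IP ranges are placeholders, and fetching Wikipedia’s edit feed and posting to Twitter are omitted.

```python
# Core check behind bots like @congressedits, sketched: flag anonymous
# Wikipedia edits whose IP falls in a known institutional range. The
# ranges below are made-up placeholders; fetching the edit feed and
# posting to Twitter are omitted.

import ipaddress

WATCHED_RANGES = [ipaddress.ip_network("192.0.2.0/24"),     # placeholder
                  ipaddress.ip_network("198.51.100.0/24")]  # placeholder

def is_watched(ip_string):
    ip = ipaddress.ip_address(ip_string)
    return any(ip in net for net in WATCHED_RANGES)

edit = {"page": "Example Article", "user": "192.0.2.17"}  # anon edits log an IP
if is_watched(edit["user"]):
    print(f"Anonymous edit to '{edit['page']}' from a watched IP range")
```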

Tailoring a bot for different countries is just one (relatively simple) form of adaptation, but I think diversifying bots for different editorial perspectives could similarly benefit from a platform. I would propose that we need to build an open-source news bot architecture that different news and journalistic organizations could use as a scaffolding to encode their own editorial intents, newsworthiness criteria, parameters, data sets, ranking algorithms, cultures, and souls into. By creating a flexible platform as an underlying starting point, the automated media ecology could adapt and diversify faster and into new domains or applications.
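To sketch what that scaffolding might look like, here is one hypothetical shape for it: a bot shell with pluggable newsworthiness criteria, framing, and publishing, which each organization could fork and fill with its own editorial intent. All names here are made up for illustration.

```python
# One possible shape for the proposed scaffolding, sketched: a bot shell
# with pluggable newsworthiness criteria and framing, so each outlet can
# encode its own editorial intent. All names here are hypothetical.

class NewsBot:
    def __init__(self, is_newsworthy, frame, publish):
        self.is_newsworthy = is_newsworthy  # editorial criteria, per outlet
        self.frame = frame                  # tone and style, per outlet
        self.publish = publish              # Twitter, RSS, email, ...

    def handle(self, item):
        if self.is_newsworthy(item):
            self.publish(self.frame(item))

# Two "forks" of the same scaffolding with different editorial souls:
local_bot = NewsBot(
    is_newsworthy=lambda item: item["city"] == "Baltimore",
    frame=lambda item: f"Local alert: {item['headline']}",
    publish=print,
)
watchdog_bot = NewsBot(
    is_newsworthy=lambda item: item["amount"] > 1_000_000,
    frame=lambda item: f"Spending watch: {item['headline']} (${item['amount']:,})",
    publish=print,
)
item = {"city": "Baltimore", "headline": "Contract awarded", "amount": 2_500_000}
local_bot.handle(item)
watchdog_bot.handle(item)
```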

Such a platform would also enable the expansion of bots oriented towards different journalistic tasks. A lot of the news and information bots you find on social media these days are parrots of various ilks: they aggregate content on a particular topical niche, like @BadBluePrep, @FintechBot, and @CelebNewsBot, or for a geographical area like @North_GA, or they simply retweet other accounts based on some trigger words. Some of the more sophisticated bots do look at data feeds to generate novel insights, like @treasuryio or @mediagalleries, but there’s so much more that could be done if we had a flexible bot platform.

For instance, we might consider building bots that act as information collectors and solicitors, moving away from pure content production to content acquisition. This isn’t so far off, really. Researchers at IBM have been working on this for a couple of years and have already built a prototype system that “automatically identifies and ask[s] targeted strangers on Twitter for desired information.” The technology is oriented towards collecting accurate and up-to-date information from specific situations where crowd information may be valuable. It’s relatively easy to imagine an automated news bot being launched after a major news event to identify and solicit information, facts, or photos from people most likely nearby or involved in the event. In another related project, the same group at IBM has been developing technology to identify people on Twitter who are more likely to propagate (Read: Retweet) information relating to public safety news alerts. Essentially they grease the gears of social dissemination by identifying just the right people for a given topic at a particular time who are most likely to further share the information.

There are tons of applications for news bots just waiting for journalists to build them: factchecking, information gathering, network bridging, audience development etc. etc. Robot journalists don’t just have to be reporters. They can be editors, or even (hush) work on the business side.

What I think we don’t want to end up with is the Facebook or Google of robot reporting: “one algorithm to rule them all”. It’s great that the Associated Press is exploring the use of these technologies to scale up their content creation, but down the line when the use of writing algorithms extends far beyond earnings reports, utilizing only one platform may ultimately lead to homogenization and frustrate attempts to build a diverse media sphere. Instead the world that we need to actively create is one where there are thousands of artisanal news bots serving communities and variegated audiences, each crafted to fit a particular context and perhaps with a unique editorial intent. Having an open source platform would help enable that, and offer possibilities to plug in and explore a host of new applications for bots as well.

Algorithmic Accountability & Transparency

Software and algorithms have come to adjudicate an ever broader swath of our lives, including everything from search engine personalization and advertising systems, to teacher evaluation, banking and finance, political campaigns, and police surveillance. But these algorithms can make mistakes. They have biases. Yet they sit in opaque black boxes, their inner workings, their inner “thoughts” hidden behind layers of complexity. We need to get inside that black box, to understand how they may be exerting power on us, and to understand where they might be making unjust mistakes. This research tackles this issue and proposes a practical method based on reverse engineering and auditing that journalists can employ in the investigation of algorithms.

More recently we have built a database and web interface of potentially newsworthy algorithms used by the US federal government. The goal is to lower the barrier to entry and make it easier for journalists and other actors in civil society to get started with algorithmic accountability reporting. Check out the site here: http://algorithmtips.org/

Related Articles

  • N. Diakopoulos. The Algorithms Beat. Data Journalism Handbook. Eds. Liliana Bounegru and Jonathan Gray. (forthcoming, 2018) [PDF Preprint]
  • M. Koliska and N. Diakopoulos. Disclose, Decode and Demystify: An Empirical Guide to Algorithmic Transparency. The Routledge Handbook of Developments in Digital Journalism Studies. Eds. Scott Eldridge II and Bob Franklin. October, 2018.
  • N. Diakopoulos, D. Trielli, J. Stark, S. Mussenden. I vote for – How search informs our choice of candidate. In: Digital Dominance: The Power of Google, Amazon, Facebook, and Apple. Eds. M. Moore and D. Tambini. June, 2018.
  • D. Trielli, J. Stark and N. Diakopoulos. Algorithm Tips: A Resource for Algorithmic Accountability in Government. Computation + Journalism Symposium. October, 2017. [PDF][Link]
  • N. Diakopoulos and M. Koliska. Algorithmic Transparency in the News Media. Digital Journalism. 2016. [PDF]
  • D. Trielli, S. Mussenden, J. Stark, N. Diakopoulos. Googling Politics: How the Google issue guide on candidates is biased. Slate. June, 2016. Link
  • J. Stark and N. Diakopoulos. Uber seems to offer better service in areas with more white people. That raises some tough questions. Washington Post. March, 2016. Link
  • N. Diakopoulos. Accountability in Algorithmic Decision Making. Communications of the ACM (CACM). Feb. 2016. [PDF]
  • D. Trielli, S. Mussenden, N. Diakopoulos. Why Google Search Results Favor Democrats. Slate. Dec., 2015. Link
  • N. Diakopoulos. How Uber Surge Pricing Really Works. Washington Post. April, 2015. Link
  • N. Diakopoulos. Bots on the Beat. Slate. 2014. Link
  • N. Diakopoulos. Algorithmic Accountability: Journalistic Investigation of Computational Power Structures. Digital Journalism. 2015. [PDF]
  • N. Diakopoulos. Algorithmic Accountability Reporting: On the Investigation of Black Boxes. Tow Center. February 2014. [PDF]
  • N. Diakopoulos. Rage Against the Algorithms. The Atlantic. October 2013. Link
  • N. Diakopoulos. Sex, Violence, and Autocomplete Algorithms. Slate. August 2013. Link
  • N. Diakopoulos. Algorithmic Defamation: The Case of the Shameless Autocomplete. Tow Center. August 2013. Link