Category Archives: computational journalism

What’s in a Ranking?

The web is a tangled mess of pages and links. But through the magic of the Google algorithm it becomes a neat, orderly ranking of “relevance” to whatever our heart desires. The network may be the architecture of the web, but the human ideology projected onto that network is the rank.

Often enough we take rankings at face value; we don’t stop to think about what’s really in a rank. There is tremendous power conferred upon the top N, of anything really, not just search results but colleges, restaurants, or a host of other goods. These are the things that get the most attention and become de facto defaults because they are easier for us to access. In fact we rank all manner of services around us in our communities: schools, hospitals and doctors, even entire neighborhoods. Bloomberg has an entire site dedicated to them. These rankings have implications for a host of decisions we routinely make. Can we trust them to guide us?

Thirty years ago, rankings in the airline reservation systems used by travel agents were regulated by the U.S. government. Such regulation served to limit the ability of operators to “bias travel-agency displays” in a way that would privilege some flights over others. But this regulatory model for reining in algorithmic power hasn’t been applied in other domains, like search engines. It’s worth asking why not and what that regulation might look like, but it’s also worth thinking about alternatives to regulation that we might employ for mitigating such biases. For instance we might design advanced interfaces that transparently signal the various ways in which a rank and the scores and indices on which it is built are constituted.

Consider an example from the local media: the “Best Neighborhoods” app published by the Dallas Morning News (shown below). It ranks neighborhoods according to criteria like schools, parks, commute, and walkability. The default “overall” ranking, though, is unclear: How are these criteria weighted? And how are they even defined? What does “walkability” mean in the context of this app? If I am looking to invest in property, a simplified algorithm might mislead me; does it really measure the dimensions that matter most? While we can interactively re-rank by any of the individual criteria, many people will only ever see the default ranking. Other neighborhood rankings, like the one from the New Yorker in 2010, do show their weights, but they’re non-interactive.

[Image: the “Best Neighborhoods” app from the Dallas Morning News]

The notion of algorithmic accountability is something I’ve written about here previously. It’s the idea that algorithms are becoming more and more powerful arbiters of our decision making, both in the corporate world and in government. There’s an increasing need for journalists to think critically about how to apply algorithmic accountability to the various rankings that the public encounters in society, including rankings (like neighborhood rankings) that their own news organizations may publish as community resources.

What should the interface be for an online ranking so that it provides a level of transparency to the public? In a recent project with the IEEE, we sought to implement an interface for end-users to interactively re-weight and visualize how their re-weightings affected a ranking. But this is just the start: there is exciting work to do in human-computer interaction and visualization design to determine the most effective ways to expose rankings interactively in ways that are useful to the public, but which also build credibility. How else might we visualize the entire space of weightings and how they affect a ranking in a way that helps the public understand the robustness of those rankings?
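To be concrete about what transparency in the weighting could mean, here is a minimal sketch of the kind of composite scoring that sits underneath a ranking like this, with hypothetical neighborhoods, criteria scores, and weights; an interactive interface essentially lets the reader edit the weights and watch the order change.

```python
# Minimal sketch of a composite ranking with adjustable weights.
# Neighborhoods, scores, and weights are all hypothetical.

def rank(items, weights):
    """Order items by a weighted sum of their normalized criterion scores."""
    def composite(item):
        return sum(weights[c] * item["scores"][c] for c in weights)
    return sorted(items, key=composite, reverse=True)

neighborhoods = [
    {"name": "Riverside", "scores": {"schools": 0.55, "parks": 0.8, "walkability": 0.9}},
    {"name": "Hillcrest", "scores": {"schools": 0.95, "parks": 0.7, "walkability": 0.3}},
]

# The default "overall" weighting: a choice most readers never see.
default_weights = {"schools": 0.5, "parks": 0.25, "walkability": 0.25}
# A reader who mostly cares about walkability would weight things differently.
walker_weights = {"schools": 0.2, "parks": 0.2, "walkability": 0.6}

for weights in (default_weights, walker_weights):
    print([n["name"] for n in rank(neighborhoods, weights)])
```

Notice that the two weightings reverse the order; which neighborhood is “best” is entirely a function of a weights table that the default view never exposes.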

When we start thinking about the hegemony of algorithms and their ability to generalize nationally or internationally, there are also interesting questions about how to adapt rankings for local communities. Take something like a local school ranking. Rankings by national or state aggregators like GreatSchools may be useful, but they may not reflect how an individual community would choose to weight, or even select, criteria for inclusion in a ranking. How might we adapt interfaces or rankings so that they can be more responsive to local communities? Are there geographically local feedback processes that might allow rankings to reflect community values? How might we enable democracy, or even voting, on local ranking algorithms?

In short, this is a call for more reflection on how to be transparent about the data-driven rankings we create for our readers online. There are research challenges here, in human-centered design, in visualization, and in decision sciences that if solved will allow us to build better and more trustworthy experiences for the public served by our journalism. It’s time to break the tyranny of the unequivocal ranking and develop new modes of transparency for these algorithms.

Diversity in the Robot Reporter Newsroom


The Associated Press recently announced a big new hire: A robot reporter from Automated Insights (AI) would be employed to write up to 4,400 earnings report stories per quarter. Last year, that same automated writing software produced over 300 million stories — that’s some serious scale from a single algorithmic entity.

So what happens to media diversity in the face of massive automated content production platforms like the one Automated Insights created? Despite the fact that we’ve done pretty abysmally at incorporating a balance of minority and gender perspectives in the news media, I think we’d all like to believe that by including diverse perspectives in the reporting and editing of news we fly closer to the truth. A silver lining to the newspaper industry crash has been a profusion of smaller, more nimble media outlets, allowing for far more variability and diversity in the ideas that we’re exposed to.

Of course software has biases, and although the basic anatomy of robot journalists is comparable, there are variations within and among different systems, such as the style and tone that’s produced as well as the editorial criteria coded into them. Algorithms are the product of a range of human choices, including the criteria, parameters, and training data that can pass along inherited, systematic biases. So while a robot reporter offers the promise of scale (and of reducing costs), we need to be wary of over-reliance on any single automated system. For the sake of media diversity the one bot needs to fork itself and become 100,000.

We saw this unfold in microcosm over the last week. The @wikiparliament bot was launched in the UK to monitor edits to Wikipedia from IP addresses within parliament (a form of transparency and accountability for who was editing what). Within days it had been mimicked by the @congressedits bot, which was set up to monitor the U.S. Congress. What was particularly interesting about @congressedits, though, is that it was open sourced by its creator, Ed Summers. That allowed the bot to quickly spread and be adapted for different jurisdictions like Australia, Canada, France, Sweden, Chile, Germany, and even Russia.

Tailoring a bot for different countries is just one (relatively simple) form of adaptation, but I think diversifying bots for different editorial perspectives could similarly benefit from a platform. I would propose that we build an open-source news bot architecture that different news organizations could use as a scaffolding into which they encode their own editorial intents, newsworthiness criteria, parameters, data sets, ranking algorithms, cultures, and souls. By creating a flexible platform as an underlying starting point, the automated media ecology could adapt and diversify faster and into new domains or applications.
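To make the idea a bit more concrete, here is a minimal sketch of what such a scaffolding might look like, assuming a pull-based bot that periodically checks a data source; all of the class and function names are hypothetical, invented only to show where a newsroom’s own editorial criteria could be plugged in.

```python
# Hypothetical scaffold for a pluggable news bot. Every name here is
# illustrative; the point is that the editorial pieces are swappable.

class NewsBot:
    def __init__(self, source, newsworthiness, writer, publisher):
        self.source = source                  # callable returning raw events/data
        self.newsworthiness = newsworthiness  # encodes editorial criteria as a scorer
        self.writer = writer                  # turns a scored event into text
        self.publisher = publisher            # posts to Twitter, a CMS, email, etc.

    def run_once(self):
        for event in self.source():
            score = self.newsworthiness(event)
            if score > 0:
                self.publisher(self.writer(event, score))

# A newsroom would supply its own components, e.g. (names invented):
# bot = NewsBot(source=fetch_wikipedia_edits,
#               newsworthiness=score_parliament_ip_edit,
#               writer=format_edit_tweet,
#               publisher=post_to_twitter)
# bot.run_once()
```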

Such a platform would also enable the expansion of bots oriented towards different journalistic tasks. A lot of the news and information bots you find on social media these days are parrots of various ilks: they aggregate content on a particular topical niche, like @BadBluePrep, @FintechBot, and @CelebNewsBot, or for a geographical area like @North_GA, or they simply retweet other accounts based on some trigger words. Some of the more sophisticated bots do look at data feeds to generate novel insights, like @treasuryio or @mediagalleries, but there’s so much more that could be done if we had a flexible bot platform.

For instance we might consider building bots that act as information collectors and solicitors, moving away from pure content production to content acquisition. This isn’t so far off, really. Researchers at IBM have been working on this for a couple of years and have already built a prototype system that “automatically identifies and ask[s] targeted strangers on Twitter for desired information.” The technology is oriented towards collecting accurate and up-to-date information in specific situations where crowd information may be valuable. It’s relatively easy to imagine an automated news bot being launched after a major news event to identify and solicit information, facts, or photos from people likely to be nearby or involved. In another related project, the same group at IBM has been developing technology to identify people on Twitter who are more likely to propagate (read: retweet) information relating to public safety news alerts. Essentially they grease the gears of social dissemination by identifying just the right people, for a given topic at a particular time, who are most likely to share the information further.

There are tons of applications for news bots just waiting for journalists to build them: fact-checking, information gathering, network bridging, audience development, and more. Robot journalists don’t just have to be reporters. They can be editors, or even (hush) work on the business side.

What I think we don’t want to end up with is the Facebook or Google of robot reporting: “one algorithm to rule them all”. It’s great that the Associated Press is exploring the use of these technologies to scale up their content creation, but down the line when the use of writing algorithms extends far beyond earnings reports, utilizing only one platform may ultimately lead to homogenization and frustrate attempts to build a diverse media sphere. Instead the world that we need to actively create is one where there are thousands of artisanal news bots serving communities and variegated audiences, each crafted to fit a particular context and perhaps with a unique editorial intent. Having an open source platform would help enable that, and offer possibilities to plug in and explore a host of new applications for bots as well.

The Anatomy of a Robot Journalist

Note: A version of the following also appears on the Tow Center blog.

Given that an entire afternoon was dedicated to a “Robot Journalism Bootcamp” at the Global Editors Network Summit this week, it’s probably safe to say that automated journalism has finally gone mainstream — hey it’s only taken close to 40 years since the first story writing algorithm was created at Yale. But there are still lots of ethical questions and debates that we need to sort out, from source transparency to corrections policies for bots. Part of that hinges on exactly how these auto-writing algorithms work: What are their limitations and how might we design them to be more value-sensitive to journalism?

Despite the proprietary nature of most robot journalists, the great thing about patents is that they’re public. And patents have been granted to several major players in the robo-journalism space already, including Narrative Science, Automated Insights, and Yseop, making their algorithms just a little bit less opaque in terms of how they operate. More patents are in the pipeline from both heavyweights like CBS Interactive and start-ups like Fantasy Journalist. So how does a robo-writer from Narrative Science really work?

Every robot journalist first needs to ingest a bunch of data. Data-rich domains like weather were some of the first to have practical natural language generation systems. Now we’re seeing a lot of robot journalism applied to sports and finance — domains where the data can be standardized and made fairly clean. The development of sensor journalism may provide entirely new troves of data for producing automated stories. Key here is having clean and comprehensive data, so if you’re working in a domain that’s still stuck with PDFs or sparse access, the robots haven’t gotten there yet.

After data is read in by the algorithm, the next step is to compute interesting or newsworthy features from the data. Basically the algorithm is trying to figure out the most critical aspects of an event, like a sports game. Newsworthiness criteria are built into the statistics it computes. So, for example, it looks for surprising statistical deviations like minimums, maximums, or outliers, big swings and changes in a value, violations of an expectation, a threshold being crossed, or a substantial change in a predictive model. “Any feature the value of which deviates significantly from prior expectation, whether the source of that expectation is due to a local computation or from an external source, is interesting by virtue of that deviation from expectation,” the Narrative Science patent reads. So for a baseball game the algorithm computes “win probability” after every play. If win probability has a big delta between two plays, it probably means something important just happened, and the algorithm puts that on a list of events that might be worthy of inclusion in the final story.
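To illustrate the flavor of that computation, here is a toy sketch of deviation-based newsworthiness detection on a win probability series; the plays, probabilities, and threshold are all made up.

```python
# Toy illustration of deviation-based newsworthiness: flag plays where the
# swing in win probability crosses a threshold. All values here are made up.

plays = [
    {"inning": 7, "description": "two-run homer",   "win_prob": 0.71},
    {"inning": 8, "description": "groundout",       "win_prob": 0.69},
    {"inning": 9, "description": "walk-off single", "win_prob": 1.00},
]

THRESHOLD = 0.15  # arbitrary; a real system would tune this per domain
candidates = []
prev = 0.50  # win probability going into this stretch of the game
for play in plays:
    delta = abs(play["win_prob"] - prev)
    if delta >= THRESHOLD:
        candidates.append((delta, play))
    prev = play["win_prob"]

# The biggest swings become candidate events for inclusion in the story.
candidates.sort(key=lambda pair: pair[0], reverse=True)
print([(round(d, 2), p["description"]) for d, p in candidates])
```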

Once some interesting features have been identified, angles are then selected from a pre-authored library. Angles are explanatory or narrative structures that provide coherence to the overall story. Basically they are patterns of events, circumstances, entities, and their features. An angle for a sports story might be “back-and-forth horserace”, “heroic individual performance”, “strong team effort”, or “came out of a slump”. Certain angles are triggered according to the presence of certain derived features (from the previous step). Each angle is given an importance value from 1 to 10 which is then used to rank that angle against all of the other proposed angles.
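Here is a similarly stripped-down sketch of how a pre-authored angle library might be triggered and ranked; the angle names echo the patent’s examples, but the trigger conditions and importance values are invented.

```python
# Sketch of an angle library: each angle has a trigger over derived features
# and a hand-assigned importance from 1 to 10. Triggers here are invented.

ANGLES = [
    {"name": "back-and-forth horserace",
     "importance": 7,
     "trigger": lambda f: f["lead_changes"] >= 4},
    {"name": "heroic individual performance",
     "importance": 8,
     "trigger": lambda f: f["top_player_win_prob_added"] > 0.5},
    {"name": "strong team effort",
     "importance": 5,
     "trigger": lambda f: f["players_with_rbi"] >= 6},
]

# Features derived in the previous step (values made up).
features = {"lead_changes": 5, "top_player_win_prob_added": 0.62, "players_with_rbi": 3}

triggered = [a for a in ANGLES if a["trigger"](features)]
triggered.sort(key=lambda a: a["importance"], reverse=True)
print([a["name"] for a in triggered])
# -> ['heroic individual performance', 'back-and-forth horserace']
```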

Once the angles have been determined and ordered they are linked to specific story points, which connect back to individual pieces of data like names of players or specific numeric values like score. Story points can also be chosen and prioritized to account for personal interests such as home team players. These points can then be augmented with additional factual content drawn from internet databases such as where a player is from, or a quote or picture of them.

The last step the robot journalist takes is natural language generation, which for the Narrative Science system is done by recursively traversing all of the angle and story point representations and using phrasal generation routines to generate and splice together the actual English text. This is probably by far the most straightforward aspect of the entire pipeline — it’s pretty much just fancy templates.
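If “fancy templates” sounds abstract, here is roughly what phrasal generation amounts to in miniature; the story points and phrasings are invented, and a real system would have far more routines and far more variation.

```python
# Phrasal generation in miniature: splice story points into English text.
# The story points and phrasings are invented for illustration.

def lead_sentence(points):
    verb = "cruised past" if points["margin"] >= 5 else "edged"
    return (f"{points['winner']} {verb} {points['loser']} "
            f"{points['winner_score']}-{points['loser_score']} on {points['date']}.")

points = {"winner": "River City", "loser": "Springfield", "winner_score": 8,
          "loser_score": 2, "margin": 6, "date": "Sunday"}
print(lead_sentence(points))
# -> "River City cruised past Springfield 8-2 on Sunday."
```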

So, there you have it, the pipeline for a robot journalist: (1) ingest data, (2) compute newsworthy aspects of the data, (3) identify relevant angles and prioritize them, (4) link angles to story points, and (5) generate the output text.

Obviously there can be variations to this basic pipeline as well. Automated Insights, for example, uses randomization to provide variability in output stories and also incorporates a more sophisticated use of narrative tones in generating text. Based on a desired tone, different text might be generated to adhere to an apathetic, confident, pessimistic, or enthusiastic register. Yseop, on the other hand, uses techniques for augmenting templates with metadata so that they’re more flexible. This allows templates, for instance, to conjugate verbs depending on the data being used. A post-generation analyzer (you might call it a robot editor) from Yseop further improves the style of a written text by looking for repeated words and substituting synonyms or alternate words.
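A tiny sketch of what tone-aware phrasing and data-driven agreement might look like follows; the tones, verbs, and agreement rule are invented and much simpler than anything in the actual products.

```python
# Invented sketch of tone-aware phrasing plus simple data-driven agreement;
# real systems encode far richer grammatical metadata than this.

TONE_VERBS = {"confident": "surged", "apathetic": "changed", "pessimistic": "inched up"}

def revenue_sentence(company, quarters_of_growth, tone):
    verb = TONE_VERBS[tone]
    noun = "quarter" if quarters_of_growth == 1 else "quarters"  # agreement with the data
    return f"{company} revenue {verb} for {quarters_of_growth} straight {noun}."

print(revenue_sentence("Acme Corp", 3, "confident"))
# -> "Acme Corp revenue surged for 3 straight quarters."
```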

From my reading, I’d have to say that the Narrative Science patent seems to be the most informed by journalism. It stresses the notion of newsworthiness and editorial judgment in crafting a narrative. But that’s not to say that the stylistic innovations from Automated Insights and the template flexibility of Yseop aren’t important. What still seems to be lacking, though, is a broader sense of newsworthiness beyond “deviance” in these algorithms. Harcup and O’Neill identified 10 modern newsworthiness values, each of which we might attempt to mimic in code: reference to the power elite, reference to celebrities, entertainment, surprise, bad news, good news, magnitude (i.e. significance to a large number of people), cultural relevance to the audience, follow-up, and newspaper agenda. How might robot journalists evolve when they have a fuller palette of editorial intents available to them?

Computational Journalism and The Reporting of Algorithms

Note: A version of the following also appears on the Tow Center blog.

Software and algorithms have come to adjudicate an ever broader swath of our lives, including everything from search engine personalization and advertising systems, to teacher evaluation, banking and finance, political campaigns, and police surveillance.  But these algorithms can make mistakes. They have biases. Yet they sit in opaque black boxes, their inner workings, their inner “thoughts” hidden behind layers of complexity.

We need to get inside that black box, to understand how they may be exerting power on us, and to understand where they might be making unjust mistakes. Traditionally, investigative journalists have helped hold powerful actors in business or government accountable. But today, algorithms, driven by vast troves of data, have become the new power brokers in society. And the automated decisions of algorithms deserve every bit as much scrutiny as other powerful and influential actors.

Today the Tow Center publishes a new Tow/Knight Brief, “Algorithmic Accountability Reporting: On the Investigation of Black Boxes” to start tackling this issue. The Tow/Knight Brief presents motivating questions for why algorithms are worthy of our investigations, and develops a theory and method based on the idea of reverse engineering that can help parse how algorithms work. While reverse engineering shows promise as a method, it will also require the dedicated investigative talents of journalists interviewing algorithms’ creators as well. Algorithms are, after all, manifestations of human design.

If you’re in NYC next week, folks from the New York Times R&D lab are pushing the idea forward in their Impulse Response Workshop. And if you’re at IRE and NICAR’s 2014 CAR Conference in Baltimore on Feb 28th, I’ll be joined by Chase Davis, Frank Pasquale, and Jeremy Singer-Vine for an in-depth discussion on holding algorithms accountable. In the meantime, have a read of the paper, and let me know your thoughts, comments, and critiques.

Storytelling with Data Visualization: Context is King

Note: A version of the following also appears on the Tow Center blog.

Data is like a freeze-dried version of reality, abstracted sometimes to the point where it can be hard to recognize and understand. It needs some rehydration before it becomes tasty (or even just palatable) storytelling material again — something that visualization can often help with. But to fully breathe life back into your data, you need to crack your knuckles and add a dose of written explanation to your visualizations as well. Text provides that vital bit of context layered over the data that helps the audience come to a valid interpretation of what it really means.

So how can you use text and visualization together to provide that context and layer a story over your data? Research that I recently published with collaborators at the University of Michigan offers some insights.

In most journalistic visualization, context is added to data visualization through the use of labels, captions, and other annotations — texts — of various kinds. Indeed, on the Economist Graphic Detail blog, visualizations not only have integrated textual annotations, but an entire 1-2 paragraph introductory article associated with them. In addition to adding an angle and story to the piece, such contextual journalism helps flesh out what the data means and guides the reader’s interpretation towards valid inferences from the data. Textual annotations integrated directly with a visualization can further guide the users’ interactions, emphasizing certain points, prioritizing particular interpretations of data, or pre-empting the user’s curiosity on seeing a salient outlier, aberration, or trend.

To answer the question of how textual annotations function as story contextualizers in online news visualization, we analyzed 136 professionally made news visualizations produced by the New York Times and the Guardian between 2000 and July 2012. Of course we found text used for everything from axis labels, author information, sources, and data provenance, to instructions, definitions, and legends, but we were less interested in studying these kinds of uses than in annotations more directly related to data storytelling.

Based on our analysis we recognized two underlying functions for annotations: (1) observational, and (2) additive. Observational annotations provide context by supporting reflection on a data value or group of values that are depicted in the visualization. These annotations facilitate comparisons and often highlight or emphasize extreme values or other outliers. For interactive graphics they are sometimes revealed when hovering over a visual element.

A basic form of observational messaging is apparent in the following example from the New York Times, showing the population pyramid of the U.S. On the right of the graphic, text clearly indicates observations of the total number and fraction of the population expected to be over age 65 by 2015. This is information that can be observed in the graph but is being reinforced through the use of text.

Another example from the Times shows how observational annotations can be used to highlight and label extremes on a graph. In the chart below, the U.S. budget forecast is depicted, and the low point of 2010 is highlighted with a yellow circle together with an annotation. The value and year of that point are already visible in the graph, which is what makes this kind of annotation observational. Consider using observational annotations when you want to underscore something that’s visible in the visualization, but which you really want to make sure the user sees, or when there is an interesting comparison that you would like to draw the user’s attention towards.

On the other hand, additive annotation provides context that is external to the visual representation and not clearly depicted via the data. These are things that are relevant to the topic or to understanding the data, like background or contemporaneous events or actions. It’s up to you to decide which dimensions of who, what, where, when, why, and how are relevant. If you think the viewer needs to be aware of something in order to interpret the data correctly, then an additive annotation might be appropriate.

The following example from The Minneapolis Star Tribune shows changes in home prices across counties in Minnesota with reference to the peak of the housing bubble, a key bit of additive annotation attached to the year 2007. At the same time, the graphic also uses observational annotation (on the right side) by labeling the median home price and percent change since 2007 for the selected county.

Use of these types of annotation is very prevalent; in our study of 136 examples we found that 120 (88.2%) used at least one of these forms of annotation. We also looked at the relative use of each, shown in the next figure. Observational annotations were used in just shy of half of the cases, whereas additive annotations were used in 73%.

Another dimension to annotation is what scope of the visualization is being referenced: an individual datum, a group of data, or the entire view (e.g. a caption-like element). We tabulated the prevalence of these annotation anchors and found that single datum annotations are the most frequently used (74%). The relative usage frequencies are shown in the next figure. Your choice of what scope of the visualization to annotate will often depend on the story you want to tell, or on what kinds of visual features are most visually salient, such as outliers, trends, or peaks. For instance, trends that happen over longer time-frames in a line-graph might benefit from a group annotation to indicate how a collection of data points is trending, whereas a peak in a time-series would most obviously benefit from an annotation calling out that specific data point.

The two types of annotation, and three types of annotation anchoring are summarized in the following chart depicting stock price data for Apple. Annotations A1 and A2 show additive annotations attached to the whole view, and to a specific date in the view, whereas O1 and O2 show observational annotations attached to a single datum and a group of data respectively.
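For anyone building or analyzing graphics along these lines, one way to make the typology concrete is to represent annotations as data; the field names and example texts below are purely illustrative, echoing the A1/A2/O1/O2 labels in the chart described above.

```python
# Illustrative data structures for the two annotation types and three anchor
# scopes; field names and texts are invented, not taken from the paper.

annotations = [
    {"type": "additive",      "anchor": "view",
     "text": "Daily closing prices, adjusted for splits."},             # like A1
    {"type": "additive",      "anchor": "datum", "x": "2012-09-21",
     "text": "New phone goes on sale."},                                # like A2
    {"type": "observational", "anchor": "datum", "x": "2012-09-19",
     "text": "All-time closing high."},                                 # like O1
    {"type": "observational", "anchor": "group",
     "x_range": ("2012-10-01", "2012-12-31"),
     "text": "Shares slid through the fourth quarter."},                # like O2
]
```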

As we come to better understand how to tell stories with text and visualization together, new possibilities also open up for how to integrate text computationally or automatically with visualization.

In our research we used the above insights about how annotations are used by professionals to build a system that analyzes a stock time series (together with its trade volume data) to look for salient points and automatically annotate the series with key bits of additive context drawn from a corpus of news articles. By ranking relevant news headlines and then deriving graph annotations we were able to automatically generate contextualized stock charts and create a user-experience where users felt they had a better grasp of the trends and oscillations of the stock.
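As a rough sense of what that pipeline involves, here is a heavily simplified sketch: detect salient points in a price series, then attach the closest-dated headline from a corpus. The prices and headlines are made up, and the actual system ranks headlines with much richer relevance signals than recency alone.

```python
from datetime import date

# Heavily simplified sketch of annotating salient points in a stock series
# with news headlines. Prices and headlines below are made up.

def parse(day):  # "YYYY-MM-DD" -> datetime.date
    y, m, d = map(int, day.split("-"))
    return date(y, m, d)

prices = [("2013-01-02", 545.9), ("2013-01-03", 542.1), ("2013-01-24", 450.5),
          ("2013-01-25", 439.9), ("2013-02-01", 453.6)]
headlines = [("2013-01-24", "Shares slide after earnings miss estimates"),
             ("2013-02-01", "Stock regains some ground as analysts weigh in")]

# Salience here = largest one-step percentage moves (a crude stand-in for the
# system's salient point detection).
moves = sorted(((abs(p1 - p0) / p0, day1)
                for (_, p0), (day1, p1) in zip(prices, prices[1:])), reverse=True)

for _, day in moves[:2]:
    # Annotate with the headline published closest to the salient date.
    nearest = min(headlines, key=lambda h: abs((parse(h[0]) - parse(day)).days))
    print(day, "->", nearest[1])
```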

On one hand we have the fully automated scenario, but in the future, more intelligent graph authoring tools for journalists might also incorporate such automation to suggest possible annotations for a graph, which an editor could then tweak or re-write before publication. So not only can the study of news visualizations help us understand the medium better and communicate more effectively, but it can also enable new forms of computational journalism to emerge. For all the details please see our research paper, “Contextifier: Automatic Generation of Annotated Stock Visualizations.”

Algorithmic Defamation: The Case of the Shameless Autocomplete

Note: A version of the following also appears on the Tow Center blog.

In Germany, a man recently won a legal battle with Google over the fact that when you searched for his name, the autocomplete suggestions connected him to “scientology” and “fraud” — two things that he felt carried defamatory insinuations. As a result of losing the case, Google is now compelled to remove defamatory suggestions from autocomplete results when notified, in Germany at least.

Court cases arising from autocomplete defamation aren’t just happening in Germany, though. From other European countries like Italy, France, and Ireland to places as far afield as Japan and Australia, people (and corporations) have brought suit alleging these algorithms defamed them by linking their names to everything from crime and fraud to bankruptcy or sexual conduct. In some cases such insinuations can have real consequences for finding a job or doing business. New services, such as brand.com’s “Google Suggest Plan,” have even arisen to help people manipulate and thus avoid negative connotations in search autocompletions.

The Berkman Center’s Digital Media Law Project (DMLP) defines a defamatory statement generally as, “a false statement of fact that exposes a person to hatred, ridicule or contempt, lowers him in the esteem of his peers, causes him to be shunned, or injures him in his business or trade.” By associating a person’s name with some unsavory behavior it would seem indisputable that autocomplete algorithms can indeed defame people.

So if algorithms like autocomplete can defame people or businesses, our next logical question might be to ask how to hold those algorithms accountable for their actions. Considering the scale and difficulty of monitoring such algorithms, one approach would be to use more algorithms to keep tabs on them and try to find instances of defamation hidden within their millions (or billions) of suggestions.

To try out this approach I automatically collected data on both Google and Bing autocompletions for a number of different queries relating to public companies and politicians. I then filtered these results against keyword lists relating to crime and sex in order to narrow in on potential cases of defamation. I used a list of the corporations on the S&P 500 to query the autocomplete APIs with the following templates, where “X” is the company name: “X,” “X company,” “X is,” “X has,” “X company is,” and “X company has.” And I used a list of U.S. congresspeople from the Sunlight Foundation to query for each person’s first and last name, as well as adding either “representative” or “senator” before their name. The data was then filtered using a list of sex-related keywords, and words related to crime collected from the Cambridge US dictionary in order to focus on a smaller subset of the almost 80,000 autosuggestions retrieved.
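For the curious, the core of that collection-and-filtering step is simple to sketch; in the snippet below, fetch_autocomplete() is a placeholder for whichever suggest API or cached data source you have access to, and the keyword list is abbreviated.

```python
# Sketch of the query expansion and keyword filtering described above.
# fetch_autocomplete() is a placeholder; plug in your own data source.

TEMPLATES = ["{x}", "{x} company", "{x} is", "{x} has",
             "{x} company is", "{x} company has"]
CRIME_WORDS = {"scam", "fraud", "corruption", "theft"}  # abbreviated example list

def fetch_autocomplete(query):
    """Return a list of suggestion strings for a query."""
    raise NotImplementedError("plug in an autocomplete API or cached data here")

def flag_suggestions(company_names):
    flagged = []
    for name in company_names:
        for template in TEMPLATES:
            for suggestion in fetch_autocomplete(template.format(x=name)):
                if any(word in suggestion.lower() for word in CRIME_WORDS):
                    flagged.append((name, suggestion))
    return flagged  # candidates for human review, not verdicts of defamation
```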

Among the corporate autocompletions that I filtered and reviewed, there were twenty-four instances that could be read as statements or assertions implicating the company in everything from corruption and scams to fraud and theft. For instance, querying Bing for “Torchmark” returns as the second suggestion, “torchmark corporation job scam.” Without really digging deeply it’s hard to tell if Torchmark corporation is really involved in some form of scam, or if there’s just some rumors about scam-like emails floating around. If those rumors are false, this could indeed be a case of defamation against the company. But this is a dicey situation for Bing, since if they filtered out a rumor that turned out to be true it might appear they were trying to sweep a company’s unsavory activities under the rug. People would ask: Is Bing trying to protect this company? At the same time they would be doing a disservice to their users by not steering them clear of a scam.

While looking through the autocompletions returned from querying for congresspeople it became clear that a significant issue here relates to name collisions. For relatively generic congressperson names like “Gerald Connolly” or “Joe Barton” there are many other people on the internet with the same names. And some of those people did bad things. So when you Google for “Gerald Connolly” one suggestion that comes up is “gerald connolly armed robbery,” not because Congressman Gerald Connolly robbed anyone but because someone else in Canada by the same name did. If you instead query for “representative Gerald Connolly” the association goes away; adding “representative” successfully disambiguates the two Connollys. The search engine has it tough though: Without a disambiguating term how should it know you’re looking for the congressman or a robber? There are other cases that may be more clear-cut instances of defamation, such as on Bing “Joe Barton” suggesting “joe barton scam” which was not corrected when adding the title “representative” to the front of the query. That seems to be more of a legitimate instance of defamation since even with the disambiguation it’s still suggesting the representative is associated with a scam. And with a bit more searching around it’s also clear there is a scam related to a Joe Barton, just not the congressman.

Some of the unsavory things that might hurt someone’s reputation in autocomplete suggestions could be true though. For instance, an autocompletion for representative “Darrell Issa” to “Darrell Issa car theft” is a correct association arising from his involvement with three separate car theft cases (for which his brother ultimately took the rap). To be considered defamation, the statement must actually be false, which makes it that much harder to write an algorithm that can find instances of real defamation. Unless algorithms can be developed that can detect rumor and falsehood, you’ll always need a person assessing whether an instance of potential defamation is really valid. Still, such tips on what might be defamatory can help filter and focus attention.

Understanding defamation from a legal standpoint brings in even more complexity. Even something that seems, from a moral point of view, defamatory might not be considered so by a court of law. Each state in the U.S. is a bit different in how it governs defamation. A few key nuances relevant to the court’s understanding of defamation relate to perception and intent.

First of all, a statement must be perceived as fact and not opinion in order to be considered defamation by the court. So how do people read search autocompletions? Do they see them as collective opinions or rumors reflecting the zeitgeist, or do they perceive them as statements of fact because of their framing as a result from an algorithm? As far as I know this is an open question for research. If autocompletions are read as opinion, then it might be difficult to ever win a defamation case in the U.S. against such an algorithm.

For defamation suits against public figures, intent also becomes an important factor to consider. The plaintiff must prove “actual malice” with regard to the defamatory statement, which means that a false statement was published either with actual knowledge of its falsity or with reckless disregard for its falsity. But can an algorithm ever be truly malicious? If you use the argument that autocompletions are just aggregations of what others have already typed in, then actual malice could certainly arise from a group of people systematically manipulating the algorithm. Otherwise, the algorithm would have to have some notion of truth, and be “aware” that it was autocompleting something inconsistent with its knowledge of that truth. This could be especially challenging for things whose truth changes over time, or for rumors which may have a social consensus but still be objectively false. So while there have been attempts at automating factchecking, I think this is a long way off.

Of course this may all be moot under Section 230 of the Communications Decency Act, which states that, “no provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.” Given that search autocompletions are based on queries that real people at one time typed into a search box, it would seem Google has a broad protection under the law against any liability from republishing those queries as suggestions. It’s unclear though, at least to me, if recombining and aggregating data from millions of typed queries can really be considered “re-publishing” or if it should rather be considered publishing anew. I suppose it would depend on the degree of transformation of the input query data into suggestions.

Whether it’s Google’s algorithms creating new snippets of text as autocomplete suggestions, or Narrative Science writing entire articles from data, we’re entering a world where algorithms are synthesizing communications that may in some cases run into moral (or legal) considerations like defamation. In print we call defamation libel; when orally communicated we call it slander. We don’t yet have a word for the algorithmically reconstituted defamation that arises when millions of non-public queries are synthesized and publicly published by an aggregative intermediary. Still, we might try to hold such algorithms to account, by using yet more algorithms to systematically assess and draw human attention to possible breaches of trust. It may be some time yet, if ever, before we can look to the U.S. court system for adjudication.

51% Foreign: Algorithms and the Surveillance State

In New York City there’s a “geek squad” of analysts that gathers all kinds of data, from restaurant inspection grades and utility usage to neighborhood complaints, and uses it to predict how to improve the city. The idea behind the team is that with more and more data available about how the city is running—even if it’s messy, unstructured, and massive—the government can optimize its resources by keeping an eye out for what needs its attention most. It’s really about city surveillance, and of course acting on the intelligence produced by that surveillance.

One story about the success of the geek squad comes to us from Viktor Mayer-Schonberger and Kenneth Cukier in their book “Big Data”. They describe the issue of illegal real-estate conversions, which involves sub-dividing an apartment into smaller and smaller units so that it can accommodate many more people than it should. With the density of people in such close quarters, illegally converted units are more prone to accidents, like fire. So it’s in the city’s—and the public’s—best interest to make sure apartment buildings aren’t sub-divided like that. Unfortunately there aren’t very many inspectors to do the job. But by collecting and analyzing data about each apartment building the geek squad can predict which units are more likely to pose a danger, and thus determine where the limited number of inspectors should focus their attention. Seventy percent of inspections now lead to eviction orders from unsafe dwellings, up from 13% without using all that data—a clear improvement in helping inspectors focus on the most troubling cases.

Consider a different, albeit hypothetical, use of big data surveillance in society: detecting drunk drivers. Since there are already a variety of road cameras and other traffic sensors available on our roads, it’s not implausible to think that all of this data could feed into an algorithm that says, with some confidence, that a car is exhibiting signs of erratic, possibly drunk driving. Let’s say, similar to the fire-risk inspections, that this method also increases the efficiency of the police department in getting drunk drivers off the road—a win for public safety.

But there’s a different framing at work here. In the fire-risk inspections the city is targeting buildings, whereas in the drunk driving example it’s really targeting the drivers themselves. This shift in framing—targeting the individual as opposed to the inanimate—crosses the line into invasive, even creepy, civil surveillance.

So given the degree to which the recently exposed government surveillance programs target individual communications, it’s not as surprising that, according to Gallup, more Americans disapprove (53%) than approve (37%) of the federal government’s program to “compile telephone call logs and Internet communications.” This is despite the fact that such surveillance could in a very real way contribute to public safety, just as with the fire-risk or drunk driving inspections.

At the heart of the public’s psychological response is the fear and risk of surveillance uncovering personal communication, of violating our privacy. But this risk is not a foregone conclusion. There’s some uncertainty and probability around it, which makes it that much harder to understand the real risk. In the Prism program, the government surveillance program that targets internet communications like email, chats, and file transfers, the Washington Post describes how analysts use the system to “produce at least 51 percent confidence in a target’s ‘foreignness’”. This test of foreignness is tied to the idea that it’s okay (legally) to spy on foreign communications, but that it would breach FISA (the Foreign Intelligence Surveillance Act), as well as 4th amendment rights for the government to do the same to American citizens.

Platforms used by Prism, such as Google and Facebook, have denied that they give the government direct access to their servers. The New York Times reported that the system in place is more like having a locked mailbox where the platform can deposit specific data requested pursuant to a court order from the Foreign Intelligence Surveillance Court. But even if such requests are legally targeted at foreigners and have been faithfully vetted by the court, there’s still a chance that ancillary data on American citizens will be swept up by the government. “To collect on a suspected spy or foreign terrorist means, at minimum, that everyone in the suspect’s inbox or outbox is swept in,” as the Washington Post writes. And typically data is collected not just of direct contacts, but also contacts of contacts. This all means that there’s a greater risk that the government is indeed collecting data on many Americans’ personal communications.

Algorithms, and a bit of transparency on those algorithms, could go a long way to mitigating the uneasiness over domestic surveillance of personal communications that American citizens may be feeling. The basic idea is this: when collecting information on a legally identified foreign target, for every possible contact that might be swept up with the target’s data, an automated classification algorithm can be used to determine whether that contact is more likely to be “foreign” or “American”. Although the algorithm would have access to all the data, it would only output one bit of metadata for each contact: is the contact foreign or not? Only if the contact was deemed highly likely to be foreign would the details of that data be passed on to the NSA. In other words, the algorithm would automatically read your personal communications and then signal whether or not it was legal to report your data to intelligence agencies, much in the same way that Google’s algorithms monitor your email contents to determine which ads to show you without making those emails available for people at Google to read.

The FISA court implements a “minimization procedure” in order to curtail incidental data collection from people not covered in the order, though the exact process remains classified. Marc Ambinder suggests that, “the NSA automates the minimization procedures as much as it can” using a continuously updated score that assesses the likelihood that a contact is foreign.  Indeed, it seems at least plausible that the algorithm I suggest above could already be a part of the actual minimization procedure used by NSA.

The minimization process reduces the creepiness of unfettered government access to personal communications, but at the same time we still need to know how often such a procedure makes mistakes. In general there are two kinds of mistakes that such an algorithm could make, often referred to as false positives and false negatives. A false negative in this scenario would indicate that a foreign contact was categorized by the algorithm as an American. Obviously the NSA would like to avoid this type of mistake since it would lose the opportunity to snoop on a foreign terrorist. The other type of mistake, false positive, corresponds to the algorithm designating a contact as foreign even though in reality it’s American. The public would want to avoid this type of mistake because it’s an invasion of privacy and a violation of the 4th amendment. Both of these types of errors are shown in the conceptual diagram below, with the foreign target marked with an “x” at the center and ancillary targets shown as connected circles (orange is foreign, blue is American citizen).

[Diagram: the foreign target marked with an “x” at the center, with ancillary contacts shown as connected circles (orange for foreign, blue for American citizens)]

It would be a shame to disregard such a potentially valuable tool simply because it might make mistakes from time to time. To make such a scheme work we first need to accept that the algorithm will indeed make mistakes. Luckily, such an algorithm can be tuned to make more or less of either of those mistakes. As false positives are tuned down false negatives will often increase, and vice versa. The advantage for the public would be that it could have a real debate with the government about what magnitude of mistakes is reasonable. How many Americans being labeled as foreigners and thus subject to unwarranted search and seizure is acceptable to us? None? Some? And what’s the trade-off in terms of how many would-be terrorists might slip through if we tuned the false positives down?
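To make the shape of that debate concrete, here is a toy sketch of how sweeping the classifier’s cutoff trades one kind of mistake for the other; the scores and labels are fabricated for illustration and have nothing to do with any actual minimization procedure.

```python
# Toy illustration of the threshold trade-off. Each contact has a model score
# for "foreignness" and a true label; both are made up for illustration.

contacts = [  # (foreignness_score, truly_foreign)
    (0.95, True), (0.80, True), (0.62, True), (0.58, False),
    (0.45, False), (0.40, True), (0.20, False), (0.10, False),
]

for cutoff in (0.51, 0.70, 0.90):
    # False positive: an American labeled foreign (privacy harm).
    fp = sum(1 for score, foreign in contacts if score >= cutoff and not foreign)
    # False negative: a foreign contact labeled American (missed intelligence).
    fn = sum(1 for score, foreign in contacts if score < cutoff and foreign)
    print(f"cutoff={cutoff:.2f}  false positives={fp}  false negatives={fn}")
```

Raising the cutoff drives false positives toward zero while false negatives climb, and vice versa; publishing just those two error rates would be enough to anchor the public debate described above.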

To begin a debate like this the government just needs to tell us how many of each type of mistake its minimization procedure makes; just two numbers. In this case, minimal transparency of an algorithm could allow for a robust public debate without betraying any particular details or secrets about individuals. In other words, we don’t particularly need to know the gory details of how such an algorithm works. We simply need to know where the government has placed the fulcrum in the tradeoff between these different types of errors. And by implementing smartly transparent surveillance maybe we can even move more towards the world of the geek squad, where big data is still ballyhooed for furthering public safety.

Neolithic Journalists? Influence Engines? Narrative Analytics? Some Thoughts on C+J

A few weeks ago now was the 2nd Computation + Journalism Symposium at Georgia Tech, which I helped organize and program. I wrote up a few reflections on things that jumped out at me from the meeting. Check them out on Nieman Lab.

Aha! Brainstorming App

In April 2012 I published a whitepaper on Cultivating Innovation in Computational Journalism with the CUNY Tow-Knight Center for Entrepreneurial Journalism. Jeff Jarvis wrote about it on the Tow-Knight blog, and the Nieman Lab even covered it.

Part of the paper developed a structured brainstorming activity called “Aha!” to help students and news industry professionals in thinking more about ways to combine ideas from technology, information science, user needs, and journalistic goals into useful new news products and services. We produced a printed deck of cards with different concepts that people could re-combine, and you can still get these cards from CUNY.

But really the Aha! Brainstorming activity was begging to be made into an app, which is now available on the Apple App Store. The app has the advantages that you can augment the re-combinable concepts, you can audio record your brainstorming sessions, take and store photos of any notes you scribble down about your ideas, and share the whole thing via email with your colleagues. If you have an iDevice be sure to check it out!

Understanding bias in computational news media

Just a quick pointer to an article I wrote for Nieman Lab exploring some of the ways in which algorithms serve to introduce bias into news media. Different kind of writing than my typical academic-ese, but fun.