Tag Archives: algorithms

What’s in a Ranking?

The web is a tangled mess of pages and links. But through the magic of the Google algorithm it becomes a nice and neatly ordered rank of “relevance” to whatever our heart desires. The network may be the architecture of the web, but the human ideology projected on that network is the rank.

Often enough we take rankings at face value; we don’t stop to think about what’s really in a rank. There is tremendous power conferred upon the top N, of anything really, not just search results but colleges, restaurants, or a host of other goods. These are the things that get the most attention and become de facto defaults because they are easier for us to access. In fact we rank all manner of services around us in our communities: schools, hospitals and doctors, even entire neighborhoods. Bloomberg has an entire site dedicated to them. These rankings have implications for a host of decisions we routinely make. Can we trust them to guide us?

Thirty years ago, rankings in the airline reservation systems used by travel agents were regulated by the U.S. government. Such regulation served to limit the ability of operators to “bias travel-agency displays” in a way that would privilege some flights over others. But this regulatory model for reining in algorithmic power hasn’t been applied in other domains, like search engines. It’s worth asking why not and what such regulation might look like, but it’s also worth thinking about alternatives to regulation that we might employ for mitigating such biases. For instance, we might design advanced interfaces that transparently signal the various ways in which a rank, and the scores and indices on which it is built, are constituted.

Consider an example from the local media: the “Best Neighborhoods” app published by the Dallas Morning News (shown below). It ranks neighborhoods according to criteria like schools, parks, commute, and walkability. The default “overall” ranking, though, is unclear: How are the various criteria weighted? And how are they even defined? What does “walkability” mean in the context of this app? If I am looking to invest in property, I might be misled by a simplified algorithm; does it really measure the dimensions that matter most? While we can interactively re-rank by any of the individual criteria, many people will only ever see the default ranking. Other neighborhood rankings, like the one from the New Yorker in 2010, do show their weights, but they’re non-interactive.
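To make concrete why those hidden weights matter, here is a minimal Python sketch of a weighted “overall” ranking. The neighborhood names and scores are invented for illustration; the point is only that changing the weights changes which neighborhood comes out on top.

```python
# Sketch: how an "overall" rank depends on hidden criterion weights.
# Neighborhood names and scores below are made up for illustration.

def rank(neighborhoods, weights):
    """Return neighborhood names ordered by weighted sum of criterion scores."""
    def overall(scores):
        return sum(weights[c] * scores[c] for c in weights)
    return sorted(neighborhoods, key=lambda n: overall(neighborhoods[n]), reverse=True)

data = {
    "Oak Lawn": {"schools": 0.6, "parks": 0.9, "walkability": 0.8},
    "Lakewood": {"schools": 0.9, "parks": 0.7, "walkability": 0.4},
    "Uptown":   {"schools": 0.5, "parks": 0.5, "walkability": 0.9},
}

# Equal weights vs. school-heavy weights yield different "best" neighborhoods.
print(rank(data, {"schools": 1/3, "parks": 1/3, "walkability": 1/3}))
print(rank(data, {"schools": 0.8, "parks": 0.1, "walkability": 0.1}))
```

Under equal weights one neighborhood leads; shift most of the weight onto schools and a different one does. An app that hides its weights hides exactly this sensitivity.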

[Image: the Dallas Morning News “Best Neighborhoods” app]

The notion of algorithmic accountability is something I’ve written about here previously. It’s the idea that algorithms are becoming more and more powerful arbiters of our decision making, both in the corporate world and in government. There’s an increasing need for journalists to think critically about how to apply algorithmic accountability to the various rankings that the public encounters in society, including rankings (like neighborhood rankings) that their own news organizations may publish as community resources.

What should the interface be for an online ranking so that it provides a level of transparency to the public? In a recent project with the IEEE, we sought to implement an interface for end-users to interactively re-weight and visualize how their re-weightings affected a ranking. But this is just the start: there is exciting work to do in human-computer interaction and visualization design to determine the most effective ways to expose rankings interactively in ways that are useful to the public, but which also build credibility. How else might we visualize the entire space of weightings and how they affect a ranking in a way that helps the public understand the robustness of those rankings?
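One simple way to probe the space of weightings is to sample many weight vectors and count how often each item ranks first. This is a sketch of that robustness check, with hypothetical items and criteria, not a description of the IEEE project's actual method:

```python
# Sketch: probing ranking robustness by sampling the space of weightings.
# Items and criterion scores are hypothetical.
import random

items = {
    "A": {"x": 0.9, "y": 0.2, "z": 0.5},
    "B": {"x": 0.4, "y": 0.8, "z": 0.6},
    "C": {"x": 0.5, "y": 0.5, "z": 0.5},
}

def top_item(weights):
    """Item with the highest weighted score under this weighting."""
    return max(items, key=lambda n: sum(weights[c] * items[n][c] for c in weights))

wins = {name: 0 for name in items}
random.seed(42)
for _ in range(10_000):
    # Normalized exponentials draw a weight vector uniformly over the simplex.
    draws = {c: random.expovariate(1.0) for c in ("x", "y", "z")}
    total = sum(draws.values())
    weights = {c: d / total for c, d in draws.items()}
    wins[top_item(weights)] += 1

# Share of weightings under which each item ranks first: a simple robustness signal.
for name, count in wins.items():
    print(name, count / 10_000)
```

An item that wins under almost any weighting is robustly ranked; one that wins only in a narrow sliver of weight space owes its position to an editorial choice worth disclosing.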

When we start thinking about the hegemony of algorithms and their ability to generalize nationally or internationally there are also interesting questions about how to adapt rankings for local communities. Take something like a local school ranking. Rankings by national or state aggregators like GreatSchools may be useful, but they may not reflect how an individual community would choose to weight or even select criteria for inclusion in a ranking. How might we adapt interfaces or rankings so that they can be more responsive to local communities? Are there geographically local feedback processes that might allow rankings to reflect community values? How might we enable democracy or even voting on local ranking algorithms?

In short, this is a call for more reflection on how to be transparent about the data-driven rankings we create for our readers online. There are research challenges here, in human-centered design, in visualization, and in decision sciences that if solved will allow us to build better and more trustworthy experiences for the public served by our journalism. It’s time to break the tyranny of the unequivocal ranking and develop new modes of transparency for these algorithms.

Diversity in the Robot Reporter Newsroom


The Associated Press recently announced a big new hire: A robot reporter from Automated Insights (AI) would be employed to write up to 4,400 earnings report stories per quarter. Last year, that same automated writing software produced over 300 million stories — that’s some serious scale from a single algorithmic entity.

So what happens to media diversity in the face of massive automated content production platforms like the one Automated Insights created? Despite the fact that we’ve done pretty abysmally at incorporating a balance of minority and gender perspectives in the news media, I think we’d all like to believe that by including diverse perspectives in the reporting and editing of news we fly closer to the truth. A silver lining to the newspaper industry crash has been a profusion of smaller, more nimble media outlets, allowing for far more variability and diversity in the ideas that we’re exposed to.

Of course software has biases, and although the basic anatomy of robot journalists is comparable, there are variations within and among different systems, such as the style and tone that’s produced as well as the editorial criteria coded into them. Algorithms are the product of a range of human choices, including criteria, parameters, and training data, that can pass along inherited, systematic biases. So while a robot reporter offers the promise of scale (and of reducing costs), we need to be wary of over-reliance on any single automated system. For the sake of media diversity, the one bot needs to fork itself and become 100,000.

We saw this unfold in microcosm over the last week. The @wikiparliament bot was launched in the UK to monitor edits to Wikipedia from IP addresses within parliament (a form of transparency and accountability for who was editing what). Within days it had been mimicked by the @congressedits bot, which was set up to monitor the U.S. Congress. What was particularly interesting about @congressedits, though, is that it was open sourced by its creator, Ed Summers. That allowed the bot to quickly spread and be adapted for different jurisdictions like Australia, Canada, France, Sweden, Chile, Germany, and even Russia.
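The core logic behind a bot like this is small: check whether an anonymous edit’s IP address falls inside a watched network. Here is a sketch using Python’s standard ipaddress module; the CIDR blocks are made-up stand-ins, not the real parliamentary or congressional ranges.

```python
# Sketch of the core check behind bots like @congressedits: does an
# anonymous Wikipedia edit come from a watched IP range? The CIDR
# blocks below are illustrative placeholders, not real ranges.
import ipaddress

WATCHED_RANGES = [
    ipaddress.ip_network("143.231.0.0/16"),  # hypothetical legislature block
    ipaddress.ip_network("137.18.0.0/16"),   # hypothetical agency block
]

def is_watched(ip_string):
    """True if the editing IP falls inside any watched network."""
    ip = ipaddress.ip_address(ip_string)
    return any(ip in network for network in WATCHED_RANGES)

# For anonymous edits, Wikipedia records the IP as the username,
# so the bot can flag and tweet any edit whose IP matches.
print(is_watched("143.231.4.7"))  # True
print(is_watched("8.8.8.8"))      # False
```

Adapting the bot for a new country is then mostly a matter of swapping in that jurisdiction’s IP ranges, which is why the open-sourced version spread so quickly.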

Tailoring a bot for different countries is just one (relatively simple) form of adaptation, but I think diversifying bots for different editorial perspectives could similarly benefit from a platform. I would propose that we need to build an open-source news bot architecture that different news and journalistic organizations could use as a scaffolding to encode their own editorial intents, newsworthiness criteria, parameters, data sets, ranking algorithms, cultures, and souls into. By creating a flexible platform as an underlying starting point, the automated media ecology could adapt and diversify faster and into new domains or applications.
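To sketch what such a scaffold might look like, here is a hypothetical base class whose editorial hooks each newsroom would override. All class and method names, and the earnings example, are invented for illustration; this is not how Automated Insights or any real system is built.

```python
# Sketch of an open-source news bot scaffold: a newsroom forks the base
# class and encodes its own data sources, newsworthiness criteria, and
# house style. All names here are hypothetical.

class NewsBot:
    """Base scaffold: fetch items, filter by editorial criteria, render."""

    def fetch(self):
        raise NotImplementedError  # data source supplied by each newsroom

    def is_newsworthy(self, item):
        raise NotImplementedError  # editorial criteria supplied by each newsroom

    def render(self, item):
        raise NotImplementedError  # house style supplied by each newsroom

    def run(self):
        return [self.render(i) for i in self.fetch() if self.is_newsworthy(i)]


class EarningsBot(NewsBot):
    """One fork: terse earnings reports, newsworthy above a 5% surprise."""

    def __init__(self, reports):
        self.reports = reports

    def fetch(self):
        return self.reports

    def is_newsworthy(self, r):
        return abs(r["actual"] - r["expected"]) / r["expected"] > 0.05

    def render(self, r):
        beat = "beat" if r["actual"] > r["expected"] else "missed"
        return f'{r["company"]} {beat} estimates with EPS of {r["actual"]:.2f}.'


bot = EarningsBot([
    {"company": "Acme", "expected": 1.00, "actual": 1.20},
    {"company": "Globex", "expected": 2.00, "actual": 2.01},
])
print(bot.run())  # only Acme clears the newsworthiness threshold
```

The point of the design is that the shared `run` pipeline stays fixed while every editorially loaded decision lives in a hook a newsroom can rewrite, so a hundred forks can share infrastructure without sharing a single editorial voice.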

Such a platform would also enable the expansion of bots oriented towards different journalistic tasks. A lot of the news and information bots you find on social media these days are parrots of various ilks: they aggregate content on a particular topical niche, like @BadBluePrep, @FintechBot, and @CelebNewsBot, or for a geographical area like @North_GA, or they simply retweet other accounts based on trigger words. Some of the more sophisticated bots do look at data feeds to generate novel insights, like @treasuryio or @mediagalleries, but there’s so much more that could be done with a flexible bot platform.

For instance, we might consider building bots that act as information collectors and solicitors, moving away from pure content production toward content acquisition. This isn’t so far off, really. Researchers at IBM have been working on this for a couple of years and have already built a prototype system that “automatically identifies and ask[s] targeted strangers on Twitter for desired information.” The technology is oriented toward collecting accurate and up-to-date information from specific situations where crowd information may be valuable. It’s relatively easy to imagine an automated news bot being launched after a major news event to identify and solicit information, facts, or photos from people likely nearby or involved in the event. In another related project, the same group at IBM has been developing technology to identify people on Twitter who are more likely to propagate (read: retweet) information relating to public safety news alerts. Essentially they grease the gears of social dissemination by identifying just the right people, for a given topic and at a particular time, who are most likely to further share the information.

There are tons of applications for news bots just waiting for journalists to build them: factchecking, information gathering, network bridging, audience development etc. etc. Robot journalists don’t just have to be reporters. They can be editors, or even (hush) work on the business side.

What I think we don’t want to end up with is the Facebook or Google of robot reporting: “one algorithm to rule them all”. It’s great that the Associated Press is exploring the use of these technologies to scale up their content creation, but down the line when the use of writing algorithms extends far beyond earnings reports, utilizing only one platform may ultimately lead to homogenization and frustrate attempts to build a diverse media sphere. Instead the world that we need to actively create is one where there are thousands of artisanal news bots serving communities and variegated audiences, each crafted to fit a particular context and perhaps with a unique editorial intent. Having an open source platform would help enable that, and offer possibilities to plug in and explore a host of new applications for bots as well.