Algorithmic Defamation: The Case of the Shameless Autocomplete

Note: A version of the following also appears on the Tow Center blog.

In Germany, a man recently won a legal battle with Google over the fact that searching for his name produced autocomplete suggestions connecting him to “scientology” and “fraud,” two associations he felt carried defamatory insinuations. Having lost the case, Google is now compelled, in Germany at least, to remove defamatory suggestions from autocomplete results when notified.

Court cases arising from autocomplete defamation aren’t just happening in Germany though. In other European countries such as Italy, France, and Ireland, and as far afield as Japan and Australia, people (and corporations) have brought suits alleging that these algorithms defamed them by linking their names to everything from crime and fraud to bankruptcy and sexual conduct. In some cases such insinuations can have real consequences for finding a job or doing business. New services, such as brand.com’s “Google Suggest Plan,” have even arisen to help people manipulate search autocompletions and thus avoid negative connotations.

The Berkman Center’s Digital Media Law Project (DMLP) defines a defamatory statement generally as “a false statement of fact that exposes a person to hatred, ridicule or contempt, lowers him in the esteem of his peers, causes him to be shunned, or injures him in his business or trade.” Since autocomplete algorithms can associate a person’s name with unsavory behavior, it seems indisputable that they can indeed defame people.

So if algorithms like autocomplete can defame people or businesses, the next logical question is how to hold them accountable. Given the scale and difficulty of monitoring such algorithms, one approach would be to use more algorithms to keep tabs on them and surface instances of defamation hidden within their millions (or billions) of suggestions.

To try out this approach I automatically collected data on both Google and Bing autocompletions for a number of different queries relating to public companies and politicians, then filtered the results against keyword lists relating to crime and sex in order to home in on potential cases of defamation. I used a list of the corporations on the S&P 500 to query the autocomplete APIs with the following templates, where “X” is the company name: “X,” “X company,” “X is,” “X has,” “X company is,” and “X company has.” And I used a list of U.S. congresspeople from the Sunlight Foundation to query for each person’s first and last name, as well as with “representative” or “senator” added before the name. The data was then filtered using a list of sex-related keywords and a list of crime-related words collected from the Cambridge US dictionary, in order to focus on a smaller subset of the almost 80,000 autosuggestions retrieved. A sketch of this collection-and-filtering pipeline appears below.
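To make that pipeline concrete, here is a minimal sketch in Python. It is not the code used in the study: it queries Google’s unofficial suggest endpoint (the one browser search boxes call), which is undocumented and may change, and the keyword list is a tiny placeholder standing in for the full crime- and sex-related word lists described above.

```python
import time
import requests

# Unofficial Google suggest endpoint used by browser search boxes; it is
# undocumented and may change, so treat this purely as an illustration.
SUGGEST_URL = "https://suggestqueries.google.com/complete/search"

# Query templates described above; "{}" stands in for the company name.
COMPANY_TEMPLATES = ["{}", "{} company", "{} is", "{} has",
                     "{} company is", "{} company has"]

# Tiny placeholder list; the actual study used longer crime- and
# sex-related word lists (e.g., terms gathered from a dictionary).
FLAG_WORDS = {"scam", "fraud", "theft", "corruption", "bankrupt"}

def get_suggestions(query):
    """Return the list of autocomplete suggestions for a single query."""
    resp = requests.get(SUGGEST_URL,
                        params={"client": "firefox", "q": query},
                        timeout=10)
    resp.raise_for_status()
    # Response shape: [original query, [suggestion, suggestion, ...], ...]
    return resp.json()[1]

def flag_company(name):
    """Collect suggestions for every template and keep any containing a flag word."""
    flagged = []
    for template in COMPANY_TEMPLATES:
        for suggestion in get_suggestions(template.format(name)):
            if any(word in suggestion.lower() for word in FLAG_WORDS):
                flagged.append(suggestion)
        time.sleep(1)  # space out requests; rapid automated queries may be throttled
    return flagged

if __name__ == "__main__":
    # Stand-ins for the full S&P 500 list used in the study.
    for company in ["Torchmark", "Goldman Sachs"]:
        for hit in flag_company(company):
            print(company, "->", hit)
```

The same loop works for the congressperson queries; only the templates (bare name, “representative X,” “senator X”) and the keyword lists change.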

Among the corporate autocompletions that I filtered and reviewed, there were twenty-four instances that could be read as statements or assertions implicating the company in everything from corruption and scams to fraud and theft. For instance, querying Bing for “Torchmark” returns as the second suggestion “torchmark corporation job scam.” Without digging deeply it’s hard to tell whether Torchmark Corporation is really involved in some form of scam, or whether there are just rumors about scam-like emails floating around. If those rumors are false, this could indeed be a case of defamation against the company. But this is a dicey situation for Bing: if it filtered out a rumor that turned out to be true, it might appear to be sweeping a company’s unsavory activities under the rug. People would ask: Is Bing trying to protect this company? At the same time it would be doing a disservice to its users by not steering them clear of a scam.

While looking through the autocompletions returned from querying for congresspeople, it became clear that a significant issue here relates to name collisions. For relatively generic congressperson names like “Gerald Connolly” or “Joe Barton” there are many other people on the internet with the same names, and some of those people did bad things. So when you Google “Gerald Connolly,” one suggestion that comes up is “gerald connolly armed robbery,” not because Congressman Gerald Connolly robbed anyone but because someone else in Canada by the same name did. If you instead query for “representative Gerald Connolly” the association goes away; adding “representative” successfully disambiguates the two Connollys. The search engine has it tough though: without a disambiguating term, how should it know whether you’re looking for the congressman or the robber? Other cases may be more clear-cut instances of defamation, such as Bing suggesting “joe barton scam” for “Joe Barton,” which was not corrected when adding the title “representative” to the front of the query. That seems to be a more legitimate instance of defamation, since even with the disambiguation the suggestion still associates the representative with a scam. And with a bit more searching around it’s also clear there is a scam related to a Joe Barton, just not the congressman. A simple way to automate this disambiguation check is sketched below.
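As a rough illustration of that check (again using the unofficial suggest endpoint and a placeholder keyword list, so purely a sketch rather than the study’s actual code), one can compare the flagged suggestions returned for the bare name against those returned when a title is prepended:

```python
import requests

SUGGEST_URL = "https://suggestqueries.google.com/complete/search"  # unofficial, illustrative
FLAG_WORDS = {"scam", "fraud", "robbery", "theft"}  # placeholder keyword list

def suggestions(query):
    """Fetch autocomplete suggestions for a query."""
    resp = requests.get(SUGGEST_URL, params={"client": "firefox", "q": query}, timeout=10)
    resp.raise_for_status()
    return resp.json()[1]

def flagged(query):
    """Keep only suggestions containing one of the flag words."""
    return {s for s in suggestions(query) if any(w in s.lower() for w in FLAG_WORDS)}

def disambiguation_check(name, title="representative"):
    """Compare flagged suggestions with and without a disambiguating title."""
    return {
        # Flags on the bare name alone are often just name collisions.
        "bare_name": flagged(name),
        # Flags that survive the title prefix deserve a closer human look.
        "with_title": flagged(f"{title} {name}"),
    }

print(disambiguation_check("Joe Barton"))
```

Flags that disappear when the title is added are likely collisions with someone else of the same name; flags that persist are the ones worth escalating to a human reviewer.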

Some of the unsavory things that might hurt someone’s reputation in autocomplete suggestions could be true, though. For instance, the autocompletion of Representative “Darrell Issa” to “Darrell Issa car theft” is a correct association arising from his involvement in three separate car theft cases (for which his brother ultimately took the rap). To be considered defamation, a statement must actually be false, which makes it that much harder to write an algorithm that can find instances of real defamation. Unless algorithms can be developed that detect rumor and falsehood, you’ll always need a person to assess whether an instance of potential defamation is really valid. Still, such tips on what might be defamatory can help filter and focus attention.

Understanding defamation from a legal standpoint brings in even more complexity. Even something that seems, from a moral point of view, defamatory might not be considered so by a court of law. Each state in the U.S. is a bit different in how it governs defamation. A few key nuances relevant to the court’s understanding of defamation relate to perception and intent.

First of all, a statement must be perceived as fact and not opinion in order to be considered defamation by the court. So how do people read search autocompletions? Do they see them as collective opinions or rumors reflecting the zeitgeist, or do they perceive them as statements of fact because of their framing as a result from an algorithm? As far as I know this is an open question for research. If autocompletions are read as opinion, then it might be difficult to ever win a defamation case in the U.S. against such an algorithm.

For defamation suits against public figures, intent also becomes an important factor to consider. The plaintiff must prove “actual malice” with regard to the defamatory statement, meaning that a false statement was published either with actual knowledge of its falsity or with reckless disregard for its falsity. But can an algorithm ever be truly malicious? If you take the view that autocompletions are just aggregations of what others have already typed in, then actual malice could certainly arise from a group of people systematically manipulating the algorithm. Otherwise, the algorithm would have to have some notion of truth, and be “aware” that it was autocompleting something inconsistent with its knowledge of that truth. This could be especially challenging for things whose truth changes over time, or for rumors that may have a social consensus but still be objectively false. So while there have been attempts at automating fact-checking, I think this is still a long way off.

Of course this may all be moot under Section 230 of the Communications Decency Act, which states that “no provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.” Given that search autocompletions are based on queries that real people at one time typed into a search box, it would seem Google has broad protection under the law against any liability from republishing those queries as suggestions. It’s unclear though, at least to me, whether recombining and aggregating data from millions of typed queries can really be considered “re-publishing” or whether it should rather be considered publishing anew. I suppose it would depend on the degree of transformation of the input query data into suggestions.

Whether it’s Google’s algorithms creating new snippets of text as autocomplete suggestions, or Narrative Science writing entire articles from data, we’re entering a world where algorithms synthesize communications that may in some cases run into moral (or legal) considerations like defamation. In print we call defamation libel; when orally communicated we call it slander. We don’t yet have a word for the algorithmically reconstituted defamation that arises when millions of non-public queries are synthesized and published by an aggregative intermediary. Still, we might try to hold such algorithms to account by using yet more algorithms to systematically assess and draw human attention to possible breaches of trust. It may be some time yet, if ever, before we can look to the U.S. court system for adjudication.