51% Foreign: Algorithms and the Surveillance State

In New York City there’s a “geek squad” of analysts that gathers all kinds of data, from restaurant inspection grades and utility usage to neighborhood complaints, and uses it to predict how to improve the city. The idea behind the team is that with more and more data available about how the city is running—even if it’s messy, unstructured, and massive—the government can optimize its resources by keeping an eye out for what needs its attention most. It’s really about city surveillance, and of course acting on the intelligence produced by that surveillance.

One story about the success of the geek squad comes to us from Viktor Mayer-Schonberger and Kenneth Cukier in their book “Big Data”. They describe the issue of illegal real-estate conversions, which involves sub-dividing an apartment into smaller and smaller units so that it can accommodate many more people than it should. With the density of people in such close quarters, illegally converted units are more prone to accidents, like fire. So it’s in the city’s—and the public’s—best interest to make sure apartment buildings aren’t sub-divided like that. Unfortunately there aren’t very many inspectors to do the job. But by collecting and analyzing data about each apartment building the geek squad can predict which units are more likely to pose a danger, and thus determine where the limited number of inspectors should focus their attention. Seventy percent of inspections now lead to eviction orders from unsafe dwellings, up from 13% without using all that data—a clear improvement in helping inspectors focus on the most troubling cases.

Consider a different, albeit hypothetical, use of big data surveillance in society: detecting drunk drivers. Since there are already a variety of road cameras and other traffic sensors available on our roads, it’s not implausible to think that all of this data could feed into an algorithm that says, with some confidence, that a car is exhibiting signs of erratic, possibly drunk driving. Let’s say, similar to the fire-risk inspections, that this method also increases the efficiency of the police department in getting drunk drivers off the road—a win for public safety.

But there’s a different framing at work here. In the fire-risk inspections the city is targeting buildings, whereas in the drunk driving example it’s really targeting the drivers themselves. This shift in framing—targeting the individual as opposed to the inanimate–crosses the line into invasive, even creepy, civil surveillance.

So given the degree to which the recently exposed government surveillance programs target individual communications, it’s not as surprising that, according to Gallup, more Americans disapprove (53%) than approve (37%) of the federal government’s program to “compile telephone call logs and Internet communications.” This is despite the fact that such surveillance could in a very real way contribute to public safety, just as with the fire-risk or drunk driving inspections.

At the heart of the public’s psychological response is the fear and risk of surveillance uncovering personal communication, of violating our privacy. But this risk is not a foregone conclusion. There’s some uncertainty and probability around it, which makes it that much harder to understand the real risk. In the Prism program, the government surveillance program that targets internet communications like email, chats, and file transfers, the Washington Post describes how analysts use the system to “produce at least 51 percent confidence in a target’s ‘foreignness’”. This test of foreignness is tied to the idea that it’s okay (legally) to spy on foreign communications, but that it would breach FISA (the Foreign Intelligence Surveillance Act), as well as 4th amendment rights for the government to do the same to American citizens.

Platforms used by Prism, such as Google and Facebook, have denied that they give the government direct access to their servers. The New York Times reported that the system in place is more like having a locked mailbox where the platform can deposit specific data requested pursuant to a court order from the Foreign Intelligence Surveillance Court. But even if such requests are legally targeted at foreigners and have been faithfully vetted by the court, there’s still a chance that ancillary data on American citizens will be swept up by the government. “To collect on a suspected spy or foreign terrorist means, at minimum, that everyone in the suspect’s inbox or outbox is swept in,” as the Washington Post writes. And typically data is collected not just of direct contacts, but also contacts of contacts. This all means that there’s a greater risk that the government is indeed collecting data on many Americans’ personal communications.

Algorithms, and a bit of transparency on those algorithms, could go a long way to mitigating the uneasiness over domestic surveillance of personal communications that American citizens may be feeling. The basic idea is this: when collecting information on a legally identified foreign target, for every possible contact that might be swept up with the target’s data, an automated classification algorithm can be used to determine whether that contact is more likely to be “foreign” or “American”. Although the algorithm would have access to all the data, it would only output one bit of metadata for each contact: is the contact foreign or not? Only if the contact was deemed highly likely to be foreign would the details of that data be passed on to the NSA. In other words, the algorithm would automatically read your personal communications and then signal whether or not it was legal to report your data to intelligence agencies, much in the same way that Google’s algorithms monitor your email contents to determine which ads to show you without making those emails available for people at Google to read.

The FISA court implements a “minimization procedure” in order to curtail incidental data collection from people not covered in the order, though the exact process remains classified. Marc Ambinder suggests that, “the NSA automates the minimization procedures as much as it can” using a continuously updated score that assesses the likelihood that a contact is foreign.  Indeed, it seems at least plausible that the algorithm I suggest above could already be a part of the actual minimization procedure used by NSA.

The minimization process reduces the creepiness of unfettered government access to personal communications, but at the same time we still need to know how often such a procedure makes mistakes. In general there are two kinds of mistakes that such an algorithm could make, often referred to as false positives and false negatives. A false negative in this scenario would indicate that a foreign contact was categorized by the algorithm as an American. Obviously the NSA would like to avoid this type of mistake since it would lose the opportunity to snoop on a foreign terrorist. The other type of mistake, false positive, corresponds to the algorithm designating a contact as foreign even though in reality it’s American. The public would want to avoid this type of mistake because it’s an invasion of privacy and a violation of the 4th amendment. Both of these types of errors are shown in the conceptual diagram below, with the foreign target marked with an “x” at the center and ancillary targets shown as connected circles (orange is foreign, blue is American citizen).

diagram

It would be a shame to disregard such a potentially valuable tool simply because it might make mistakes from time to time. To make such a scheme work we first need to accept that the algorithm will indeed make mistakes. Luckily, such an algorithm can be tuned to make more or less of either of those mistakes. As false positives are tuned down false negatives will often increase, and vice versa. The advantage for the public would be that it could have a real debate with the government about what magnitude of mistakes is reasonable. How many Americans being labeled as foreigners and thus subject to unwarranted search and seizure is acceptable to us? None? Some? And what’s the trade-off in terms of how many would-be terrorists might slip through if we tuned the false positives down?

To begin a debate like this the government just needs to tell us how many of each type of mistake its minimization procedure makes; just two numbers. In this case, minimal transparency of an algorithm could allow for a robust public debate without betraying any particular details or secrets about individuals. In other words, we don’t particularly need to know the gory details of how such an algorithm works. We simply need to know where the government has placed the fulcrum in the tradeoff between these different types of errors. And by implementing smartly transparent surveillance maybe we can even move more towards the world of the geek squad, where big data is still ballyhooed for furthering public safety.