'How the NSA Converts Spoken Words Into Searchable Text' diff viewer (0/1)

This article is from the source 'theintercept' and was first published or seen on May 05, 2015 14:48 (UTC). It last changed over 40 days ago and won't be checked again for changes.

You can find the current article at its original source at https://firstlook.org/theintercept/2015/05/05/nsa-speech-recognition-snowden-searchable-text/

The article has changed 2 times. There is an RSS feed of changes available.

Previous version 1 Next version

Previous version 1 Next version

Version 0	Version 1
How the NSA Converts Spoken Words Into Searchable Text	How the NSA Converts Spoken Words Into Searchable Text
2015-05-17 16:48:52 UTC	2015-12-25 09:34:02 UTC (7 months later)
~~First in a series. Part 2 here.~~	First in a series. Part 2 here. Part 3 here.
Most people realize that emails and other digital communications they once considered private can now become part of their permanent record.	Most people realize that emails and other digital communications they once considered private can now become part of their permanent record.
But even as they increasingly use apps that understand what they say, most people don’t realize that the words they speak are not so private anymore, either.	But even as they increasingly use apps that understand what they say, most people don’t realize that the words they speak are not so private anymore, either.
Top-secret documents from the archive of former NSA contractor Edward Snowden show the National Security Agency can now automatically recognize the content within phone calls by creating rough transcripts and phonetic representations that can be easily searched and stored.	Top-secret documents from the archive of former NSA contractor Edward Snowden show the National Security Agency can now automatically recognize the content within phone calls by creating rough transcripts and phonetic representations that can be easily searched and stored.
The documents show NSA analysts celebrating the development of what they called “Google for Voice” nearly a decade ago.	The documents show NSA analysts celebrating the development of what they called “Google for Voice” nearly a decade ago.
Though perfect transcription of natural conversation apparently remains the Intelligence Community’s “holy grail,” the Snowden documents describe extensive use of keyword searching as well as computer programs designed to analyze and “extract” the content of voice conversations, and even use sophisticated algorithms to flag conversations of interest.	Though perfect transcription of natural conversation apparently remains the Intelligence Community’s “holy grail,” the Snowden documents describe extensive use of keyword searching as well as computer programs designed to analyze and “extract” the content of voice conversations, and even use sophisticated algorithms to flag conversations of interest.
The documents include vivid examples of the use of speech recognition in war zones like Iraq and Afghanistan, as well as in Latin America. But they leave unclear exactly how widely the spy agency uses this ability, particularly in programs that pick up considerable amounts of conversations that include people who live in or are citizens of the United States.	The documents include vivid examples of the use of speech recognition in war zones like Iraq and Afghanistan, as well as in Latin America. But they leave unclear exactly how widely the spy agency uses this ability, particularly in programs that pick up considerable amounts of conversations that include people who live in or are citizens of the United States.
Spying on international telephone calls has always been a staple of NSA surveillance, but the requirement that an actual person do the listening meant it was effectively limited to a tiny percentage of the total traffic. By leveraging advances in automated speech recognition, the NSA has entered the era of bulk listening.	Spying on international telephone calls has always been a staple of NSA surveillance, but the requirement that an actual person do the listening meant it was effectively limited to a tiny percentage of the total traffic. By leveraging advances in automated speech recognition, the NSA has entered the era of bulk listening.
And this has happened with no apparent public oversight, hearings or legislative action. Congress hasn’t shown signs of even knowing that it’s going on.	And this has happened with no apparent public oversight, hearings or legislative action. Congress hasn’t shown signs of even knowing that it’s going on.
The USA Freedom Act — the surveillance reform bill that Congress is currently debating — doesn’t address the topic at all. The bill would end an NSA program that does not collect voice content: the government’s bulk collection of domestic calling data, showing who called who and for how long.	The USA Freedom Act — the surveillance reform bill that Congress is currently debating — doesn’t address the topic at all. The bill would end an NSA program that does not collect voice content: the government’s bulk collection of domestic calling data, showing who called who and for how long.
Even if becomes law, the bill would leave in place a multitude of mechanisms exposed by Snowden that scoop up vast amounts of innocent people’s text and voice communications in the U.S. and across the globe.	Even if becomes law, the bill would leave in place a multitude of mechanisms exposed by Snowden that scoop up vast amounts of innocent people’s text and voice communications in the U.S. and across the globe.
Civil liberty experts contacted by The Intercept said the NSA’s speech-to-text capabilities are a disturbing example of the privacy invasions that are becoming possible as our analog world transitions to a digital one.	Civil liberty experts contacted by The Intercept said the NSA’s speech-to-text capabilities are a disturbing example of the privacy invasions that are becoming possible as our analog world transitions to a digital one.
“I think people don’t understand that the economics of surveillance have totally changed,” Jennifer Granick, civil liberties director at the Stanford Center for Internet and Society, told The Intercept.	“I think people don’t understand that the economics of surveillance have totally changed,” Jennifer Granick, civil liberties director at the Stanford Center for Internet and Society, told The Intercept.
“Once you have this capability, then the question is: How will it be deployed? Can you temporarily cache all American phone calls, transcribe all the phone calls, and do text searching of the content of the calls?” she said. “It may not be what they are doing right now, but they’ll be able to do it.”	“Once you have this capability, then the question is: How will it be deployed? Can you temporarily cache all American phone calls, transcribe all the phone calls, and do text searching of the content of the calls?” she said. “It may not be what they are doing right now, but they’ll be able to do it.”
And, she asked: “How would we ever know if they change the policy?”	And, she asked: “How would we ever know if they change the policy?”
Indeed, NSA officials have been secretive about their ability to convert speech to text, and how widely they use it, leaving open any number of possibilities.	Indeed, NSA officials have been secretive about their ability to convert speech to text, and how widely they use it, leaving open any number of possibilities.
That secrecy is the key, Granick said. “We don’t have any idea how many innocent people are being affected, or how many of those innocent people are also Americans.”	That secrecy is the key, Granick said. “We don’t have any idea how many innocent people are being affected, or how many of those innocent people are also Americans.”
NSA whistleblower Thomas Drake, who was trained as a voice processing crypto-linguist and worked at the agency until 2008, told The Intercept that he saw a huge push after the September 11, 2001 terror attacks to turn the massive amounts of voice communications being collected into something more useful.	NSA whistleblower Thomas Drake, who was trained as a voice processing crypto-linguist and worked at the agency until 2008, told The Intercept that he saw a huge push after the September 11, 2001 terror attacks to turn the massive amounts of voice communications being collected into something more useful.
Human listening was clearly not going to be the solution. “There weren’t enough ears,” he said.	Human listening was clearly not going to be the solution. “There weren’t enough ears,” he said.
The transcripts that emerged from the new systems weren’t perfect, he said. “But even if it’s not 100 percent, I can still get a lot more information. It’s far more accessible. I can search against it.”	The transcripts that emerged from the new systems weren’t perfect, he said. “But even if it’s not 100 percent, I can still get a lot more information. It’s far more accessible. I can search against it.”
Converting speech to text makes it easier for the NSA to see what it has collected and stored, according to Drake. “The breakthrough was being able to do it on a vast scale,” he said.	Converting speech to text makes it easier for the NSA to see what it has collected and stored, according to Drake. “The breakthrough was being able to do it on a vast scale,” he said.
The Defense Department, through its Defense Advanced Research Projects Agency (DARPA), started funding academic and commercial research into speech recognition in the early 1970s.	The Defense Department, through its Defense Advanced Research Projects Agency (DARPA), started funding academic and commercial research into speech recognition in the early 1970s.
What emerged were several systems to turn speech into text, all of which slowly but gradually improved as they were able to work with more data and at faster speeds.	What emerged were several systems to turn speech into text, all of which slowly but gradually improved as they were able to work with more data and at faster speeds.
In a brief interview, Dan Kaufman, director of DARPA’s Information Innovation Office, indicated that the government’s ability to automate transcription is still limited.	In a brief interview, Dan Kaufman, director of DARPA’s Information Innovation Office, indicated that the government’s ability to automate transcription is still limited.
Kaufman says that automated transcription of phone conversation is “super hard,” because “there’s a lot of noise on the signal” and “it’s informal as hell.”	Kaufman says that automated transcription of phone conversation is “super hard,” because “there’s a lot of noise on the signal” and “it’s informal as hell.”
“I would tell you we are not very good at that,” he said.	“I would tell you we are not very good at that,” he said.
In an ideal environment like a news broadcast, he said, “we’re getting pretty good at being able to do these types of translations.”	In an ideal environment like a news broadcast, he said, “we’re getting pretty good at being able to do these types of translations.”
A 2008 document from the Snowden archive shows that transcribing news broadcasts was already working well seven years ago, using a program called Enhanced Video Text and Audio Processing:	A 2008 document from the Snowden archive shows that transcribing news broadcasts was already working well seven years ago, using a program called Enhanced Video Text and Audio Processing:
(U//FOUO) EViTAP is a fully-automated news monitoring tool. The key feature of this Intelink-SBU-hosted tool is that it analyzes news in six languages, including Arabic, Mandarin Chinese, Russian, Spanish, English, and Farsi/Persian. “How does it work?” you may ask. It integrates Automatic Speech Recognition (ASR) which provides transcripts of the spoken audio. Next, machine translation of the ASR transcript translates the native language transcript to English. Voila! Technology is amazing.	(U//FOUO) EViTAP is a fully-automated news monitoring tool. The key feature of this Intelink-SBU-hosted tool is that it analyzes news in six languages, including Arabic, Mandarin Chinese, Russian, Spanish, English, and Farsi/Persian. “How does it work?” you may ask. It integrates Automatic Speech Recognition (ASR) which provides transcripts of the spoken audio. Next, machine translation of the ASR transcript translates the native language transcript to English. Voila! Technology is amazing.
A version of the system the NSA uses is now even available commercially.	A version of the system the NSA uses is now even available commercially.
Experts in speech recognition say that in the last decade or so, the pace of technological improvement has been explosive. As information storage became cheaper and more efficient, technology companies were able to store massive amounts of voice data on their servers, allowing them to continually update and improve the models. Enormous processors, tuned as “deep neural networks” that detect patterns like human brains do, produce much cleaner transcripts.	Experts in speech recognition say that in the last decade or so, the pace of technological improvement has been explosive. As information storage became cheaper and more efficient, technology companies were able to store massive amounts of voice data on their servers, allowing them to continually update and improve the models. Enormous processors, tuned as “deep neural networks” that detect patterns like human brains do, produce much cleaner transcripts.
And the Snowden documents show that the same kinds of leaps forward seen in commercial speech-to-text products have also been happening in secret at the NSA, fueled by the agency’s singular access to astronomical processing power and its own vast data archives.	And the Snowden documents show that the same kinds of leaps forward seen in commercial speech-to-text products have also been happening in secret at the NSA, fueled by the agency’s singular access to astronomical processing power and its own vast data archives.
In fact, the NSA has been repeatedly releasing new and improved speech recognition systems for more than a decade.	In fact, the NSA has been repeatedly releasing new and improved speech recognition systems for more than a decade.
The first-generation tool, which made keyword-searching of vast amounts of voice content possible, was rolled out in 2004 and code-named RHINEHART.	The first-generation tool, which made keyword-searching of vast amounts of voice content possible, was rolled out in 2004 and code-named RHINEHART.
“Voice word search technology allows analysts to find and prioritize intercept based on its intelligence content,” says an internal 2006 NSA memo entitled “For Media Mining, the Future Is Now!”	“Voice word search technology allows analysts to find and prioritize intercept based on its intelligence content,” says an internal 2006 NSA memo entitled “For Media Mining, the Future Is Now!”
The memo says that intelligence analysts involved in counterterrorism were able to identify terms related to bomb-making materials, like “detonator” and “hydrogen peroxide,” as well as place names like “Baghdad” or people like “Musharaf.”	The memo says that intelligence analysts involved in counterterrorism were able to identify terms related to bomb-making materials, like “detonator” and “hydrogen peroxide,” as well as place names like “Baghdad” or people like “Musharaf.”
RHINEHART was “designed to support both real-time searches, in which incoming data is automatically searched by a designated set of dictionaries, and retrospective searches, in which analysts can repeatedly search over months of past traffic,” the memo explains (emphasis in original).	RHINEHART was “designed to support both real-time searches, in which incoming data is automatically searched by a designated set of dictionaries, and retrospective searches, in which analysts can repeatedly search over months of past traffic,” the memo explains (emphasis in original).
As of 2006, RHINEHART was operating “across a wide variety of missions and languages” and was “used throughout the NSA/CSS [Central Security Service] Enterprise.”	As of 2006, RHINEHART was operating “across a wide variety of missions and languages” and was “used throughout the NSA/CSS [Central Security Service] Enterprise.”