
"At least 80% of fibre-optic cables globally go via the US. This is no accident and allows the US to view all communication coming in. At least 80% of all audio calls, not just metadata, are recorded and stored in the US. The NSA lies about what it stores."

I think he meant that the NSA had access to the 80% of calls that are routed through the US, not that the NSA is recording and storing literally every single one of them. I think he was misquoted or misspoke.

William Binney hasn't worked for the NSA since 2001. Were they recording all calls back then? Did someone still there leak new information to him?



The NSA is recording every call in the Bahamas and Afghanistan. They can listen to any call made in the last 30 days. (Edit: Only cell phone calls, but still.) It's possible that a similar rolling-window system is in place for calls going through the USA.

Bahamas: https://firstlook.org/theintercept/article/2014/05/19/data-p...

Afghanistan: https://wikileaks.org/WikiLeaks-statement-on-the-mass.html


Neither of those two places could be described as having a particularly high volume of calls, though. You'd be looking at a difference of an order of magnitude or more in sheer volume.


Well, what would it take to record the whole US? One version says that we make ~5 calls a day on our phones [1] and ownership hovers at 80-90% of the 18+ population (and probably a bunch younger). The US is 311 million, let's say 80% of 300 million, so 240 million people making 5 calls / day.

The average call length is 1.8 minutes right now [2]. So we've got 240 mil person * 5 calls / person-day * 365 days * 1.8 min / call. So, about 788 billion minutes of calls / year at a flow rate of about 1.2 billion calls (roughly 2.2 billion minutes) / day.

At MP3 compression of 128kbps, vocal data takes about 0.94 MB / minute [3]. So, close to 688 petabytes of storage data at a rate of about 1.9 petabytes / day. Seems within the realm of doable.

The problem of analysing this is in the "ridiculous parallelism" category, so they'd just be constrained by server farm capacity. Let's say they had a system with a conservative million nodes. Each day, each node would have to process ~2 GB of audio data looking for patterns. Not even challenging. If I were clever, I'd probably run a brute force audio to text on each node, then a text to symbol pattern analyser. I'd also have higher level net processes that look for patterns in calls spatially and temporally, but with far fewer processors.

[1] http://www.pewinternet.org/2010/09/02/cell-phones-and-americ...

[2] http://www.statista.com/statistics/185828/average-local-mobi...

[3] http://iaudiophile.net/forums/archive/index.php/t-2081.html
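For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate. Every input is a figure quoted in the comment above; the function and variable names are purely illustrative, and sizes use binary units (1 MB = 1024 kB), which is what makes 128 kbps come out at roughly 0.94 MB per minute.

```python
# Back-of-envelope check of the estimate above. All inputs are the
# figures quoted in the comment; the names are illustrative only.

PEOPLE = 240_000_000       # 80% of ~300 million
CALLS_PER_DAY = 5          # calls per person per day [1]
MINUTES_PER_CALL = 1.8     # average call length [2]
BITRATE_KBPS = 128         # MP3 bitrate assumed in the comment [3]
NODES = 1_000_000          # the "conservative million nodes"


def mb_per_minute(bitrate_kbps):
    """Size of one minute of audio in MB (binary: 1 MB = 1024 kB)."""
    kilobytes_per_second = bitrate_kbps / 8
    return kilobytes_per_second * 60 / 1024


def estimate(people, calls_per_day, minutes_per_call, bitrate_kbps, nodes):
    """Return call volume and storage figures for the given inputs."""
    calls_day = people * calls_per_day
    minutes_day = calls_day * minutes_per_call
    mb_day = minutes_day * mb_per_minute(bitrate_kbps)
    pb_day = mb_day / 1024**3          # MB -> GB -> TB -> PB
    return {
        "calls_per_day": calls_day,
        "minutes_per_year": minutes_day * 365,
        "pb_per_day": pb_day,
        "pb_per_year": pb_day * 365,
        "gb_per_node_per_day": pb_day * 1024**2 / nodes,
    }


result = estimate(PEOPLE, CALLS_PER_DAY, MINUTES_PER_CALL, BITRATE_KBPS, NODES)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```

With these inputs it gives about 1.2 billion calls per day, about 788 billion minutes per year, roughly 1.9 PB per day (about 688 PB per year), and close to 2 GB per node per day.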


Standard POTS audio is a single channel with 3.4 kHz of bandwidth. Compressing VoIP codecs allow a wider range, but still sound great at 12 kb/s.

So without any non-COTS tech and without sacrificing the lilt of grandma's sparkly voice, we're already at a 90% reduction of your numbers.

So let's go with 190 TB/day. Say we keep 30 days in hot disk storage, and spool it off to a digital tape robot afterward. We'd need a few thousand COTS hard drives on less than a thousand servers (triple it for decent RAID).

Then a few million-dollar LTO tape libraries, and a couple of guys to schlep tape cartridges all day long.

(Of course, since we don't require all that compute power, and because Gov't doesn't build datacenters like Google does, they'd just buy a few fat IBM z/OS boxes and a few fat EMC cabinets for low tens of millions of dollars total, and call it a day.)

Either way, it's well within the realm of eminently achievable, I'd say. Which means it's obviously happening. :)
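A rough sizing sketch for the 30-day hot tier described above, in Python. The 190 TB/day rate, the 30-day window and the tripling come from this comment; the 4 TB drive size and 12 drives per server are my own assumptions for illustration, so swap in whatever hardware you have in mind.

```python
import math

TB_PER_DAY = 190          # from the comment: ~10% of 1.9 PB/day
HOT_DAYS = 30             # rolling window kept on disk
REPLICATION = 3           # "triple it" for redundancy
DRIVE_TB = 4              # ASSUMPTION: capacity of one commodity drive
DRIVES_PER_SERVER = 12    # ASSUMPTION: drive bays per storage server


def hot_storage(tb_per_day, days, replication, drive_tb, drives_per_server):
    """Return (raw TB in the window, drives needed, servers needed)."""
    raw_tb = tb_per_day * days
    drives = math.ceil(raw_tb * replication / drive_tb)
    servers = math.ceil(drives / drives_per_server)
    return raw_tb, drives, servers


raw_tb, drives, servers = hot_storage(
    TB_PER_DAY, HOT_DAYS, REPLICATION, DRIVE_TB, DRIVES_PER_SERVER
)
print(f"raw data in window: {raw_tb:,} TB")
print(f"drives (with replication): {drives:,}")
print(f"servers: {servers:,}")
```

Under those assumptions the window holds 5,700 TB of raw audio, which needs 4,275 drives across 357 servers once tripled. That lines up with "a few thousand drives on less than a thousand servers".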


Thanks for the correction on the compression that's possible (and others who offered comprehensible work in the 1 kbps to 30 kbps range). I didn't know the SOTA so I went for what I as a consumer thought of for audio compression.

On the final point, agree. This is in the range where its cost is a rounding error in some of the large security budgets. (+ probably far more for the labor, ops, etc.)

That was one of the main reasons I worked through it: it sounds like a horrifically large task, but it's really not even that crazy in terms of data rates and storage when you break it down.


And you can divide by two (person A calling to person B is just one conversation needing to be recorded, instead of two recordings for person A and person B), and you're at less than 100 TB per day. That's pocket money.


Your compression numbers are way off. 128kbps is for stereo music, not monaural voice recordings. If you stick with MP3, you can go down to 32 or 24 kbps without sacrificing any intelligibility. If you use a real voice codec like G.729, you can easily get by with 8 kbps. https://en.wikipedia.org/wiki/G.729
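To put numbers on that, here is a small Python sketch comparing daily storage at the bitrates mentioned above. The call volume (240 million people, 5 calls a day, 1.8 minutes each) is taken from the parent comment; the names are illustrative, and I use decimal units here (1 kbps = 1000 bits/s, 1 TB = 10^12 bytes), so the totals differ slightly from figures computed with binary units.

```python
# Daily storage for the parent comment's call volume at several bitrates.
# Decimal units throughout: 1 kbps = 1000 bit/s, 1 TB = 1e12 bytes.

MINUTES_PER_DAY = 240_000_000 * 5 * 1.8   # figures from the parent comment

BITRATES_KBPS = {
    "MP3 128 kbps": 128,
    "MP3 32 kbps": 32,
    "MP3 24 kbps": 24,
    "G.729 8 kbps": 8,
}


def tb_per_day(bitrate_kbps, minutes_per_day=MINUTES_PER_DAY):
    """Terabytes needed to store one day of calls at the given bitrate."""
    bytes_per_minute = bitrate_kbps * 1000 / 8 * 60
    return bytes_per_minute * minutes_per_day / 1e12


for label, kbps in BITRATES_KBPS.items():
    print(f"{label:>14}: {tb_per_day(kbps):8.1f} TB/day")
```

That works out to roughly 2,070 TB/day at 128 kbps versus about 130 TB/day at 8 kbps, a 16x reduction from the codec choice alone.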


Here, have a data point: GCHQ had a sub-1kbps voice codec in the mid-80s (it sounds horribly 'squelchy'), and they used tape when they were doing it back then...


You can compress human speech while maintaining intelligibility much tighter than 128 kbps.

And did you seriously make a link to cite that 128 kb * 60 = 0.94 MB?


No, he made a link to cite that 128 kb * 60 / 8 = 0.94 MB


128 kb = 16 kB.
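Spelled out in Python, in case the units are still confusing anyone. This is only a sketch of the conversion being argued about: 128 kilobits per second is 16 kilobytes per second, times 60 seconds is 960 kB per minute, and whether that reads as 0.94 MB or 0.96 MB depends on whether you divide by 1024 or 1000. The function name is made up for the example.

```python
def kbps_to_mb_per_minute(kbps, binary=True):
    """Convert a bitrate in kilobits/s to megabytes per minute of audio."""
    kilobytes_per_second = kbps / 8              # 128 kb -> 16 kB
    kilobytes_per_minute = kilobytes_per_second * 60
    divisor = 1024 if binary else 1000           # binary vs decimal MB
    return kilobytes_per_minute / divisor


print(kbps_to_mb_per_minute(128))                # binary MB per minute
print(kbps_to_mb_per_minute(128, binary=False))  # decimal MB per minute
```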


This is cool. Everyone always talks about the idea of recording and analyzing calls, but I've never thought about actual implementation. I appreciate the perspective.


It's got to be two orders of magnitude easier for them to tap phones in the USA though. They don't have to ingratiate themselves with anyone or hide what they're doing, just show up at AT&T with a FISA-signed warrant and install equipment. https://en.wikipedia.org/wiki/Room_641A They've had a lot longer to do the work, and they have a lot less distance to transport equipment.


The calls in Afghanistan are just satellite calls - those satellites are all US owned. Tapping them is trivial given the very limited number of satellites.

US telecommunications is geographically dispersed, and my original point anyway was that the volume of calls is enormous - storing and processing that much data is hardly a trivial problem since it's both storage AND CPU intensive. From the US black ops budget you could make some reasonable estimates as to the total size and capability of the NSA to do this.


No. Mobile and landlines, too.

Satellite calls have been intercepted worldwide for decades now by numerous intelligence agencies.


"The National Security Agency has been recording and storing nearly all the domestic (and international) phone calls from two or more target countries as of 2013." That seems like more than just satellite calls.


And that's where the Utah data center comes in.


"At least 80% of all audio calls, not just metadata, are recorded and stored in the US."

It definitely seems like he meant that 80% of calls are recorded and stored. Whether that's true or not will (hopefully) come out in time.


Sure, if you read just that sentence. If you read just the one before it, it definitely seems like he meant the US has visibility into 80% of calls.


the techy in me is greatly impressed that they can do this.

the libertarian in me is greatly annoyed they can do this.


It's actually not really that impressive. Brewster did a napkin calculation that estimates a cost of about 30 million USD per year. That's nothing. https://blog.archive.org/2013/06/15/cost-to-store-all-us-pho...


Please don't politicize civil liberties. If progressives and libertarians can come together on anything, it's this.


Maybe Shivetya doesn't have any progressive in them. If they did, perhaps it would be annoyed too.


“All cats are libertarians. Completely dependent on others but fully convinced of their own independence.”


Good joke, but sort of backfires!

Aren't cats notoriously good at surviving in the wild? Both feral and domestic cats (if they have to).


Not around here. Cats are great meals for coyotes, foxes, birds of prey, feral hogs... The list goes on and on. But don't tell a cat that and burst their bubble!


I have no idea what the NSA stores, but it's certainly feasible to store all the audio calls if they want to, and the new data center they're building has far more capacity than they would need:

http://blog.archive.org/2013/06/15/cost-to-store-all-us-phon...


Very probably yes. GCHQ's retention goes back (very broadly) that far (it's inconsistent) - and they have some access to each other's networks (the NSA accesses GCHQ's network via example.gchq.nsa.ic.gov, specifically, so for example https://wiki.gchq.nsa.ic.gov/index.php would get you to GCHQ's internal MediaWiki if you're on the NSA network - GCHQ uses an internal .gchq TLD). They can query selectors on each other's databases, and filtering of what's "not allowed" seems to be very often up to the analyst. (Of course, I don't know if they share everything. Probably not.)

The NSA's own BLARNEY initiative for this stuff dates back to 1978, although I think GCHQ beat them to the punch in effective mass telephony intercept - the first one I'm aware of started in the early 1980s, although that is offline, and probably no longer archived as the media (digital tape with low-bitrate encoded audio) would probably have degraded beyond all usefulness by now (and it sounded pretty crunchy in the first place, I understand, kinda like a bad squelchy lower-side-band transmission!).

Remember, however, that they didn't have the same kind of analysis capability back then that they've got these days (but they can probably go back and analyse old stuff they still retain). As Snowden's disclosed, you're seeing the newer systems having full-take ring buffers in nearline storage, followed by offline selection for recent access and analysis from that using a huge amount of distributed processing, and in turn automatic selected archiving out of that. It's pretty much the difference between microfiche and Google in effectiveness.


Back then the key words were "echelon" and "carnivore". People had suspicions, IIRC there was an EU inquiry, but only now do we have concrete evidence.


I don't think there is any doubt about the fact that NSA wants to store it all. Just look at the short interview/documentary with Binney that was linked from the article: https://www.youtube.com/watch?v=590cy1biewc


No: the NSA is recording at least 80% of all calls and will, towards the end of the year, have the capacity to record 100%.


You're assuming that the 20% drop off is due to capacity. That isn't my understanding of the current situation.

The Utah facility gives them the ability to store calls for longer, it doesn't increase their scope.

The reason they don't record 20% of calls is, from my understanding, that they have internal rules against recording certain people important to the democratic process within the US (e.g. politicians (and their families), judges, election officials, etc).

Essentially the politicians were concerned that the NSA's resources would be utilised by whichever party was in office to gain an advantage, so they put themselves (both parties) onto a "no spy" list. But you won't find the law where it says that, because it is all internal-rules created by committee behind closed doors.


But wouldn't it be nice if they could use their data to get rid of some corruption in politics? I know, I know, how does the government as a whole police itself? Very tough problem. But at some level I don't really care that they collect data, it's what they do with it that's important. So who decides? What is the master agenda? How do they self police? etc... One can only guess at the answers to these questions. It may be nice to allow anyone in the NSA to spy on anyone else in the NSA to prevent abuse, but that would be a huge problem if a spy managed to get a job there. I just imagine this being a rather complex problem.


They are not allowed to spy on US persons without a warrant. Full stop. The law is quite clear. Further, the court ruled that metadata was exempt from protection because, in part, it was voluntarily handed over to a third party by the end user.

I'd love to hear how you came to your conclusion that only politicians, et al. enjoy such protections.



Read your links, but found nothing contradictory with what I said... and I certainly found nothing to substantiate the claims of the post I replied to.

What exactly used to be true that is no longer true?


"They are not allowed to spy on US persons without a warrant. Full stop."

That's what you said that is no longer true.

"The FISA Court (FISC) today released a heavily redacted version of its July ruling approving the renewal of the bulk metadata collection on all phone calls from US phone providers under Section 215 of the Patriot Act. "

Then there is the "3 hops" rule that allows full spying of individuals if you are "3 hops" from a suspected terrorist, which is most people alive.

I'm not saying whether or not I support any of this. I do acknowledge that the NSA was not allowed to spy on US citizens before the Patriot Act, and everything we know indicates that they strictly followed this rule, but things have changed.




