(Updated: February 6, 2015)
Today it's exactly one year ago the Snowden-leaks started. Among the many highly classified documents which were disclosed during the past year are various charts that provide us with actual numbers about the amount of data the National Security Agency (NSA) is collecting.
Here we will take a look at those numbers and see what we can learn from them by comparing various sources and from breaking them down into NSA-divisions, countries and collection programs. As still only fragmented parts have been published, this overview cannot provide completeness or full accuracy (estimates are shown as round numbers).
The most detailed numbers about NSA's data collection are from the BOUNDLESSINFORMANT tool, which is used by NSA officials to view the metadata volumes collected from specific countries or by specific programs.
A worldwide overview is provided by a heat map which was published by The Guardian on June 11, 2013. It displays the figures over a 30-day period ending in March 2013:
NSA worldwide total:
Internet records (DNI):
Telephony records (DNR):
This total of 221 billion telephony and internet records a month equals 2,6 trillion a year and 7,3 billion a day. However, the actual number of what NSA collects worldwide might be higher - see the update below.
NSA volumes and limits
The BOUNDLESSINFORMANT tool seems to be very accurate, but there's another chart that gives different numbers. It's from a 2012 presentation for the SIGINT Development conference of the Five Eyes community and shows the volumes and limits of NSA metadata collection. The chart was published by The Washington Post on December 4, 2013 and again in Greenwald's book 'No Place To Hide' on May 13, 2014.
Chart showing the volumes and limits of NSA metadata collection
between January and June 2012
Redactions by Greenwald or the press, explanations added by the author
(click to enlarge)
This chart shows the numbers of:
- telephony metadata which are received by FASCIA, which is NSA's main ingest processor for telephony metadata;
- internet metadata that are transferred to MARINA, which is a huge NSA database that can store internet metadata for up to a year;
- internet metadata that had to be deleted because there was apparently not enough storage space.
Except for the deleted metadata, the charts shows ca. 10,4 billion internet metadata (DNI) a day, which makes 312 billion a month or 3,7 trillion a year. There are ca. 4,5 billion telephony metadata (DNR) a day, which makes 135 billion a month or 1,6 trillion a year. If we compare these numbers with those from BOUNDLESSINFORMANT, we see a big difference:
Internet metadata (DNI):
Telephony metadata (DNR):
Volumes and Limits
(a month, 1st half 2012)
(a month, 1st half 2013)
There's a difference of 11 billion telephony metadata between both charts, but an even bigger gap exists between the internet metadata: the Volumes and Limits chart shows 215 billion more than BOUNDLESSINFORMANT. This discrepancy wasn't noticed in the press reportings, nor in Greenwald's book, so at the moment there's no clear explanation for this.
A possible explanation for the discrepancies between these numbers can be found in a FAQ document for the BOUNDLESSINFORMANT tool, which says the numbers shown in the "map view" are lower than in the so-called "org view" of the tool because for the latter, also records are counted that doesn't contain the country identifiers which are needed to be counted in the "map view".
This would also explain the far bigger difference between the numbers of internet metadata, because for internet communications it is often much more difficult to attribute them to a particular country than for telephone conversations (which always contain country and region codes). This means the Volumes and Limits slide provides the more realistic numbers.
After being processed by FASCIA, the telephony metadata go to MAINWAY, which is another huge NSA database that keeps these kind of data for at least five years. In 2006 it was estimated that MAINWAY contained 1,9 trillion (1.900.000.000.000) call detail records.
For comparison: in 2007, AT&T's Daytona system, which is used to manage its call detail records (CDR's) supported 2,8 trillion records. In 2012, T-Mobile USA Inc. upgraded to an IBM Netezza 1000 platform with a capacity of 2 petabytes. This is used for loading 17 billion records a day, making 510 billion a month and more than 6 trillion a year.
If we assume the telecom providers and NSA use "records" in the same sense, than this shows that the telecommunication companies produce far more phone call metadata than NSA collects. As T-Mobile USA alone apparently creates 4 times more records as presented in NSA's BOUNDLESSINFORMANT tool, the domestic telephone metadata collection under section 215 Patriot Act cannot be included in the numbers we've seen so far.
Also interesting is that according to slides about the Hemisphere project, some 4 billion telephone metadata records are collected every day from any carrier that uses AT&T switches in response to grand jury subpoenas in counter-narcotics investigations.
During a parliamentary hearing in Germany, an official of BND explained that one cell phone creates between 100 and 200 pieces of metadata a day. For 4.5 billion cell phone users worldwide that would equal some 450 to 900 billion pieces of metadata. It's not yet clear whether NSA counts metadata in the same way, like the NSA's "records" are comprised of multiple pieces, for example.
GCHQ metadata collection
Even more metadata seem to be collected by NSA's British partner agency GCHQ, which according to this slide from 2011 collects 50 billion metadata per day. This makes 1,5 trillion a month and an astonishing 18 trillion (18.000.000.000.000) a year!
This (partial) slide was published in Greenwald's book No Place To Hide, but without any further explanation, so we don't know whether GCHQ is able to actually store everything or has to delete large amounts, like NSA. From the slide itself it seems that the number of 50 billion refers to internet metadata alone, which would make this number even more remarkable.
According to a report by The Guardian, GCHQ also collects 600 million telephony metadata a day, which makes 18 billion a month - a small number compared to the internet metadata this agency receives:
Internet metadata per month:
Telephony metadata per month:
For indexing and searching the content of internet communications, GCHQ uses the TEMPORA system, which is capable of processing the traffic from 46 fiber-optic cables of 10 gigabits per second. This makes that 21 petabytes of data flow past these systems every day.
NSA collection by country
The main BOUNDLESSINFORMANT interface with the heat map also lists the names of the countries which provide the highest numbers of data. These can be sorted in three different ways: Aggregate, DNI (internet) and DNR (telephony), each resulting in a slightly different top-5. The following aggregated totals (so both DNI and DNR) are known:
NSA worldwide total:
These numbers indicate from which countries NSA gathers most data, but the exact meaning of the numbers has still not been clarified. We do know that BOUNDLESSINFORMANT counts metadata records, but what these records exactly are (for example: how many records are created by one phone call?), and how they are attributed to a specific country is not clear.
Communications by definition have two ends: the originating and the receiving end. When both ends are in the same country, it's easy to attribute it to that particular country. But when the originating and the receiving ends are in a different country, how is such a communication registered? Maybe for both countries, although that would make many of them appear in these numbers twice.
Edward Snowden saw the heat map with the 3 billion attributed to the United States as a proof that NSA was conducting domestic surveillance, although the heat map itself cannot provide sufficient evidence for that. The 3 billion could very well relate to foreign communications which are just transiting the US or to the American end of for example phone calls where the other end is a foreign suspect. Somewhat more information could have been provided by the bar charts for the US, but these haven't been published.
The number of 3.095.553.478 for the United States is the aggregated total. The number of internet records (DNI) for the US is 2.892.343.446, which leaves just 203.210.032 telephony records (DNR) or 0,065% of the aggregated total. In a table this looks like this:
United States total:
Internet records (DNI):
Telephony records (DNR):
3.095.553.478 per month
2.892.343.446 per month
203.190.032 per month
This tiny share for telephone metadata is rather strange given the fact that NSA is collecting all American phone records, but does not so with internet metadata. This seems to indicate that these domestic phone records are not counted by BOUNDLESSINFORMANT and that the internet records are from communications with at least one end foreign.
NSA collection by division
With a BOUNDLESSINFORMANT chart about the NSA's Special Source Operations (SSO) division published in Greenwald's book, we can also compare the number of data collected by this division with the total number of NSA data collection. We see that SSO, which is responsible for tapping the world's main fiber optic cables, accounts for 72% of all data:
NSA worldwide total:
Special Source Operations (SSO):
Other NSA divisions:
This leaves the remaining 28% of the data to be collected by NSA's other main divisions: Global Access Operations (GAO), which operates mobile collection platforms like satellites, planes, drones and ships, and Tailored Access Operations (TAO), which collects data by hacking into foreign computer networks. The remaining 28% could also encompass data collected by the joint NSA/CIA Special Collection Service (SCS) units and by 3rd Party partner agencies.
SSO Collection programs
From the BOUNDLESSINFORMANT chart about Special Source Operations we can see how the total number of data collected by this division breaks down into the 5 biggest collection programs. From other charts we also know the numbers collected by some other programs, and these are added here too:
SSO worldwide total:
SPINNERET (US-3180, part of RAMPART-A):
MOONLIGHTPATH (US-3145, part of RAMPART-A):
INCENSER (DS-300, part of WINDSTOP):
AZUREPHOENIX (US-3127, part of RAMPART-A):
SOMALGET (US-3310, part of MYSTIC):
ACIDWASH (part of MYSTIC):
MUSCULAR (DS-200B, part of WINDSTOP):
Other programs in total:
This listing shows that roughly one third of the data from telecommunication cables are collected by just on single program: DANCINGOASIS. Another third part is intercepted by the programs ranking second, third and fourth,
On June 18, 2014 the Danish newspaper Information and Greenwald's website The Intercept broke a story saying that SPINNERET, MOONLIGHTPATH and AZUREPHOENIX are all part of the RAMPART-A program, which encompasses access to fiber-optic cables abroad, in cooperation with 3rd Party partner agencies from at least five different countries.
According to a FAQ document, the BOUNDLESSINFORMANT tool doesn't count data which are collected under FISA authority, so numbers about the famous PRISM program are excluded. However, another source (pdf) says that under PRISM, more than 227 million "internet communications" are collected annually, which is ca. 19 million a month, but it is not known whether these "internet communications" are the same kind of records as presented by BOUNDLESSINFORMANT.
Processing and storing
Metadata from a number of big and important SSO collection programs are processed by a system codenamed SHELLTRUMPET. As can be read in the document below, this system processed almost 500 billion metadata records in 2012, which gives an average of 41,6 billion a month, but by the end of 2012 SHELLTRUMPET was already processing 2 billion call detail records a day, which would make 60 billion a month:
MUSCULAR contributes 60 gigabyte of data to the PINWALE database for internet content every day, which is 1,8 terabyte a month. As BOUNDLESSINFORMANT counts 181 million records for MUSCULAR, this would mean that 1 million internet metadata records represent almost 10 gigabyte of (content) data.
This correlation can be used to make a very rough estimate of the total amount of internet data collected by NSA. The worldwide total of 97 billion internet records a month would then equal some 961 terabyte of data each month or 11,5 petabyte a year (some numbers to compare are here; the new NSA data center in Bluffdale, Utah can store an estimated 12 exabytes, which is 12.000 petabytes).
Shared by 2nd party partner agencies
The very close working relationship between NSA and the 2party partner agencies from the Five Eyes community leads to a regular exchange of data, of which the most productive facilities can be seen in a BOUNDLESSINFORMANT chart that was published by Der Spiegel:
The SIGAD codes starting with DS denote some kind of joint collection program, those starting with UKC stand for civilian operated facilities of the British signals intelligence agency GCHQ.
Shared by 3rd party partner agencies
NSA also gets data provided by 3rd Party partner agencies. These are counted by the BOUNDLESSINFORMANT tool too, as we know from charts about a number of European countries:
The Netherlands (US-985Y):
The total number of data received from these nine countries is slightly more than 1 billion a month, which is just a tiny 0,0045% of NSA's overall collection as counted by the BOUNDLESSINFORMANT tool.
Initially, Glenn Greenwald reported in various European newspapers that these numbers represented the phone calls of European citizens intercepted by NSA. But gradually it came out that his interpretation was wrong.
The charts actually show numbers of metadata that were collected from foreign communications by European military intelligence agencies in support of military operations abroad. These data were subsequently shared with partner agencies, most likely through the SIGDASYS system of the SIGINT Seniors Europe (SSEUR) group, which is led by NSA.
> See also: NSA's foreign partnerships
Links and Sources
- Syncsort.com: How Hadoop is Transforming Telecom
- Secret-bases.co.uk: Secret Data Centres, including GCHQ's Tempora and NSA's PRISM projects
- Cryptome.org: Numbers of reports generated by various NSA programs (pdf)
- Forbes.com: Blueprints Of NSA's Ridiculously Expensive Data Center In Utah Suggest It Holds Less Info Than Thought