February 13, 2016

How NSA contact chaining combines domestic and foreign phone records


In the previous posting we saw that the domestic telephone records, which NSA collected under authority of Section 215 of the USA PATRIOT Act (internally referred to as BR-FISA), were stored in the centralized contact chaining system MAINWAY, which also contains all kinds of metadata collected overseas.

Here we will take a step-by-step look at what NSA analysts do with these data in order to find as yet unknown conspirators of foreign terrorist organisations.

It becomes clear that the initial contact chaining is followed by various analysis methods, and that the domestic metadata are largely integrated with the foreign ones, something NSA never talked about and which only very few observers noticed.

What is described here is the situation until the end of 2015. The current practice under the USA FREEDOM Act differs in various ways. The information in this article is almost completely derived from documents declassified by the US government, but these have various parts redacted.


 

RAS-approval

As a seed for starting a contact chain, NSA analysts can take a telephone identifier like a phone number (also called a selector), based upon:
- their own ongoing analysis on an existing target set;
- a Request for Information (RFI) from another government agency;
- a notification of a match between a known counterterrorism-related selector and an identifier among newly ingested phone metadata.

Access to the domestic phone records was granted to about 125 intelligence analysts from the Homeland Security Analysis Center (HSAC, or S2I4) of the NSA's Signals Intelligence Directorate. There were also up to 22 specially trained officials called Homeland Mission Coordinators or HMCs (initially shift coordinators).

As required by the FISA Court orders, only these HMCs, the chief and the deputy chief of the HSAC are allowed to determine that there is a Reasonable, Articulable Suspicion (RAS) that a certain selector is associated with a designated foreign terrorism group and/or Iran. Such a RAS-approval is only needed for the domestic phone records, not the ones collected overseas.

NSA has a special RAS Identifier Management System to streamline the adjudication and documentation of requests for RAS approval. The codename of this system is IRONMAN, as we learn from a declassified 2011 training presentation (.pdf) in which this codeword was twice left unredacted:



A RAS-approval is effective for one year, meaning that during the next year, repeated queries using the approved seed selector can be made. If the selector is reasonably believed to be used by a US person, the approval period is 6 months.
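These validity periods can be expressed as a simple date check. The sketch below is a minimal illustration, assuming an approval is recorded as a start date plus a US-person flag; the function names are hypothetical, and "one year" and "6 months" are approximated as 365 and 180 days. It does not describe how IRONMAN actually stores approvals.

```python
from datetime import date, timedelta

# Hypothetical approval periods: "one year" and "6 months" approximated in days.
PERIOD_DEFAULT = timedelta(days=365)
PERIOD_US_PERSON = timedelta(days=180)

def approval_expires(approved_on: date, us_person: bool) -> date:
    """First day on which a RAS approval is no longer valid."""
    period = PERIOD_US_PERSON if us_person else PERIOD_DEFAULT
    return approved_on + period

def approval_is_valid(approved_on: date, us_person: bool, today: date) -> bool:
    """True while repeated queries with this seed would still be allowed."""
    return approved_on <= today < approval_expires(approved_on, us_person)

approved = date(2013, 1, 15)
print(approval_is_valid(approved, us_person=False, today=date(2013, 9, 1)))  # True
print(approval_is_valid(approved, us_person=True, today=date(2013, 9, 1)))   # False
```

The same seed approved on the same day is thus still usable against a foreign user but has lapsed for a presumed US person.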

The number of RAS-approved identifiers varied substantially over the years; in 2012 there were fewer than 300. According to the annual Transparency Report from the Director of National Intelligence (DNI), there were 423 such selectors in 2013, but just 161 in 2014. It's not known how many of these belonged to Americans.
 


Different kinds of queries

From various declassified documents analysed in an article on the weblog EmptyWheel, it becomes clear that there are three different kinds of queries that NSA analysts conducted on the domestic phone records database:
1. Queries for data integrity purposes
2. Queries for "Ident lookups"
3. Queries for contact chaining

In the EmptyWheel article it's assumed that besides these queries, NSA also conducted some kind of pattern analysis: in many declassified documents a redaction appears right after the term "contact chaining", which according to EmptyWheel could hide something like "pattern analysis".

Given that in these documents the targets are also redacted, there's also the possibility that the redaction hides a description of the target, like "contact chaining al-Qaida affiliates".

At least one NSA memorandum from 2009 indeed speaks about "chaining and analysis", but there can be two kinds of analysis: one conducted on the bulk of raw metadata records, and another one on selected results of contact chaining.

NSA always denied that it conducts pattern analysis on the bulk metadata themselves, stating that every search begins with a specific telephone number or other specific selection term. So far, there are no indications to the contrary, so the analysis apparently refers to the results of contact chaining queries, which is confirmed by the 2014 report (.pdf) about the Section 215 program by the Privacy and Civil Liberties Oversight Board (PCLOB).

As we will see later on, this second type of analysis is indispensable for making the contact chaining queries useful for foreign intelligence purposes.




(1) Data integrity queries

The first way the domestic phone records were queried was for data integrity purposes. This was done by some 25 specialized Data Integrity Analysts (DIAs). They didn't conduct target analysis, but helped intelligence analysts with questions on a target. For those cases, a DIA could use a standard login (with appropriate controls) to query the phone records for foreign intelligence purposes.

However, when they queried for data integrity purposes, DIAs used a special login that bypassed the normal controls (like EAR) as well as the auditing. This was because, for this task, they were allowed to use identifiers that were not RAS-approved (though not selectors whose approval had expired because they had not been revalidated).

One goal of these data integrity queries was to discover selectors that, for reasons that were redacted in the review report, should not become part of analysis, both for BR FISA and other purposes. These selectors could then be added to a defeat list of identifiers that were deemed to be of little analytic value, and/or to a database holding those that should not be tasked onto the collection system.

There was of course a risk of mixing up these tasks, and after an expired identifier had been queried in March 2010, the NSA Inspector General recommended that the duties of DIAs and foreign intelligence analysts should be clearly separated.
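A defeat list like the one described above works as an exclusion filter applied to query results before analysis. The sketch assumes identifiers are plain strings and the list is an in-memory set; the entries are made up, and the actual criteria for adding a selector are redacted in the source.

```python
def apply_defeat_list(contacts, defeat_list):
    """Drop identifiers that are on the defeat list, keeping the original order."""
    return [c for c in contacts if c not in defeat_list]

# Made-up identifiers; the real listing criteria are redacted.
DEFEAT_LIST = {"+15550000001", "+15550000002"}
chain = ["+15550000001", "+15550100200", "+15550000002", "+15550100300"]
print(apply_defeat_list(chain, DEFEAT_LIST))
# ['+15550100200', '+15550100300']
```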


(2) Ident lookup queries

A second kind of query was for so-called "ident lookup". According to an NSA Inspector General test report (.pdf) from April 2010, this refers to:
"querying a selector using [tool name redacted] to determine the approval status of a selector. In such cases, the Emphatic Access Restriction controls will prevent chaining of a selector that is not marked as approved for querying, and return an error message to the analyst. Because the selector was not actually chained, there is no violation of the Order"

Emphatic Access Restriction (EAR, pronounced as "ear") is a tool that was installed on the MAINWAY database in February 2009. It automatically prevents the use of a selector that is not RAS-approved. It seems, therefore, that when an analyst started a query and the seed selector turned out not to be approved, that query was called an "ident lookup" (although EmptyWheel has a different interpretation).

This could be how it worked before the IRONMAN system was established, as a training module from 2011 says that by then, analysts just had to "use [tool name redacted] to determine the identifier’s approval status".
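The behaviour described for EAR amounts to a gate in front of the chaining function: an unapproved seed never reaches the chaining step and the analyst gets an error instead. A minimal sketch, assuming a hypothetical set of approved selectors, a hypothetical `QueryRejected` error and a stand-in chaining function, since the real tool names are redacted.

```python
class QueryRejected(Exception):
    """Error returned to the analyst instead of a chain (hypothetical name)."""

# Hypothetical set of RAS-approved selectors.
APPROVED_SEEDS = {"+15550100200"}

def chain_with_ear(seed, run_chain):
    """Call run_chain(seed) only for approved seeds; otherwise raise QueryRejected.

    Because an unapproved seed is never passed to run_chain, no chaining
    takes place: the attempt only reveals the seed's approval status.
    """
    if seed not in APPROVED_SEEDS:
        raise QueryRejected(f"selector {seed} is not approved for querying")
    return run_chain(seed)

def demo_chain(seed):
    """Stand-in for the real (redacted) chaining tool."""
    return [f"{seed}-contact"]

print(chain_with_ear("+15550100200", demo_chain))
try:
    chain_with_ear("+15550999999", demo_chain)
except QueryRejected as err:
    print("error:", err)
```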
 


(3) Contact chaining queries

The most important queries on the domestic phone records were of course those conducted by intelligence analysts in order to "identify unknown terrorist operatives through their contacts with known suspects, discover links between known suspects, and monitor the pattern of communications among suspects".

For this, an analyst took a RAS-approved selector (often a telephone number) and entered it into a specialized metadata tool, which searched the telephone metadata in the MAINWAY contact chaining system. To limit the number of results, the analyst could set a certain timeframe for the query.

The metadata tool then returns "a .cml file, usually referred to as a chain, which is made up of the individual first hop contacts of the seed". Usually, the analyst will also be interested in the second-hop contacts; the tool then retrieves the one-hop chains of all the identifiers that were in direct contact with the seed, i.e. those from the first hop.
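In graph terms, contact chaining is a breadth-first search from the seed that stops after a fixed number of hops. The sketch below is an illustration of that technique, not NSA's (redacted) tool: it assumes call records are simple (caller, callee, date) tuples, treats a call as a contact in both directions, and uses the optional start/end dates as the query timeframe.

```python
from collections import defaultdict
from datetime import date

def contact_chain(records, seed, max_hops=2, start=None, end=None):
    """Breadth-first contact chain: {identifier: hop} for everything within max_hops.

    records: iterable of (caller, callee, call_date) tuples (assumed layout).
    start/end: optional inclusive date bounds, like the query timeframe.
    """
    if not 1 <= max_hops <= 3:
        # Mirrors the 3-hop ceiling in the FISA Court orders.
        raise ValueError("max_hops must be between 1 and 3")
    neighbours = defaultdict(set)
    for caller, callee, when in records:
        if (start is not None and when < start) or (end is not None and when > end):
            continue  # outside the chosen timeframe
        neighbours[caller].add(callee)  # a contact counts in both directions
        neighbours[callee].add(caller)

    hops = {seed: 0}
    frontier = [seed]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for ident in frontier:
            for contact in sorted(neighbours.get(ident, ())):
                if contact not in hops:  # keep the lowest hop number
                    hops[contact] = hop
                    next_frontier.append(contact)
        frontier = next_frontier
    return hops

RECORDS = [
    ("A", "B", date(2012, 3, 1)),
    ("B", "C", date(2012, 3, 2)),
    ("C", "D", date(2012, 3, 3)),
    ("A", "E", date(2011, 1, 1)),
]
print(contact_chain(RECORDS, "A", start=date(2012, 1, 1)))  # {'A': 0, 'B': 1, 'C': 2}
print(contact_chain(RECORDS, "A", max_hops=3))  # {'A': 0, 'B': 1, 'E': 1, 'C': 2, 'D': 3}
```

The timeframe removes the 2011 call to E in the first query, which shows how limiting the period also limits the size of the chain.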



Number of hops

Based upon the FISA Court orders, NSA analysts were also allowed to retrieve the numbers in contact with all the numbers from the second hop, which would make a third hop. The software tools are said to prevent looking beyond the third hop, or performing a query of a selection term that has not been RAS-approved.

The initial authorizations under the President's Surveillance Program (PSP) did not prohibit chaining more than two degrees of separation from the target, but "NSA analysts determined that it was not analytically useful to do so".* When this collection was brought under the supervision of the FISA Court, the Court limited contact chaining to 3 hops.

But despite that authorization, the policy of NSA's Counter Terrorism branch restricted chaining to 2 hops, as can be seen in an NSA training presentation (.pdf) from 2007:


A 2011 training module says that chaining to a third hop is possible, but only after prior approval by the analyst's division management (for example when a contact that comes up with the first hop appears to be an already known suspect).

Strangely enough, neither a government white paper nor the PCLOB report mentions this policy restriction, and the latter even assumes that chaining 3 hops was regular practice:
"If a seed number has seventy-five direct contacts, for instance, and each of these first-hop contact has seventy-five new contacts of its own, then each query would provide the government with the complete calling records of 5,625 telephone numbers. And if each of those second-hop numbers has seventy-five new contacts of its own, a single query would result in a batch of calling records involving over 420,000 telephone numbers"

As of 2012, the FISA Court also allowed an automated chaining process, but NSA was never able to get that working (although the PCLOB report, again, describes it as if it had actually been implemented).
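The PCLOB figures quoted above follow from assuming that every number has exactly seventy-five new contacts that do not overlap, so hop k reaches 75^k numbers. A quick check of that simplifying assumption (it is PCLOB's illustration, not a property of real call graphs):

```python
def numbers_at_hop(fan_out: int, hop: int) -> int:
    """Numbers reached at a hop if each number has fan_out new, non-overlapping contacts."""
    return fan_out ** hop

for hop in (1, 2, 3):
    print(f"hop {hop}: {numbers_at_hop(75, hop):,}")
# hop 1: 75
# hop 2: 5,625
# hop 3: 421,875
```

So the 5,625 numbers correspond to the second hop and the "over 420,000" to the third. Because real contact lists overlap, these are upper bounds under the model rather than typical values.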


Visualization

The results from a contact chaining query can be visualized by a contact graph. An example was published by the German magazine Der Spiegel, showing a slide from an NSA presentation with a 2-hop contact graph for the e-mail addresses of the CEO and the chairwoman of the Chinese telecommunications company Huawei:




Domestic and foreign results

Generally, it is said that analysts query the "Section 215 calling records", the "BR metadata" or something similar. This sounds like they only access the domestic telephone records and that the resulting contact chains would therefore consist entirely of American phone numbers.

The initial seed number, however, will often be a foreign number, as the whole purpose of the Section 215 program is to discover connections between foreign terrorists and potential conspirators inside the US. Analysts will therefore choose a seed which they expect has a good chance of a domestic nexus, which probably explains the low number of RAS-approved identifiers.

But as we have seen in the previous article, NSA stored the domestic phone records in MAINWAY, which also contains the foreign telephone and internet metadata collected overseas. That means that a contact chaining query will not only return identifiers from the domestic, but also from the NSA's worldwide metadata collection.


Federated queries

Queries that return results from multiple sources are called federated queries. According to a 2011 training module, BR FISA queries were initially always federated, but in later versions of the query tool, the analyst could also check boxes to conduct an "unfederated" query and choose individual collection sources.

These options can be seen in the following screenshot from the user interface (the codename of which is redacted) used to conduct the contact chaining:


Selecting "FISABR Mode" makes an additional checkbox for the EO12333 source appear. An NSA memorandum explains that when this BR FISA option is chosen, the analyst will be provided not only with the domestic telephone metadata, but also with those from the SIGINT realm (which is collection overseas under EO 12333 authority), dating back to late 1998.

When the analyst used a RAS-approved selector, he could also check the box for PENREGISTRY, or PR/TT, which refers to the domestic internet metadata, although the collection of these ended by the end of 2011. Normal mode is for all other metadata collected abroad.

Analysts can determine the collection sources of each result by examining the Producer Designator Digraph (PDDG) and/or SIGINT Activity Designator (SIGAD) from each line of the contact chain file. BR FISA metadata can be identified by specific SIGADs.
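Sorting a federated result by source then comes down to a lookup on the designator field of each line. The sketch assumes a chain file already parsed into dicts with `contact` and `sigad` keys, and uses invented SIGAD values, since the actual designators are not given here.

```python
from collections import defaultdict

# Invented SIGAD-to-source mapping; the real designators are not given in the text.
SOURCE_BY_SIGAD = {
    "US-XXXA": "BR FISA",
    "US-XXXB": "EO 12333",
}

def split_by_source(chain_lines):
    """Group parsed chain lines by collection source, based on their SIGAD."""
    groups = defaultdict(list)
    for line in chain_lines:
        source = SOURCE_BY_SIGAD.get(line["sigad"], "unknown")
        groups[source].append(line["contact"])
    return dict(groups)

chain = [
    {"contact": "+15550100200", "sigad": "US-XXXA"},
    {"contact": "+441000000000", "sigad": "US-XXXB"},
    {"contact": "+15550100300", "sigad": "US-XXXA"},
]
print(split_by_source(chain))
# {'BR FISA': ['+15550100200', '+15550100300'], 'EO 12333': ['+441000000000']}
```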

SPCMA

There's also a fourth box for SPCMA mode, which stands for the "Special Procedures governing Communications Metadata Analysis" from January 2011. These allow contact chaining and other types of analysis on metadata that have already been collected under EO 12333, regardless of nationality and location (because metadata aren't constitutionally protected).

This means that US person identifiers that were in contact with valid foreign intelligence targets may be used for searching these foreign metadata too. NSA isn't allowed to collect US data overseas, but these do come in "incidentally" when for example foreigners communicate with Americans - precisely the kind of communications that could reveal conspirators inside the US.


In other words:
- By default, any contact chaining query will use the foreign metadata collected overseas. For these, any useful selector may be used as a seed, and, under SPCMA, even one that belongs to an American.

- If the seed selector is RAS-approved, then the domestic phone records will be used too, which could lead to the discovery of additional contacts within the US.
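The two rules above can be condensed into a small decision function. This is a simplified sketch of the article's description only: the source labels and parameters are hypothetical, and in practice access also depended on the chosen query mode and the analyst's authorisations.

```python
def allowed_sources(ras_approved, us_person=False, spcma=False):
    """Which metadata a contact chaining query may draw on (simplified sketch)."""
    if ras_approved:
        # In "FISABR Mode" the query covers the domestic records and,
        # per the NSA memorandum, the EO 12333 (SIGINT) metadata as well.
        return ["BR FISA (domestic)", "EO 12333 (foreign)"]
    if us_person and not spcma:
        # In this sketch, an American seed is only chained in the
        # foreign metadata under SPCMA.
        return []
    # Foreign metadata collected overseas: any useful seed.
    return ["EO 12333 (foreign)"]

print(allowed_sources(ras_approved=False))                              # ['EO 12333 (foreign)']
print(allowed_sources(ras_approved=False, us_person=True))              # []
print(allowed_sources(ras_approved=False, us_person=True, spcma=True))  # ['EO 12333 (foreign)']
print(allowed_sources(ras_approved=True))  # ['BR FISA (domestic)', 'EO 12333 (foreign)']
```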

The fact that most contact chains consist of both foreign and domestic identifiers means that they contain far fewer American numbers than calculations like the one from PCLOB suggest, as these give the impression that queries resulted in up to 3 hops of domestic numbers.


 


Analysing the contact chains

It should be noted that the phone numbers (or other selectors) which are returned after an initial contact chaining query are anonymous and therefore meaningless. They're just numbers which could belong to anyone: from a pizza delivery service to a dangerous conspirator.

So, in order to identify which numbers are of interest for finding unknown suspects, additional analysis is needed - a comprehensive GCHQ book (.pdf) disclosed last week calls contact chaining the start of a "painstaking process of assembling information about a terrorist cell or network".


Analytic tools

In the early years of the President's Surveillance Program (PSP), only the SIGINT Navigator (SIGNAV) tool was available to view the output of the MAINWAY contact chaining system. Later, new tools were created to improve efficiency and to obtain the most complete results; they were designed to use phone records collected both domestically and overseas.

According to the 2009 BR FISA review, there were 19 different analytic tools used for analysing both the raw metadata and the results of contact chaining. The glossary of the review lists the following tools, unfortunately with their codenames redacted:


S................?
"This tool is used by HMCs to conduct contact chaining against BR FISA metadata and provide the results to the [...]team. HMCs only used RAS-approced selectors when using this tool. The [...] team ultimately provided the results to NSA's [....]"

S.........?
"The primary desktop graphical user interface (GUI) for access to [....] data and services"

S....?
"An analytic query tool used to seek out additional information on telephony selectors from [MAINWAY?] and other knowledge bases and reporting repositories"

[SYNAPSE Workbench?]
"A next generation metadata analysis graphical user interface (GUI) which is the replacement for [......]"

W......?
"The query tool, which indicates whether a telephony selector is present in NSA data repositories, the total number of unique contacts, total number of calls, and "first heard" and "last heard" information for the selector"


The 2009 PR/TT review also mentions the following tool, which could have been redacted in the BR FISA review:

M.....?
"A database analytic system and user interface tool for integrated analysis of multiple types of metadata, facilitating more comprehensive target activity tracking"




Combining multiple contact chains

In 2006, a "high-level Bush Administration intelligence official" told Seymour Hersh that analysts could, for example, check whether any number that is two or three hops away from the seed number is also in direct contact with that original suspect number. That sounds smart, but in that case, the number two or three hops away is simply a first-hop contact.

Finding suspects just by looking at connections between anonymous numbers could work, however, when several contact chains (from related suspect seed numbers, for example) are combined: a number that appears to be in contact with both seed #1 and seed #2 would then be suspicious, as it apparently belongs to someone known to both initial suspects.
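Combining chains in this way boils down to counting in how many seeds' chains each identifier appears. A minimal sketch, assuming each chain is given as a set of contact identifiers per seed; the identifiers are made up.

```python
from collections import Counter

def shared_contacts(chains, min_seeds=2):
    """Identifiers that appear in the chains of at least min_seeds different seeds.

    chains: {seed: set of contact identifiers}
    """
    counts = Counter()
    for seed, contacts in chains.items():
        counts.update(set(contacts) - {seed})  # count each seed's chain once
    return {ident: n for ident, n in counts.items() if n >= min_seeds}

chains = {
    "seed1": {"P", "Q", "R"},
    "seed2": {"Q", "S"},
    "seed3": {"Q", "R", "T"},
}
print(shared_contacts(chains))  # Q is shared by three seeds, R by two
```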

This approach was seen in the CBS television program 60 Minutes from December 15, 2013, in which an NSA employee gave a demonstration of how metadata contact chaining works. He used a tool for foreign collection under EO 12333, resulting in some contact chains of almost fully masked phone numbers from Somalia. Clearly visible are numbers that different targets had in common:



Detailed call record analysis

Besides analysing the breadth of the contact chains, each contact between two phone numbers can also be analysed in depth. For this, the analytic software provides analysts access to the complete calling records associated with all the phone calls from a contact chain.

Such a record, as provided by the telecoms, includes the calling and the called number, a calling-card number, the IMEI number of a mobile handset and the IMSI number of a SIM card, as well as the date and time of the call, its duration and technical information about how the call was routed through the telephone networks.

This provides analysts with information like which number initiated the call, the day and time the call was made, and how long it lasted. And although the domestic phone records may not contain cell phone location data, the area code and prefix of a landline telephone number, as well as the trunk identifier for mobile networks, still indicate the area where a particular phone was located.

As described in the previous article, these data weren't derived from the MAINWAY system, but from a second database which holds "individual BR FISA metadata call records for access by authorized Homeland Security Analysis Center (HSAC) and data integrity analysts to view detailed information about specific telephony calling events".
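The call record fields listed above map naturally onto a small data structure. In the sketch below the field names are our own (the actual record format isn't described in the source), and the helper shows how the area code and prefix can be read from a ten-digit North American number; it ignores numbers from other numbering plans.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class CallRecord:
    """One call record; field names are hypothetical, the fields follow the text."""
    calling: str                    # calling number
    called: str                     # called number
    start: datetime                 # date and time of the call
    duration_s: int                 # duration in seconds
    calling_card: Optional[str] = None
    imei: Optional[str] = None      # handset identifier
    imsi: Optional[str] = None      # SIM card identifier
    trunk_id: Optional[str] = None  # routing information

def area_and_prefix(number: str) -> tuple:
    """Area code and prefix of a North American (NANP) number."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the +1 country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit NANP number: {number!r}")
    return digits[:3], digits[3:6]

rec = CallRecord("+1 202 555 0100", "+1 312 555 0199",
                 start=datetime(2012, 5, 1, 14, 30), duration_s=240)
print(area_and_prefix(rec.calling), area_and_prefix(rec.called))
# ('202', '555') ('312', '555')
```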


Searching the second database

This database of calling records also enables analysts to subject these records "to other analytic methods or techniques besides querying", like for example searching them "using numbers, words, or symbols that uniquely identify a particular caller or device", or using "selection terms that are not uniquely associated with any particular caller or device" - according to the PCLOB report.

So, when the analysis of one or more contact chains turns up several suspicious phone numbers, analysts can use those numbers to query the second database and see whether they also appear in phone records that were not included in their initial contact chains.

It also seems possible to query, for example, a trunk identifier to discover other phones from the same region. These kinds of searches can therefore reveal potential connections that could not have been found through a direct contact chaining query.
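Unlike chaining, these searches select records by an attribute rather than by following contacts. A minimal sketch over an in-memory list, assuming hypothetical `calling`, `called` and `trunk_id` keys; the actual database and its interface are not described in the source.

```python
def records_with_number(records, number):
    """All records in which the number appears on either end of the call."""
    return [r for r in records if number in (r["calling"], r["called"])]

def numbers_on_trunk(records, trunk_id):
    """All distinct numbers seen in records routed over the given trunk."""
    found = set()
    for r in records:
        if r.get("trunk_id") == trunk_id:
            found.update((r["calling"], r["called"]))
    return found

CALL_RECORDS = [
    {"calling": "N1", "called": "N2", "trunk_id": "T7"},
    {"calling": "N3", "called": "N4", "trunk_id": "T7"},
    {"calling": "N2", "called": "N5", "trunk_id": "T9"},
]
print(len(records_with_number(CALL_RECORDS, "N2")))  # 2
print(sorted(numbers_on_trunk(CALL_RECORDS, "T7")))  # ['N1', 'N2', 'N3', 'N4']
```

N3 and N4 never called N1 or N2, so no contact chain from those numbers would reach them; the trunk search does.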


Some numbers

In a Department of Justice report (.pdf) from 2006 it's said that NSA "estimated that only a tiny fraction (0.000025% or one in four million) of the call-detail records [...] were expected to be analyzed". This would mean that of the 1.8 billion domestic phone records provided daily by AT&T, just 450 would be used for analysis.

So in a year, the records (not the content) of roughly 165,000 individual calls from the domestic metadata collection could have been used for analysis in addition to contact chaining.
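A quick check of these figures, assuming a constant 1.8 billion records every day of the year:

```python
DAILY_RECORDS = 1.8e9      # AT&T records per day as of 2011 (from the text)
FRACTION = 0.000025 / 100  # 0.000025% = one in four million

per_day = DAILY_RECORDS * FRACTION
per_year = per_day * 365
print(f"{per_day:,.0f} records a day, {per_year:,.0f} a year")
# 450 records a day, 164,250 a year
```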



Foreign call records

As we have seen, a contact chaining query on Section 215 telephone metadata will generally result in both foreign and domestic numbers. Analysts will therefore want to analyze not only the associated call records from the domestic collection, but also those from foreign collection conducted abroad.

These foreign phone records could be retrieved from the known metadata repositories like ASSOCIATION (for mobile calls) and BANYAN (for landline calls), or from a single foreign "SIGINT" database, as is suggested by an NSA memorandum from 2009.


Enrichment

Analyzing the detailed call records will still not provide names or other information that allows the identification of the people to whom the numbers from a contact chain belong. For that, the phone numbers have to be correlated ("enriched") with other kinds of information.

The easiest way is probably to combine them with target watch lists to see if the contact chains contain phone numbers that belong to already known targets. This is demonstrated in the following video, which shows contact chain analysis using Sentinel Visualizer, which is a commercially available program for this purpose:





Telephone identifiers found through contact chaining and subsequent analysis can of course also be correlated with internet metadata. NSA does not collect domestic internet metadata anymore, but its collection abroad results in over 10 billion internet metadata records a day being stored in the MARINA database.

According to a New York Times report, the metadata from contact chains can also be enriched with, for example, GPS and TomTom data, billing records and bank transactions, passenger manifests, voter registration rolls, property records and unspecified tax data, for both Americans and foreigners. In the same report, however, NSA denies doing this with the domestic metadata collected under Section 215.
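The simplest kind of enrichment, checking a contact chain against a watch list of already known targets, amounts to looking up every chain member in that list. A minimal sketch with invented identifiers and labels, assuming a chain given as {identifier: hop}; it does not reflect how Sentinel Visualizer or any NSA system implements this.

```python
def flag_known_targets(chain, watch_list):
    """Return {identifier: (hop, label)} for chain members found on the watch list."""
    return {
        ident: (hop, watch_list[ident])
        for ident, hop in chain.items()
        if ident in watch_list
    }

# Invented example data: a chain as {identifier: hop} and a watch list.
chain = {"N1": 0, "N2": 1, "N4": 2, "N7": 2}
watch_list = {"N4": "known target A", "N9": "known target B"}
print(flag_known_targets(chain, watch_list))
# {'N4': (2, 'known target A')}
```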


SYNAPSE Data Model

With all this, analysts can build extensive social network graphs (or "community of interest" profiles) using 164 different relationship types like "travelsWith, hasFather, sentForumMessage, employs". This seems to refer to the SYNAPSE Data Model, the internal NSA relationships of which are shown in the following diagram, also published by The New York Times:



Apparently also based upon this data model is SYNAPSE Workbench, which seems to be the "next generation metadata analysis graphical user interface (GUI)" described in the 2009 BR FISA review. SYNAPSE Workbench is apparently capable of fusing metadata from multiple sources and is also enabled for SPCMA searches.


Further action

When all this leads an analyst to believe that a certain telephone identifier belongs to someone who is of interest but wasn't yet known or identified, the following actions can be taken:
- If the identifier is American and of counterterrorism value, it can be passed on to the FBI for further intelligence or criminal investigation. From 2006 to 2009, NSA provided the FBI (and other intelligence agencies) a total of 277 reports containing 2,883 telephone identifiers.
- If the identifier is foreign, NSA can use it as a selector to retrieve the content of associated communications that might already be in its databases. It can also be entered into the NSA collection system in order to systematically pull in the content of any future communications of the target.

If the identifier of the as yet unknown suspect is foreign, the analyst might already have found a name through the various enrichment correlations; if not, this can also be achieved by listening in on the content of associated phone calls or through additional Human Intelligence (HUMINT) methods.


 

Conclusion

As we have seen, the domestic phone records collected by NSA under Section 215 are used for contact chaining that combines both domestic and foreign identifiers. NSA never explicitly explained this, probably because it didn't want to draw attention to its foreign metadata collection and analysis efforts. But it did become clear from the many documents about the Section 215 program that were declassified by the US government.

According to these documents, contact chaining for finding as yet unknown conspirators isn't as easy as it may appear. It's not that one enters a phone number and the software provides a list of suspects. Data retrieved through the contact chains have to be analysed and correlated with other data sets in order to find out which numbers could matter. It still depends on experience, analysis and ultimately even guessing which data and which numbers might be worth a closer investigation.

How successful this contact chaining and subsequent analysis is, is difficult to say. The report of the Privacy and Civil Liberties Oversight Board judged that there was "no instance in which the [Section 215] program directly contributed to the discovery of a previously unknown terrorist plot or the disruption of a terrorist attack" - but it's also possible that there were just no such conspirators.

The PCLOB report noted that analysing the domestic telephone metadata did provide some value "by offering additional leads regarding the contacts of terrorism suspects already known to investigators, and by demonstrating that foreign terrorist plots do not have a U.S. nexus". Although useful, this seems a rather meager result for what surely required a lot of work.



Links and Sources
- EmptyWheel.net: Federated Queries and EO 12333 FISC Workaround - What We Know about the Section 215 Phone Dragnet and Location Data
- PCLOB: Report on the Telephone Records Program Conducted under Section 215 of the USA PATRIOT Act (pdf) (2014)
- Cryptome.org: NSA FISA Business Records Offer a Lot to Learn (2013)
- Huffingtonpost.com: The NSA's Telephone Meta-data Program: Part I (2013)
- US Administration White Paper: Bulk Collection of Telephony Metadata under Section 215 of the USA PATRIOT Act (pdf) (2013)
- The New Yorker: What the N.S.A. Wants to Know About Your Phone Calls (2013)
- NSA: Business Records FISA NSA Review (.pdf) (2009)

January 20, 2016

Section 215 bulk telephone records and the MAINWAY database

(Updated: February 3, 2016)

One of the most controversial NSA programs was the bulk collection of domestic telephone records (metadata) under authority of Section 215 of the USA PATRIOT Act.

The Snowden revelations provided hardly any information about this program, but many details became available from documents that were declassified by the US Director of National Intelligence (DNI).

Because in these declassified documents all codenames are redacted, it was largely a mystery which NSA systems were used to store and analyse these metadata.

By combining many separate pieces from both the Snowden documents and those declassified by the government, it has now become clear that NSA put the domestic phone records in its central contact chaining system MAINWAY, which also contains all sorts of metadata collected overseas.



Reconstruction of the MAINWAY dataflow



MAINWAY versus MARINA

Initially it was thought that MAINWAY was a repository just for telephone metadata. This goes back to a report by USA Today from May 10, 2006, which revealed that the NSA created a database containing "the phone call records of tens of millions of Americans" obtained from AT&T, Verizon and BellSouth (the latter merged with AT&T as of 2007).

As such, MAINWAY was seen as the equivalent of MARINA, which is NSA's storage for internet metadata. But meanwhile, various documents from the Snowden revelations have made clear that the actual repositories for telephone metadata are ASSOCIATION (for metadata from mobile calls) and BANYAN (for metadata from landline calls).

MAINWAY itself isn't just a database that stores raw metadata, but a system that also "performs data quality, preparation and sorting functions, and then summarizes contacts represented in the processed data". Afterwards, MAINWAY stores the "resulting contact chains and provides analysts with access to these contact chains".
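What "summarizing contacts" might involve can be illustrated by collapsing raw call records into one entry per pair of identifiers, with the number of calls and the first and last date seen, similar to the "first heard" and "last heard" information mentioned for one of the query tools. This is a sketch under our own assumptions about the record layout, not a description of MAINWAY's internals.

```python
from datetime import date

def summarize_contacts(records):
    """Collapse (a, b, day) call records into one summary per unordered pair."""
    summary = {}
    for a, b, day in records:
        pair = tuple(sorted((a, b)))  # A->B and B->A count as the same contact
        if pair not in summary:
            summary[pair] = {"calls": 0, "first": day, "last": day}
        s = summary[pair]
        s["calls"] += 1
        s["first"] = min(s["first"], day)
        s["last"] = max(s["last"], day)
    return summary

records = [
    ("A", "B", date(2011, 8, 3)),
    ("B", "A", date(2011, 8, 1)),
    ("A", "C", date(2011, 8, 2)),
]
for pair, s in summarize_contacts(records).items():
    print(pair, s["calls"], s["first"], s["last"])
# ('A', 'B') 2 2011-08-01 2011-08-03
# ('A', 'C') 1 2011-08-02 2011-08-02
```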

New documents have also shown that MAINWAY contains metadata from internet communications. For example, in the following diagram about the FAIRVIEW collection program, we see that internet metadata from the Upstream collection first flow into MAINWAY before ending up in MARINA:


Dataflow for internet metadata collected under the
FAIRVIEW program under Transit Authority



It seems likely that in MAINWAY, metadata are stored more or less temporarily for the purpose of contact chaining and analysing them. Metadata that NSA wants to keep for a longer period of time, or even indefinitely are then stored in repositories like MARINA, ASSOCIATION and BANYAN.
(However, a report by The Guardian from September 30, 2013 says that MARINA "has the ability to look back on the last 365 days' worth of DNI metadata seen by the Sigint collection system")

While the domestic metadata collected in bulk have to be destroyed after 5 years, the calling records that are the result of a query can be stored by the analyst. According to the PCLOB-report (.pdf), they may then be "subjected to other analytic methods or techniques besides querying, or integrated with records obtained by the NSA under other authorities", as well as shared with others inside and outside NSA.



MAINWAY, SIGINT Navigator (SIGNAV), ASSOCIATION and BANYAN
mentioned in a presentation about DEMONSPIT, under which call
records were obtained from major Pakistan telecom providers(!)



MAINWAY receiving domestic phone records

Based upon Snowden documents, The New York Times reported on September 28, 2013, that MAINWAY is used for chaining both phone numbers and e-mail addresses and that it is fed with data from tapping "fiber-optic cables, corporate partners and foreign computer networks that have been hacked".

The report also says that as of August 2011, MAINWAY was fed with "1.1 billion cellular records a day in addition to the 700M records delivered currently". However, The New York Times erroneously attributed these numbers to collection under authority of section 702 FAA and was therefore not able to identify that MAINWAY was also fed with the bulk phone records of Americans (which happens under section 215 Patriot Act).

The latter only became clear after The New York Times and ProPublica published some NSA documents about the FAIRVIEW program on August 15, 2015. One of these documents confirms that it was AT&T that provided the aforementioned number of records, and also that this happened under BR FISA (= Section 215) authority.
(A report by the Washington Post from June 15, 2013 also identified MAINWAY as the database in which the phone records from the Section 215 program were stored)

So as of 2011, at least 1.8 billion domestic phone records a day were coming in, which makes 54 billion a month and about 650 billion a year. Before they were handed over to NSA, AT&T stripped off the location data in order to comply with the FISA Court orders, which don't allow those data to be collected.

Apparently Verizon Wireless and T-Mobile US saw no obligation to remove these location data, so their cell phone records couldn't be collected by NSA, which therefore received less than 30% of the domestic telephone metadata.

According to NSA, one of the advantages of putting phone records from multiple American telecommunication companies in one big repository was that this allowed analysts "to identify chains of communications that cross different telecommunications networks".
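Why a single repository helps can be illustrated with a minimal Python sketch. It treats contact chaining as a breadth-first search over call records. The two providers, the identifiers A to F and the functions `build_graph` and `contact_chain` are all hypothetical. This is a generic graph technique, not NSA's actual MAINWAY software:

```python
from collections import defaultdict, deque

# Hypothetical call records (caller, callee) held by two different providers
provider_x = [("A", "B"), ("D", "E")]
provider_y = [("B", "C"), ("C", "F")]

def build_graph(records):
    """Undirected contact graph: which identifiers were in contact."""
    graph = defaultdict(set)
    for caller, callee in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph

def contact_chain(graph, seed, max_hops):
    """Breadth-first search: every identifier reachable from the seed
    within max_hops, mapped to its hop distance."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph[current]:
            if contact not in hops:
                hops[contact] = hops[current] + 1
                queue.append(contact)
    del hops[seed]
    return hops

# Chaining one provider's records alone vs. all records merged
separate_x = contact_chain(build_graph(provider_x), "A", 3)
merged = contact_chain(build_graph(provider_x + provider_y), "A", 3)
print(separate_x)  # {'B': 1}
print(merged)      # {'B': 1, 'C': 2, 'F': 3}
```

Chained within provider X alone, seed A only reaches B. With both providers' records in one store, the chain continues across the network boundary to C and F. The same property means that a merged store mixes identifiers from every source that feeds it.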




Under the President's Surveillance Program (2001 - 2004/2006)

NSA started collecting telephone and internet metadata from US telecommunication providers shortly after the attacks of September 11, 2001. This was part of the President's Surveillance Program (PSP, protected under the STELLARWIND classification compartment), which was based upon what in the end would be 43 successive secret authorizations by President George W. Bush.

The goals of collecting these metadata were to identify unknown terrorist operatives through their contacts with known suspects, to discover links between known suspects, and to monitor the pattern of communications among suspects.

At first, only metadata were collected from communications in which at least one party was outside the US. AT&T (identified as Company A or FAIRVIEW) started to provide both phone and internet metadata from international channels as early as November 2001, and for Verizon (Company B or STORMBREW) the automated transfer of such data started in February 2002. Qwest refused to hand over its records because the government couldn't present a warrant.

Allegedly, raw metadata were transferred in real time through a high-speed data link from the main computer centers of the telecoms to a government facility in Quantico, Virginia. Although Quantico is an FBI compound, the BR FISA review says that it was an NSA mission element, the name of which was redacted, that obtained the records from the providers.

Then, parsers were used to strip unwanted information (like credit card numbers) from the metadata, and the records were put into a standard format compatible with NSA databases.

For example, in September 2003, AT&T "captured" several trillion internet metadata records, of which some 400 billion (apparently those with a high probability of containing terrorist communications) were selected for processing. These flowed into the MAINWAY contact chaining database, which also contains metadata from collection abroad. The 2009 report about the STELLARWIND program says:
"NSA's primary tool for conducting metadata analysis, for PSP and traditional SIGINT collection, was MAINWAY. MAINWAY was used for storage, contact chaining, and for analyzing large volumes of global communications metadata."

(Interestingly, in some documents MAIN WAY seems to be written as two separate words, which makes it resemble MAIN CORE, a central database containing essential intelligence information on Americans produced by the FBI and other US intelligence agencies.)



Under FISA Court orders (2004/2006 - 2011/2015)

In July 2004, the collection of domestic internet metadata was moved from the President’s Surveillance Program to the FISA Court, which authorized this effort based upon section 402 FISA, or as it is called by NSA: PR/TT (short for Pen Register/Trap and Trace).

In May 2006, the same happened with the bulk telephone records, for which the FISA Court allowed continuation under authority of section 215 USA PATRIOT Act, or as NSA calls it: BR FISA (short for Business Records FISA).

Under the FISA Court orders, bulk telephone collection eventually came to include "all call detail records or 'telephony metadata' created [...] for communications between the United States and abroad" or "wholly within the United States, including local telephone calls". Only metadata of purely foreign communications were excluded, as were, for technical reasons, those of most mobile phone calls.

Because NSA stored these domestic phone and internet metadata from the very beginning in the same database (MAINWAY) that contains metadata from traditional collection efforts abroad, queries could result in contact chains made up of identifiers from both foreign and domestic sources. The query tool simply did not distinguish between them.

Analysts could also start a query with selectors that were not approved under BR FISA, and in some cases this also produced results from both the foreign and the domestic collection. This violated the FISA Court orders, and after NSA informed the court, the agency had to stop accessing the telephone metadata in 2009 until these issues had been resolved.*

An internal NSA training module from 2011 shows that at least by then, NSA had tagged the metadata records with XML tags identifying not only the legal authority under which the metadata were collected, but also the SIGAD of the intercept facility where the collection took place.
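The training module does not reveal the actual tag format. As a hedged illustration of how such labels make it possible to restrict a query to data collected under a particular authority, here is a Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`record`, `authority`, `sigad`), the authority labels and the SIGAD values are all invented for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged records: the schema and every value here are invented;
# the real NSA format is not public.
RECORDS = """
<records>
  <record authority="BR_FISA" sigad="HYPOTHETICAL-1">
    <from>A</from><to>B</to>
  </record>
  <record authority="EO12333" sigad="HYPOTHETICAL-2">
    <from>B</from><to>C</to>
  </record>
</records>
"""

def records_under(xml_text, allowed_authorities):
    """Return (from, to, sigad) for records collected under an allowed authority."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for rec in root.findall("record"):
        if rec.get("authority") in allowed_authorities:
            result.append((rec.findtext("from"), rec.findtext("to"), rec.get("sigad")))
    return result

print(records_under(RECORDS, {"BR_FISA"}))  # [('A', 'B', 'HYPOTHETICAL-1')]
```

Without such a label, the two records above would be indistinguishable to a chaining query.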



A rare diagram about the BR FISA metadata collection:
the decision process as it was from 2006 - 2009
(Source)



Other databases for domestic call records

The domestic call records were stored not only in MAINWAY, but also in another database, apparently dedicated to US phone metadata. An NSA training presentation (.pdf) from 2007 confirms that BR FISA data were stored in two NSA repositories, although both names had been redacted.

An NSA review from June 2009 describes this second database as a "repository for individual BR FISA metadata call records for access by authorized Homeland Security Analysis Center (HSAC) and data integrity analysts to view detailed information about specific telephony calling events".

This seems to refer to the complete calling records. The PCLOB report (.pdf) about the BR FISA program likewise says there is analysis software that "provides the associated information about the telephone calls involved, such as their date, time of day, and duration".

The second database therefore probably gave access to these additional details, whereas MAINWAY only contains or provides "summaries of one-hop chains", i.e. that selector #1 was in contact with selector #2 and how many times this happened within a specific timeframe.
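The difference between a detailed call record and a one-hop summary can be sketched as a simple aggregation. In this hypothetical Python example, the record layout and the `one_hop_summary` function are assumptions. It is also not known whether NSA's summaries distinguish call direction, so this sketch simply ignores it:

```python
from collections import Counter
from datetime import datetime

# Hypothetical detailed records: (caller, callee, start time, duration in seconds)
detailed = [
    ("A", "B", datetime(2011, 3, 1, 9, 15), 120),
    ("B", "A", datetime(2011, 3, 2, 18, 40), 45),
    ("A", "C", datetime(2011, 3, 5, 11, 5), 300),
    ("A", "B", datetime(2011, 4, 2, 8, 0), 60),
]

def one_hop_summary(records, start, end):
    """Count contacts per selector pair within [start, end).
    Time of day and duration are dropped; only the pair and the count remain."""
    counts = Counter()
    for caller, callee, when, _duration in records:
        if start <= when < end:
            pair = tuple(sorted((caller, callee)))  # direction ignored (assumption)
            counts[pair] += 1
    return dict(counts)

print(one_hop_summary(detailed, datetime(2011, 3, 1), datetime(2011, 4, 1)))
# {('A', 'B'): 2, ('A', 'C'): 1}
```

The summary is enough to build a chain. The dropped fields (date, time of day, duration) are the kind of detail the second repository appears to have provided.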

In the glossary of the 2009 NSA Review, the second repository is listed with a remarkably long name, which, judging by its alphabetical position, must start with an M, N or O:



This exceptionally long name of the second database could indicate that it was some kind of provisional repository, because page 23 of the 2009 BR FISA review says:
"NSA is preparing to incorporate the [second database] into the NSA corporate architecture. This transition to the corporate engineering framework will maximize use of the latest technologies and proven configuration management to minimize any security and compliance risks"

And indeed, in appendix B of a report (.pdf) by the NSA's Inspector General from August 1, 2012, we see that the second database now has a shorter name, and that it had replaced a "Transaction Database" with a much longer name in January 2011:



"Transaction" is another term that NSA uses for metadata, so "transaction database" probably just means that it contains the (full) metadata records. This 2012 Inspector General report lists three additional storage systems for BR FISA data, making five in total:
1. Contact chaining database that accepts metadata from multiple sources (= MAINWAY)
2. Database repository that stores detailed metadata information, which supports the contact chaining summaries in [MAINWAY]. Replaced an earlier database in January 2011.
3. Contingency database for the time the aforementioned database was being rebuilt
4. System backup that stores an exact copy of the raw metadata from the providers
5. Backup tapes on which the raw metadata were periodically saved off-line

So one reason NSA needs large data centers is that the same sets of data are stored multiple times. Besides backups, there are often separate databases dedicated to a specific purpose or analysis method.


Bulk internet metadata (PR/TT)

As mentioned before, MAINWAY was fed not only with telephone metadata, but also with metadata from domestic internet communications. These metadata include the "to", "from", and "cc" lines of an e-mail, as well as the e-mail’s time and date. It seems that no metadata from other kinds of internet communications, like messengers, were used for contact chaining.

On August 11, 2014, an internal NSA review (.pdf) of this PR/TT program was declassified. It shows storage systems similar to those for the phone records: full copies of the internet metadata were also stored in the MAINWAY contact chaining database, as well as in a dedicated second repository:


The PR/TT bulk internet metadata program was shut down in December 2011 for "operational and resource reasons" and all data were deleted. Based upon declassified NSA reports, The New York Times reported on November 19, 2015, that this "internet dragnet" was ended because, among other reasons, similar results could be achieved under other authorities:
- Section 702 FAA, which allows access to internet communications between foreigners and Americans from the "PRISM-providers" and "Upstream collection".

- The SPCMA regulation, which allows using US person identifiers for querying metadata that have been collected abroad.

With internet metadata collected both overseas (under EO 12333 authority) and at the physical and virtual borders of the US (under Section 702 FAA), NSA probably no longer needed the purely domestic metadata in order to capture those of interest.

Also, querying the metadata collected overseas appeared more attractive, because abroad NSA is allowed to collect many more types of metadata than inside the US, where collection was heavily restricted by the FISA Court.

In a declaration for the FISA Court from February 13, 2009, then-NSA director Keith Alexander explained that multi-tiered chaining of phone calls is more efficient and useful, "because unlike e-mail, which involves the heavy use of spam, a telephonic device does not lend itself to simultaneous contact with large numbers of individuals".


Replacement?

According to the secret Budget Request to Congress for 2013, NSA wanted to create (or maybe expand MAINWAY into) a metadata repository capable of taking in 20 billion metadata records a day and making these available to analysts within 60 minutes.
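To give a sense of the scale of this target, here is a back-of-the-envelope calculation in Python. It assumes records arrive at a constant average rate, which real traffic does not:

```python
RECORDS_PER_DAY = 20_000_000_000  # target from the 2013 budget request
SECONDS_PER_DAY = 24 * 60 * 60
MAX_DELAY_HOURS = 1               # "available to analysts within 60 minutes"

# Average ingest rate if the daily volume were spread evenly over the day
avg_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

# Any record not yet queryable must have arrived within the last hour,
# so at a steady rate the backlog never exceeds one hour's intake.
max_backlog = RECORDS_PER_DAY / 24 * MAX_DELAY_HOURS

print(f"average ingest rate: {avg_per_second:,.0f} records per second")
print(f"maximum backlog:     {max_backlog:,.0f} records")
```

This works out to roughly 231,000 records per second, with at most about 833 million records waiting to become queryable at any moment.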

But after Snowden disclosed the Verizon bulk phone records order in June 2013, the American public became aware of the actual scope of this program and it became the most controversial part of NSA's activities.

In January 2014, the Privacy and Civil Liberties Oversight Board (PCLOB) judged that Section 215 collection was actually of "minimal value in safeguarding the nation from terrorism" and that there was "no instance in which the program directly contributed to the discovery of a previously unknown terrorist plot or the disruption of a terrorist attack".

According to PCLOB, the bulk phone records did provide some value "by offering additional leads regarding the contacts of terrorism suspects already known to investigators, and by demonstrating that foreign terrorist plots do not have a U.S. nexus". This, however, was not seen as a sufficient justification for the large-scale collection of domestic phone records.

In 2015, Congress eventually enacted the USA FREEDOM Act, which prohibits NSA from collecting and storing domestic call records in bulk as of November 29, 2015. Instead, the agency now has to obtain an order from the FISA Court approving specific selectors. These are then provided to the telecommunication providers, which use them to query their own databases and hand only the results over to NSA.

How this new regime will work out is explained in the USA FREEDOM Act Business Records FISA Implementation Transparency Report (.pdf), which was published just a few days ago.





Links and Sources
- EmptyWheel.net: What We Know about the Section 215 Phone Dragnet and Location Data (2016)
- PCLOB: Report on the Telephone Records Program Conducted under Section 215 of the USA PATRIOT Act (.pdf) (2014)
- Cryptome.org: NSA FISA Business Records Offer a Lot to Learn (2013)
- US Administration White Paper: Bulk Collection of Telephony Metadata under Section 215 of the USA PATRIOT Act (.pdf) (2013)
- NSA: Business Records FISA NSA Review (.pdf) (2009)
- NSA: Pen Register/Trap and Trace FISA NSA Review (.pdf) (2009)
- Andrew P. MacArthur: The NSA Phone Call Database: The Problematic Acquisition and Mining of Call Records in the United States, Canada, the United Kingdom, and Australia (2007)

December 23, 2015

Leaked documents that were not attributed to Snowden

(Last edited: December 30, 2015)

Since June 2013, numerous top secret documents from the American signals intelligence agency NSA and its British counterpart GCHQ have been disclosed. The overwhelming majority of them came from the former NSA contractor Edward Snowden.

What many people probably didn't notice, however, is that some of these documents were provided not by Snowden but by other leakers. The press reports often didn't state this clearly. It only became evident that such documents came from someone else because they were not attributed to Snowden.

So far, the following secret and top secret documents have been disclosed without having been attributed to Snowden:

- Chancellor Merkel tasking record
- TAO product catalog
- XKEYSCORE rules: TOR and TAILS
- NCTC watchlisting guidance
- NCTC terrorist watchlist report
- XKEYSCORE rules: New Zealand
- Ramstein AFB supporting drone operations
- NSA tasking & reporting: France
- NSA tasking & reporting: Germany
- NSA tasking & reporting: Brazil
- NSA tasking & reporting: Japan
- Chinese cyber espionage against the US
- XKEYSCORE agreement between NSA, BND and BfV
- The Drone Papers
- Cellphone surveillance catalogue

- Some thoughts on the form of the documents
- Some thoughts on the motives behind the leaks
- Conclusion


Document collections

The most user-friendly collection of all the leaked documents can be found on the website IC Off The Record (which started as a parody of IC On The Record, the official US government website on which declassified documents are published).

Other websites that collect leaked documents related to the Five Eyes agencies, from Snowden as well as from other sources, are FVEY Docs and Cryptome. The Snowden documents are also available and searchable through the Snowden Surveillance Archive.


Domestic US leaks

Only leaks related to foreign signals intelligence and related military topics are listed here. Documents about American domestic operations, like several revelations about the DEA, are therefore not included.

(Also not included are stories based upon leaks for which no original documents were published, such as those about NSA's interception efforts against Israel.)



          - Documents not attributed to Snowden -         


Chancellor Merkel tasking record

On October 23, 2013, the German magazine Der Spiegel revealed that the NSA may have eavesdropped on the cell phone of chancellor Merkel. This was based upon "the excerpt from an NSA database about Merkel's cell phone", which the magazine received.* A journalist from Der Spiegel made a transcription of the database record, and later on, a copy of this transcription was printed in some German newspapers.
Glenn Greenwald confirmed that this information didn't come from the Snowden archive, and Bruce Schneier was also convinced that it came from a second source.

Articles:
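- Kanzler-Handy im US-Visier? Merkel beschwert sich bei Obama (Chancellor's phone in US crosshairs? Merkel complains to Obama)
- NSA-Überwachung: Merkels Handy steht seit 2002 auf US-Abhörliste (NSA surveillance: Merkel's phone on US eavesdropping list since 2002)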

Document:
- Transcript of an NSA database record






TAO product catalog

On December 29, 2013, the German magazine Der Spiegel published a 50-page catalog from the ANT unit of NSA's hacking division TAO. It contains a wide range of sophisticated hacking and eavesdropping techniques. The next day, Jacob Appelbaum discussed them during his presentation at the 30th Chaos Communication Congress (30C3) in Hamburg.
According to Bruce Schneier, this catalog came from the second source, who also leaked the Merkel tasking record and the XKEYSCORE rules.

Article:
- Shopping for Spy Gear: Catalog Advertises NSA Toolbox

Document:
- ANT Product Catalog (SECRET/COMINT)




XKEYSCORE rules: TOR and TAILS

On July 3, 2014, the German regional television magazine Reporter disclosed the transcripts of a set of rules used by the NSA's XKEYSCORE system to automatically execute frequently used search terms, including correlating different identities of a certain target.
According to Bruce Schneier, these rules may have been leaked by the second source, who also provided the Merkel tasking record and the TAO catalog.

Article:
- NSA targets the privacy-conscious

Document:
- Transcript of XKeyscore Rules (classification not included)




NCTC watchlisting guidance

On July 23, 2014, the website The Intercept published a manual from the US National Counterterrorism Center (NCTC) with rules and criteria for putting people in terrorist databases and on no-fly lists.
The Intercept says this document was provided by a "source within the intelligence community".

Article:
- The Secret Government Rulebook for Labeling You as a Terrorist

Document:
- March 2013 Watchlisting Guidance (UNCLASSIFIED/FOUO)




NCTC terrorist watchlist report

On August 5, 2014, The Intercept published a report from the US National Counterterrorism Center (NCTC) about terrorist watchlists and databases.
Like the previous document, this one was also obtained from a "source within the intelligence community". Bruce Schneier says this report is from August 2013, which is well after Snowden had fled the US, and he therefore assumes it was leaked by a third source.

Article:
- Watch Commander - Barack Obama’s Secret Terrorist-Tracking System, by the Numbers

Document:
- Directorate of Terrorist Identities (DTI) Strategic Accomplishments 2013 (SECRET/NOFORN)




XKEYSCORE rules: New Zealand

On March 14 and March 22, 2015, The New Zealand Herald published transcripts of two sets of XKEYSCORE fingerprints that define targets of the New Zealand signals intelligence agency GCSB. They were not attributed to Snowden, although in the weeks before, New Zealand media published several other documents that did come from the Snowden cache.

Articles:
- Revealed: The names NZ targeted using NSA's XKeyscore system
- How spy agency homed in on Groser's rivals

Documents:
- Fingerprint about the WTO (TOP SECRET/COMINT)
- Fingerprint about the Solomon Islands (TOP SECRET/COMINT)






Ramstein AFB supporting drone operations

On April 17, 2015, The Intercept and Der Spiegel published a series of slides showing the infrastructure used for operating drones, in which the US base in Ramstein, Germany, acts as a relay station.
In the documentary Citizenfour, we see Glenn Greenwald visiting Snowden in Moscow and telling him about a new source who revealed the role of Ramstein AFB in the drone program.

Articles:
- Germany is the Tell-Tale Heart of America's Drone War
- Bündnisse: Der Krieg via Ramstein (Alliances: The war via Ramstein)

Document:
- Architecture of U.S. Drone Operations (TOP SECRET/REL)




NSA tasking & reporting: France

On June 23, 2015, Wikileaks, in collaboration with the French paper Libération, the German newspaper Süddeutsche Zeitung and the Italian paper l'Espresso, published the transcript of entries from an NSA tasking database, as well as intelligence reports about high-level French targets.

Articles:
- Espionnage Élysée
- Nsa, intercettati i presidenti francesi Francois Hollande e Nicolas Sarkozy (NSA intercepted French presidents François Hollande and Nicolas Sarkozy)

Documents:
- Top French NSA Targets (no classification available)
- Top French NSA Intercepts (up to TOP SECRET/COMINT-GAMMA)
- Economic Spy Order (SECRET/REL)






NSA tasking & reporting: Germany

On July 1, 2015, Wikileaks, in collaboration with Libération, Mediapart, Süddeutsche Zeitung and l'Espresso, published the transcript of entries from an NSA tasking database, as well as intelligence reports about high-level German targets.

Articles:
- NSA Helped CIA Outmanoeuvre Europe on Torture
- I dubbi di Angela Merkel sulla Grecia spiati dalla Nsa americana (Angela Merkel's doubts about Greece spied on by the American NSA)

Documents:
- Top German NSA Targets (no classification available)
- Top German NSA Intercepts (up to TOP SECRET/COMINT-GAMMA)




NSA tasking & reporting: Brazil

On July 4, 2015, Wikileaks published the transcript of entries from an NSA tasking database about high-level Brazilian targets. Unlike similar disclosures about France, Germany and Japan, no intelligence reports about Brazil were disclosed.

Article:
- Bugging Brazil

Document:
- Top Brazilian NSA Targets (no classification available)




NSA tasking & reporting: Japan

On July 31, 2015, Wikileaks, in collaboration with Süddeutsche Zeitung, l'Espresso, The Saturday Paper from Australia and the Japanese newspaper Asahi Shimbun, published the transcript of entries from an NSA tasking database, as well as intelligence reports about high-level Japanese targets.

Articles:
- Target Tokyo
- Wikileaks: 'Nsa spiava il governo giapponese. Sotto controllo anche Mitsubishi' (Wikileaks: 'NSA spied on the Japanese government. Mitsubishi also monitored')

Documents:
- Top Japanese NSA Targets (no classification available)
- Top Japanese NSA Intercepts (TOP SECRET/COMINT)




Chinese cyber espionage against the US

On July 30 and August 10, 2015, NBC News published two slides about Chinese cyber espionage against over 600 US companies and government agencies, including access to the e-mail of top government officials since at least 2010.
This leak stands out because the slides are in digital form and support a story that demonstrates the necessity of NSA, which seems to point to an authorized leak.

Articles:
- Exclusive: Secret NSA Map Shows China Cyber Attacks on U.S. Targets
- China Read Emails of Top U.S. Officials

Documents:
- China: Cyber Exploitation and Attack Units (SECRET)
- U.S. Victims of Chinese Cyber Espionage (SECRET)




XKEYSCORE agreement between NSA, BND and BfV

On August 26, 2015, the German newspaper Die Zeit published the transcript of the Terms of Reference (ToR) about the use of NSA's XKEYSCORE system by the German security service BfV.
Because it is a transcript and concerns XKEYSCORE, this document could come from the same source as the XKEYSCORE rules, but it may also have come from a source within a German government agency.

Article:
- A Dubious Deal with the NSA

Document:
- XKeyscore - the document (SECRET/COMINT)




The Drone Papers

On October 15, 2015, The Intercept published a series of documents with details about drone operations by the US military between 2011 and 2013.
In the documentary Citizenfour, we see Glenn Greenwald visiting Snowden in Moscow and telling him about a new source who revealed the role of Ramstein AFB in the drone program, including the chain-of-command diagram that is part of this batch of documents.

Articles:
- The Assassination Complex
- The Kill Chain

Documents:
- Small Footprint Operations 2/13 (SECRET/NOFORN)
- Small Footprint Operations 5/13 (SECRET/NOFORN)
- Operation Haymaker (SECRET/NOFORN)
- Geolocation Watchlist (TOP SECRET/COMINT)






Cellphone surveillance catalogue

On December 17, 2015, The Intercept published a range of pages from a classified catalogue containing cellphone surveillance equipment, including IMSI-catchers like Stingrays and DRT boxes.
Just like the NCTC reports, The Intercept obtained this document from a "source within the intelligence community".

Article:
- Stingrays - A Secret Catalogue of Government Gear for Spying on Your Cellphone

Document:
- Government Cellphone Surveillance Catalogue (SECRET/NOFORN)







It is difficult to tell exactly how many different leakers these documents come from. The journalists involved will of course do everything to hide their sources' identities, including creating distraction and confusion, but also creating the impression that many other leakers followed the example of Edward Snowden.



Some thoughts on the form of the documents

In terms of content, the documents from the alleged other sources are not very different from Snowden's. What seems to distinguish them most is their form: digital, transcribed, or scanned from paper.


Digital

Almost all documents that were attributed to Snowden came in their original digital form (with very few exceptions that were scanned from paper). This makes it remarkable that only two of the leaks from the other sources came in a similar digital form.

The first one is the famous TAO Product Catalog with hacking and eavesdropping techniques, which, given its content, also comes closest to the Snowden documents. Despite that, this catalog was never attributed to him.

The other leak in digital form consists of the two slides about Chinese cyber espionage, but these probably came from a source supportive of the US government.


Transcripts

A number of other leaks didn't provide documents in their original form, but only transcripts thereof. This is the case for the following revelations:
- Chancellor Merkel tasking record
- XKEYSCORE rules: TOR and TAILS
- XKEYSCORE rules: New Zealand
- XKEYSCORE agreement between NSA, BND and BfV
The lists from an NSA tasking database with targets in France, Germany, Brazil and Japan are also transcripts. Of the intelligence reports, which Wikileaks published at the same time, at least one is in its original format; all the others are transcripts.


Scanned from paper

All other documents that didn't come from Snowden look as if they were printed out (some were even recognizably double-sided) and then scanned. This is the case for:
- NCTC watchlisting guidance
- NCTC terrorist watchlist report
- Ramstein AFB supporting drone operations
- The Drone Papers
- Cellphone surveillance catalogue
This doesn't automatically mean they are all from the same source, as two of them are from the civilian NCTC and the other three are clearly from a military context.

We don't know when or where these documents were printed out. Maybe it was done by the leaker, for whom it could have been easier to exfiltrate them as hard copies than on a detectable thumb drive.

It's also possible that they were printed out by the press contact in order to make them look different from the Snowden documents. On the other hand, publishing them in digital form would have made it more difficult to prove they were not from the Snowden cache.



Some thoughts on the motives behind the leaks

We can also take a look at the motives that could have been behind these leaks. Interestingly, these seem to correspond quite well with the different forms the documents have.


A second source

The disclosure of the transcribed XKEYSCORE rules and tasking database lists hardly serves the public interest. They are about legitimate targets of foreign intelligence, and publishing them seems solely meant to discredit the NSA and/or damage US foreign relations.

The same applies to the TAO Product Catalog, which contains devices and methods that are only used against "hard targets" that cannot be reached by other means. Its disclosure is not about spying on ordinary citizens, but it does compromise valid US intelligence operations.

At first sight, one might assume that these documents came from the Snowden cache but were published by people like Appelbaum and organizations like Wikileaks. These take a more radical approach than Snowden himself and might therefore have pretended that the documents came from another source.

However, both Greenwald and security expert Bruce Schneier said these documents were really provided by another leaker. Because a number of them were published by German media, Schneier guesses it might be "either an NSA employee or contractor working in Germany, or someone from German intelligence who has access to NSA documents".

If that's the case, then it's remarkable not only that there's a second source from within or close to NSA, but also that this source is apparently fine with leaking documents that show no abuses but only seriously harm US interests. That is either treason or the work of a hostile intelligence agency. Snowden at least acted out of concern about increasing mass surveillance of innocent civilians.


A third source

The documents that are scanned from paper are a somewhat different story. These are about issues that concern a wider range of people. For some of them, The Intercept even gives the reason why the source leaked them: for the cellphone surveillance catalogue it was because of a concern about militarization of domestic law enforcement.

For the Drone Papers, the source is quoted as saying: "This outrageous explosion of watchlisting [...] assigning them death sentences without notice, on a worldwide battlefield". Given that he mentions watchlists, it seems quite possible that this source also leaked the two NCTC reports about terrorist databases and watchlists.

Together with the fact that both the NCTC reports and the cellphone surveillance catalog came from a source "within the intelligence community", this seems to confirm that all the documents scanned from paper are from the same leaker, maybe someone from a military intelligence agency like the DIA.



Conclusion

Given these thoughts on the form of the leaked documents and the possible motives behind these leaks, it seems that they can be attributed to at least three other sources besides Snowden:

Source nr. 1 (Edward Snowden)

Source nr. 2 (German NSA employee or hostile intelligence?)
- Chancellor Merkel tasking record
- TAO product catalog
- XKEYSCORE rules: TOR and TAILS
- XKEYSCORE rules: New Zealand
- NSA tasking & reporting France/Germany/Brazil/Japan
- XKEYSCORE agreement between NSA, BND and BfV
Source nr. 3 (someone from US military intelligence?)
- NCTC watchlisting guidance
- NCTC terrorist watchlist report
- Ramstein AFB supporting drone operations
- The Drone Papers
- Cellphone surveillance catalogue
Source nr. 4 (someone from the US government?)
- Chinese cyber espionage



Links and Sources
- Schneier.com: The US Intelligence Community has a Third Leaker
