March 15, 2016

Something about the use of selectors: correlations and equations

(Updated: August 24, 2016)

The Snowden revelations made people familiar with what NSA calls "selectors": phone numbers, e-mail addresses and a whole range of similar groups of characters that can be used to identify a particular target.

However, very little was revealed about how exactly these selectors are used to pick out communications of interest. In the meantime, declassified documents about NSA, hearings of the German parliamentary commission and an intelligence oversight report from the Netherlands have provided some details.

It came out that the signals intelligence agencies of these three countries (and likely many other countries too) group all selectors that belong to a certain target into sets called correlations or equations.

Grouping individual selectors into such sets makes sense: one of the most important requirements for signals intelligence is knowing which phone numbers, e-mail addresses, etc. a particular target uses, since targets often use many of them and change them regularly.

United States

In two recent postings on this weblog, the NSA's storage and analysis of domestic phone records under the Section 215 (or BR FISA) program was analysed. Information about this program comes almost solely from a large number of documents that have been declassified by the US government.

Among those documents is a BR FISA Review (.pdf) from 2009, in which, probably for the first time, we find the term "correlation". The report says that NSA uses correlated selectors to query the BR FISA metadata. The function of such a set of selectors is described as follows:
"If there was a successful RAS determination made on any one of the selectors in the correlation, all were considered RAS-approved for purpose of the query because they were all associated with the same [target redacted]"

RAS stands for Reasonable Articulable Suspicion, which must be established for a selector before it can be used to query the domestic telephone metadata. So, once one selector was RAS-approved, the analyst was also allowed to use all other selectors that were correlated to the same target.

This practice, which can be described as "one approved selector approves the whole correlation set", ended on February 20, 2009, when the Emphatic Access Restriction (EAR) tool was implemented. Since then, each selector has to be individually RAS-approved before it can be used to query the metadata database.
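
The difference between the two rules can be illustrated with a minimal sketch. The `Correlation` record, its field names and the phone numbers are hypothetical and only serve the illustration; nothing is known about how NSA actually stores or evaluates these sets.

```python
from dataclasses import dataclass, field


@dataclass
class Correlation:
    """Hypothetical record: all selectors attributed to one target."""
    target: str
    selectors: set[str]
    ras_approved: set[str] = field(default_factory=set)


def queryable_before_ear(c: Correlation) -> set[str]:
    # Old practice: one RAS-approved selector approves the whole set.
    return set(c.selectors) if c.ras_approved & c.selectors else set()


def queryable_after_ear(c: Correlation) -> set[str]:
    # After EAR: only individually RAS-approved selectors may be used.
    return c.selectors & c.ras_approved


c = Correlation(
    target="target-1",
    selectors={"+1-555-0100", "+1-555-0101", "+1-555-0102"},
    ras_approved={"+1-555-0100"},
)
print(len(queryable_before_ear(c)))  # 3
print(len(queryable_after_ear(c)))   # 1
```

Under the old rule a single approval opens up all three numbers for querying; under the new rule only the one number that was approved itself can be used.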

Note that this only applied to selectors used for querying domestic phone records. As we learned from the German situation described below, NSA continued to use correlations for its foreign collection efforts overseas.

Correlation database

According to the BR FISA Review, NSA has a database that holds correlations between selectors of interest and which provides automated correlation results to analysts. So when an analyst wants to know which (other) identifiers a certain target uses to communicate, he can look that up in this database.

The name of this database was redacted, but according to its position in the review's glossary, it starts with A. The correlation database is therefore different from the OCTAVE tasking tool, which is used to activate telephony selectors on the various collection systems. Analysts can thus decide for themselves which of the correlated selectors they actually want to task.

It is not clear, though, whether these correlations include both phone and internet selectors, but it would obviously be useful to collect and group all kinds of identifiers used by a particular target.

Glossary of the 2009 BR FISA Review report, with
in the 4th position the correlation database


Germany

The way NSA uses correlations is immediately reminiscent of a practice that was revealed during hearings of the German parliamentary commission that investigates NSA spying practices. On May 20, 2015, BND employee W.O. explained that until 2012, the NSA sent its selectors to BND in the form of a so-called "equation".

According to the witness, an equation was a record that could contain up to one hundred selectors used by or related to a particular target. The number is this large because the equation contains all the different spellings and technical encoding permutations of a selector. For one e-mail address this could for example be:
mustermann%40internet%2Eorg (HTML-Hex)
mustermann&#37; (multiple encodings)
mustermann\\ (UTF-16)
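
How a single address multiplies into several encoded variants can be sketched as follows. This is only an illustration: the function names are made up, and the three encodings chosen (percent-encoding, percent-encoding applied twice, and `\uXXXX` escapes) are assumptions about what such permutations might look like, not a description of the actual NSA or BND systems.

```python
def percent_encode(text: str) -> str:
    # Encode every character that is not an ASCII letter or digit as %XX.
    return "".join(
        ch if ch.isascii() and ch.isalnum()
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in text
    )


def unicode_escape(text: str) -> str:
    # Write every non-alphanumeric character as a \uXXXX escape
    # (sufficient for the ASCII punctuation used in this sketch).
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"\\u{ord(ch):04X}"
        for ch in text
    )


def permutations(address: str) -> list[str]:
    once = percent_encode(address)
    return [
        address,               # plain form
        once,                  # encoded once
        percent_encode(once),  # encoded twice ("multiple encodings")
        unicode_escape(address),
    ]


for variant in permutations("mustermann@internet.org"):
    print(variant)
```

The second variant printed is `mustermann%40internet%2Eorg`, the first example from the hearing. With several spellings per address and several encodings per spelling, a set of up to one hundred entries for one target is easy to imagine.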

The explanation given by witness W.O. of how BND managed these NSA equations was rather confusing, but an important element seemed to be that a whole set of selectors could be prevented from being activated when BND rejected just one selector because using it would violate German law or German interests.

Especially for internet identifiers (like chat handles or nicknames) it can be very difficult if not impossible to attribute them to a particular country. But when an equation contains just one identifier that is easier to attribute (like an e-mail address), the whole set of selectors can be either approved or disapproved based upon the identifiable selector.

Witness W.O. contradicted himself on whether an equation contains only internet selectors, or also telephone numbers (with wildcards and blanks), but on September 24, 2015, witness D.B. said that equations were only used for NSA internet selectors.

Splitting up

W.O. also explained that until 2012, the NSA sent its selectors in the form of equations. When BND rejected one selector from such an equation set, BND employees in Bad Aibling had to ask NSA to remove that number from their equation, or else the other selectors in that equation were rejected too.

Since 2011, these equations were split up and phone and internet selectors were each put in separate databases, which apparently made it possible to reject individual selectors. Afterwards, the computer system reassembles the selectors into their proper equations again, which can now have for example a rejected phone number alongside an approved e-mail address. But if one of them is disapproved, the whole equation will not be forwarded to the collection system.

This explanation by witness W.O. is rather puzzling because the situation before and after 2011/2012, and before and after splitting up the equations seems to be the same: in both cases all selectors from an equation are rejected when just one of them was disapproved.
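
The puzzle can be made concrete with a small sketch of both procedures as the witness described them. The `Selector` type and the function names are hypothetical, and the real BND workflow is not publicly documented in this detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selector:
    value: str
    kind: str       # "phone" or "internet"
    approved: bool  # outcome of the BND check for this selector


def forward_whole(equation: list[Selector]) -> list[Selector]:
    # Before the split: the equation is handled as one record and is
    # dropped entirely when a single selector is rejected.
    return list(equation) if all(s.approved for s in equation) else []


def forward_split(equation: list[Selector]) -> list[Selector]:
    # After the split: phone and internet selectors are kept apart,
    # then reassembled into their equation before forwarding.
    phone = [s for s in equation if s.kind == "phone"]
    internet = [s for s in equation if s.kind == "internet"]
    reassembled = phone + internet
    # The forwarding rule is unchanged: one rejection blocks everything.
    return reassembled if all(s.approved for s in reassembled) else []


equation = [
    Selector("+49 30 5550100", "phone", approved=False),
    Selector("mustermann@internet.org", "internet", approved=True),
]
print(forward_whole(equation))  # []
print(forward_split(equation))  # []
```

Both functions return an empty list for an equation with one rejected selector, which is exactly why the purpose of the split remains unclear.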

It seems therefore that splitting up the equations had another purpose, but that didn't become clear from the commission hearings. The commission members often had difficulties in understanding these technical issues and were then hardly able to ask the witnesses the questions that could bring clarity.

Maybe the splitting up only meant separating telephone and internet selectors, as from the report of a special independent government investigator it did become clear that NSA provided a description or a justification for every single telephone selector, but that justifications for internet selectors weren't available for BND personnel.


There's similar confusion about the internal BND investigation into the selectors provided by the NSA. Witness D.B. explained that when Dr. T. investigated suspicious NSA internet selectors in August 2013, he was not given them in the form of equations, but as separate, individual ones.

Apparently D.B. suggested that this was the reason that Dr. T. found so many selectors that could not be identified: they were separated from correlated ones that could have made them easier to identify. But why separate these selectors when that rips them from the elements that attribute them to a certain target and/or a particular country?

BND selectors

What is said before is only about the selectors that were provided by NSA, in order to be tasked on the satellite collection system operated by BND in Bad Aibling. Besides these, BND of course also has its own selectors.

During the hearing of January 28, 2016, witness D.B. was asked whether BND's own selectors were also grouped into equations. D.B. explained that BND doesn't use the term equation, but that its central tasking database system PBDB holds multiple selectors for a given target, with multiple permutations for each selector (German: Telekommunikationsmerkmal or TKM).

In the German magazine Der Spiegel of April 2, 2016, it was explained on page 33 that selectors used by BND have the following format. They start with an e-mail address, a phone number or a similar designator. This is followed by the intelligence topic, with WPR for Waffenproduktion (weapons production), LAP for Landwirtschaftspolitik (agricultural policy), TEF for Terrorfinanzierung (terrorism financing) and ISG for Islamistische Gefährder (Islamist threats). Next comes the country that is spied upon, designated by 3 letters, and finally a Sperrvermerk (blocking notice) for those foreign intelligence agencies that should not see the results for this selector. These agencies are designated with a 4-letter abbreviation of their codename, like HORT for HORTENSIE (United States) or BEGO for BEGONIE (Denmark).
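
As a rough illustration, such a selector could be modelled as a record with four parts. Everything about the representation is an assumption: the class and field names are invented, the article does not say how the parts are delimited or stored, and the country code `XYZ` is a placeholder. Only the topic and partner abbreviations come from the Der Spiegel description.

```python
from dataclasses import dataclass

# Abbreviations as reported by Der Spiegel.
TOPICS = {
    "WPR": "Waffenproduktion",
    "LAP": "Landwirtschaftspolitik",
    "TEF": "Terrorfinanzierung",
    "ISG": "Islamistische Gefährder",
}
PARTNERS = {"HORT": "HORTENSIE", "BEGO": "BEGONIE"}


@dataclass(frozen=True)
class BndSelector:
    designator: str                     # e-mail address, phone number, ...
    topic: str                          # 3-letter intelligence topic
    country: str                        # 3-letter country designation
    blocked_for: tuple[str, ...] = ()   # Sperrvermerk: 4-letter partner codes

    def __post_init__(self) -> None:
        if self.topic not in TOPICS:
            raise ValueError(f"unknown topic: {self.topic}")
        if len(self.country) != 3 or not self.country.isalpha():
            raise ValueError("country must be designated by 3 letters")
        for code in self.blocked_for:
            if len(code) != 4 or not code.isalpha():
                raise ValueError("partner codes must be 4 letters")

    def may_share_with(self, partner_code: str) -> bool:
        # Results are withheld from every partner named in the Sperrvermerk.
        return partner_code not in self.blocked_for


s = BndSelector("mustermann@internet.org", "TEF", "XYZ", ("HORT",))
print(s.may_share_with("HORT"))  # False
print(s.may_share_with("BEGO"))  # True
```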

The BND satellite intercept station at Bad Aibling, Germany
(Photo: AFP/Getty Images)

The Netherlands

In the Netherlands, a report (.pdf) from February 2016 by the intelligence oversight commission CTIVD advised the General Intelligence and Security Service AIVD to consider using some kind of correlations or equations for its bulk collection efforts too.

The report reveals that currently, the AIVD uses a list (Dutch: kenmerkenlijst) containing all selectors, like phone numbers, e-mail addresses and keywords, used for specific operations. For most of these selectors, the list contains a short justification for why it was put on this list, with a reference to an underlying document. Earlier, the commission found that too often, these justifications were too short, not related enough to the target, or even absent.

According to the commission, it would be better if the AIVD provided a justification for each targeted person or organisation, instead of for every single selector. Often, one target will use multiple phone numbers and e-mail addresses. Grouping them by target and providing a justification for that target would therefore also reduce the length of the list.

This approach is already used by AIVD when it comes to targeted interception.
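
The commission's suggestion amounts to regrouping a flat list into one entry per target. A minimal sketch, in which the row layout, the function name and all sample values are hypothetical and not taken from the actual kenmerkenlijst:

```python
# Flat list: one row per selector, each pointing at the target that uses it.
rows = [
    ("+31 20 5550100", "target A"),
    ("+31 20 5550101", "target A"),
    ("a@example.org", "target A"),
    ("b@example.org", "target B"),
]

# One justification per target instead of one per selector.
justifications = {
    "target A": "see document 1",
    "target B": "see document 2",
}


def group_by_target(rows, justifications):
    grouped = {}
    for selector, target in rows:
        entry = grouped.setdefault(
            target,
            {"justification": justifications[target], "selectors": []},
        )
        entry["selectors"].append(selector)
    return grouped


grouped = group_by_target(rows, justifications)
print(len(rows), "selectors,", len(grouped), "justifications")
```

In this example four selectors need only two justifications, one for each target.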
