(Updated: September 15, 2014)
This time we present an article written in cooperation with the French weblog about intelligence and defence Zone d'Intérêt in which we compare the data collection of Google to intelligence agencies like NSA:
Since 1998, Google has grown to become an essential part of the web infrastructure and took an important place in the daily lives of millions. Google offers great products, from search engine to video hosting, blogs and productivity services. Each day, users provide Google, willingly and candidly, with many different kind of personal information, exclusive data and files. Google justifies this data collection for commercial purposes, the selling of targeted ads and the enhancement of its mostly free services.
These terabytes of user data and user generated content would be of tremendous value to any intelligence service. As former director of CIA and NSA Michael Hayden half-jokingly stated at Munk debates: "It covers your text messages, your web history, your searches, every search you’ve ever made! Guess what? That’s Google. That’s not NSA."
But really, how would a company like Google compare to an intelligence agency like the NSA? How would it be able to gain access to confidential information and go beyond OSINT (Open Source Intelligence)? Does Google even have the resources, data and technical capabilities to harvest all-sources intelligence like a major intelligence service would?
Google's unofficial motto is "Don't be evil", but what if Google started being evil and used all of its collected information as an intelligence agency would? What if intelligence professionals had access to Google's resources and data ? What would it mean for the users? And can this be prevented somehow? (it’s also rather ironic that many people now see NSA as a big evil organization, but Google collects even more)
This is the worst case scenario we'd like to explore:
What if Google was an intelligence agency?
Communications to intercept, private data to collect
As a major webmail (425 million active Gmail users in 2012 - source: Google I/O 2012) and instant messaging provider with Hangouts, Google has access to the daily communications of millions of individuals, corporations and organizations. This privileged access to telecommunications worldwide gives Google the opportunity to act as a major COMINT agency, not unlike NSA or GCHQ. Storing its users e-mails and broadcasting their instant messages with audio and video, Google is able to obtain a deep-reaching knowledge of their habits, intents and projects, either personal, professional or commercial. Enhanced with behavior analysis and targeted with collection selectors, theses communications, already stored on the company's servers could be used as a very powerful intelligence database.
NSA only stores data that have any foreign intelligence value, other data that might be useful are automatically deleted after 5 years, but how is that with Google? In the European Union, administrative authorities in charge of data protection, assembled in the Article 29 Working Party of the European Commission (or "G29"), have issued multiple warnings and penalties against Google regarding this issue. In January 2014, the french CNIL, an Art. 29 Working Party member, issued a 150 000€ monetary penalty to Google for failing to define retention periods applicable to the data which it processes. Data collected by Google isn't as strictly regulated and controlled as data collected by intelligence agencies, and it can stay on Google's servers until the company decides to delete it, at its own discretion.
And how about the risk if internal policy and privacy violations by Google personnel? Does Google has access control mechanism just as strict and tight as the compartimentalization and ‘need-to-know’ at NSA? They should have, as Google has far more information about ordinary people in its databases, which could be much more tempting to look at for employees than for example all the military and terrorism stuff that NSA collects. But Google also has to protect this information against foreign intelligence agencies.
Google also provides its users with phone services through its Android phone and tablet operating system, with 1 billion users worldwide in 2014 (source: Google I/O 2014). This could be used as an opportunity to monitor the calls - made or received - by its users, collect their metadata and even record their calls for intelligence purposes. This also goes for SMS and MMS send or received by its users, as android users send 20 billion text messages each day (source: Google I/O 2014). NSA’s database for SMS-messages DISHFIRE receives just around 200 million messages a day. Google is expanding the reach of its phone services, as calls to landline and mobile phones can be placed from Hangouts by any user of Gmail, Google+ and Chrome, even without using an Android device. With Fiber, Google is providing ISP services to three cities in the United States, with plans to expand. Google even wants to introduce internet access to remote areas in Africa via solar-powered balloons – which would also make it much easier for NSA, as many of these regions are also terrorist-related conflict zones where there’s often only mobile phone and radio traffic, which is more difficult to intercept than internet traffic, especially when the latter goes through a US company.
The expanding realm of its webmail and cloud services provides Google with a rare trove of otherwise private individual data and even confidential information from governments and companies. With Gmail, Google has access to sensitive information about individuals, such as their names, phone numbers, addresses or even social security numbers which may transit via e-mail. Logins and passwords from web services are often sent by e-mail, and so are activation and authentication codes. Many users want to take advantage of the free services offered by Gmail and automatically forward e-mails from other webmails or their company e-mail address to their Gmail address, creating a POP/SMTP link. Doing so, they increase the amount of e-mails and information accessible to Google. Private information about individuals, from health and financial issues to clues about their emotional state or relationship status can be found in e-mails. Everything from their buying habits, reading habits or subscriptions, to confidential information, can be extracted from e-mails using already available software, and then easily exploited by intelligence professionals.
Contact lists from services like Gmail, Hangouts, Google+ and from operating systems like Android and Chrome OS would be a valuable source for intelligence analysts, as they allow to identify links between individuals and perform social network analysis. Contacts lists were used in many occasions by intelligence agencies leading investigations against terrorist cells or organized crime groups, but can also be used in social engineering schemes or commercial intelligence.
Corporate information is hosted by Google through most of its services, as Gmail is used by many entrepreneurs and employees, whether it is duly authorized by their company or not. Important information can be retrieved in e-mails, such as details of industrial projects, business offers and everyday company communications. Many companies use Gmail attachments to send and receive corporate documents or use Google Drive to store their information. Google Calendar can also provide a great window into the daily activities of a company, as a way to identify links between individuals, be alerted of forthcoming meetings, receive status reports from ongoing projects, or deduce a precise timeline of employees work habits. Recently, Google announced that 58% of Fortune 500 companies have "gone Google" and so did 66% of "50 top Start-Ups" and 72 of the 100 best universities (Source: Google Enterprise).
Given all these data containing often highly sensitive and private information, it is remarkable that people, businesses and organisations are so willing to trust it into the hands of Google. One wonders why some people really don’t like it when government officials could have access to such kind of information, but apparently completely trust the Google personnel. Who guarantees that Google isn’t looking into confidential information of other businesses that can be of interest?
Google Search, the first service provided by Google since 1998, receives about 100 billion searches per month and is a great tool used every day by intelligence professionals. Google search crawlers scan the web for individual URLs, web pages and files, using the Google powerful servers. They are able to record, collect and cache any kind of text content, images, video and audio files, and most document formats such as Word and PDF. Google Search can be used to find unrestricted or insufficiently secured subdomains, files, folders and archives, from websites and networks. Using advanced operators, Google can be used to find misplaced confidential information and other vulnerabilities. If there’s one application that is able to read your deepest thoughts, fears and desires, like Edward Snowden said NSA is capable of, then it is Google Search.
Individuals to identify, targets to monitor
Google Search can also be exploited for advanced statistics, behavior analysis of users, identification of single users, and to locate them. Using cookies and connection data recorded by Google for every search, such as IP address, user agent and search terms, the user can be identified and located to a certain extent. Taking advantage of persistent cookies, IP adresses and forensic techniques, such as discourse analysis or syntax analysis, and sifting through recorded searches, online activity through Google services can then be narrowed down to a single organization, a set of users or even a single user.
Recording precisely the search terms from an identified user, company or organization can help an intelligence professional create new, more efficient selectors for intelligence collection and communication interception, based on the interest of users and unique searches. For example, many companies will use Google to find new business prospects, partners or suppliers. Journalists will do background checks on their sources using Google. Scholars and scientists will do their research using Google search, revealing precise information about what they are looking for and what they are working on.
Similar data is collected on many other websites which are not owned or related to Google, but which make use of Google Analytics, a Google-run service allowing webmasters to collect detailed information about their users, such as their IP addresses (collected by Google but not shown to webmasters), what search terms they used to reach their websites and which pages they browsed. While challenging sanctions from the European Art. 29 Working Party, Google refuted that an IP address constitutes personal data, even when associated with data from cookies, and should not be treated as such regarding privacy issues. Which once again shows the different views on privacy in Europe and the US
But Google has access to much more precise data to identify users and monitor their online activities. Some services, such as Gmail, require users to be registered and to give accurate personal information, such as their real name, their birthdates, their country of residence or another e-mail address they own. Google is also pushing two-factor authentication, requiring that their users disclose an active phone number. While launching its Google+ service, which is now linked to other services such as Gmail and Youtube, Google discouraged the use of pseudonyms and required that all users registered using their real name, or risk account suspension. In October 2012, G29 issued a recommendation to Google that it must inform new users more clearly that they can sign-up to a Google account without providing their real name.
When users use any Google service while logged in, or with Google cookies activated, or even from an IP address which was previously used while logged in, all of their online activity transiting on Google networks can be traced back to them. On many occasions, personal files and documents stored on Google Drive, or images stored on Google+ Images and Picasa could be traced by Google back to the real name of a registered user. E-mails, instant messages, personal documents, videos and pictures, all stored by Google, can be used to create a very complete and precise profile of a single individual. According to numbers published by Google during I/O 2014, Android users send "93 millions selfies" each day.
The Google image search algorithm is able to identify faces and places in pictures. The image search facial recognition feature is only activated to find pictures of celebrities, but Google+ Photos includes an opt-in service called "Find My Face" capable of automatically recognizing and tagging the user's face in photos uploaded by him or by his friends. Google implemented a "Face Unlock" feature in Android, allowing users to unlock their devices using their camera, showing that Google's recognition algorithms are precise enough to identify an individual, even with slight changes due to lighting conditions or face expression. In addition, Google recurring pop-ups incite Android users to activate a function which automatically uploads all new photographs taken with their device to Google+ Photos and Google Drive. EXIF data and geotags from each photo are collected too. As another option, Google image search has a "reverse image search" functionality which allows any user to upload an image from his computer and let Google's pattern recognition algorithm find similar images. In the help section of Google's image search, it is stated that "any images or URLs that you upload will be stored by Google".
Google's photos database would be an extraordinary tool to any intelligence professional trying to find someone, learn about its habits or identify people he is related to. Recently, intelligence agencies such as the American DIA (Defense Intelligence Agency) or the French DGSE have been acquiring commercial software to collect videos and photos posted online for intelligence purposes, which shows the interest of intelligence analysts for user generated content. In 2010, Google invested 100 million dollars in Recorded Future a company specializing in data mining, advanced statistics, internet traffic monitoring and defense intelligence. Recorded Future was also funded by In-Q-Tel, the technology investment firm of the CIA.
Using data collected through Google Voice Search and Google Now, intelligence technicians could be able to build a large phonemes database to enhance word recognition algorithms, but also to implement voice recognition in order to identify single users based on their voice. For advanced target monitoring, the microphone from a computer, tablet or smartphone running Android or Chrome OS could be activated in order to eavesdrop on a target, using OS-level or App-level backdoors. Coupled with voice recognition, these techniques could be used to identify and locate targets.
In such a scenario, OS-level access could be used to implement backdoors for keylogging, password collection, communication intercepts, microphone or camera hijacking, or even GPS silent activation and monitoring. Access to Google's database would make network penetration easier, as Android devices record the WiFi passwords from secured access points they connect to and store them to the cloud.
Map any place, locate anyone
In 2004, Google acquired Keyhole, a company partly funded by the CIA and the NGA, which developed the technology behind Google Earth, a Google product which provides users with maps and commercial satellite imagery from around the world. Other Google mapping initiatives are Google Maps and Street View. Google Earth is used by many intelligence professionals, whether they work for government agencies or for private contractors, and is often listed as a common tool in intelligence sector job descriptions and resumes.
A useful feature of Google Maps and Google Earth is the ability for users to add tags, photos and points of interests (POI) over the maps and imagery provided by Google. This feature results in crow-sourced sets of maps, which are improved by the output of users who have good knowledge of the places they describe, whether they are travelers, dwellers or experts. This ground knowledge is obtained at no cost by Google and can result in very detailed descriptions, even from remote places. Google also benefits from the geotagged photographs from Panoramio, acquired by Google in 2007, and from POIs added by users participating in Google side-projects, such as Niantic Labs' Field Trip and Ingress applications. Google recently acquired the imaging company Skybox, taking advantage of its growing constellation of satellites.
Another way for Google to get intel from the ground and improve its worldwide mapping capabilities is Street View, by which Google collects 360° snapshots along roads and trails. With Street View, Google is able to get detailed and fresh information about buildings, installations and constructions. This collection effort even captures photos from remote places or restricted areas, such as military bases or intelligence facilities (examples: MI5 installation in the United Kingdom, DGSE station in France) Google has recently announced Project Tango, which is aimed at developing new sensors for mobile devices, in order to map their surroundings in 3D, such as the interior of buildings. Access to the photographs and geospatial information collected by Google through Google Maps, Street View, Google Earth and Panoramio, but also from search crawlers and user content uploaded to the cloud, would be of considerable interest to intelligence technicians. For instance, Letitia A. Long, director of the National Geospatial Intelligence Agency (NGA) recently stated that her agency was increasingly taking advantage of data collected through open sources and social networks. In these cases the possibilities of Google’s commercial tools seem to have already outpaced those used by government agencies.
Google is also making considerable effort in precisely locating its users. Users are often prompted to authorize their localization by Google services, from Google Search to Google Maps and Android. To achieve precise location of a user, Google is using all data available, from search queries which mention a place, to IP addresses and connection data, to GPS signal provided by the user's device.* Google also uses a patiently crafted database of Wi-Fi access points, hotspots and cell towers, which contains MAC addresses, BSSIDs and Cell IDs. This data is collected by Google Street View cars, contractors, but also when a user device allows localization privileges to a Google service or application. This worldwide crowd-sourced database is very detailed, precise and regularly updated. This data collection is often running in the background on users' devices and provide Google with the precise location of many of its users.
For intelligence purposes, geolocation data could be used to silently track a target or get information about their routines. Localization data is stored and logged by Google, and can be accessed by registered users in their Location History. Access to such information by intelligence technicians could be used for behavior analysis, remote surveillance, forensics and social network analysis. Combined with Google access to many Wi-Fi passwords, a precise map of MAC addresses worldwide would provide intelligence technicians and operators with an opportunity to conduct network penetration and communication intercepts. All this could be very valuable for agencies like NSA, as some of the Snowden-documents showed that they now have to put much effort in mapping such communication networks “from the outside”.
A proxy in intelligence collection?
Google collects user data for commercial purposes, mainly to sustain its business model based on online targeted ads, which accounted for 96% of Google's revenue in 2011. However, Google is sharing its worthy data with governments and their intelligence services, when complying with court orders or local laws. According to its Transparency Report, in 2013 Google complied to thousands of user data requests from governments of countries such as the United States, India, France, Germany, United Kingdom, Brazil or Italy. Google reports that it provides user data to "law enforcement agencies", but does not state exactly what kind of data is given. As example, Google cites IP addresses and personal information given by the users when they register, but it is not clear whether or not data provided to authorities is restricted to these elements. Given the large amount of data collected and stored by Google on every user, government agencies could receive a very detailed history of a user's communications and online activity, or even a copy of its hosted files.
In recent NSA and FBI intelligence collection programs, user data can be requested under a legal framework, such as FISA requests, which does not authorize Google to inform its users of the request. Moreover, clandestine intelligence efforts gave the NSA access to Google's data, without the need for legal requests.
In most democratic countries, intelligence services aren't allowed to intercept communications from their citizens nor to collect user data without the authorization of a judge or commission. Many intelligence activities are meant to be constrained by the rule of law and monitored by congressional oversight to ensure that individual liberties are respected. However, commercial companies are not subject to the same restrictions and can collect a lot of their users data, as long as they duly inform them.
Such loophole can be purposely exploited by an intelligence agency, taking advantage of the ever-growing database from big companies such as Google, either by legally requesting the information collected from their users or by trying to access it covertly. In such occurrences, Google would act as a proxy in intelligence collection, unwillingly (?) putting its resources at the disposal of intelligence services. Citizens and businesses may not want to share as much private information and contents with an internet services company given the possibility that it may later be accessed by intelligence services, domestic or foreign.
One major argument against the collection of data conducted by NSA (or other intelligence angencies) is that they can be used against the people when government is taken over by evil people. Western governments at least have checks and balances, but Google is just a commercial company, and what would happen when, say, some huge Chinese company would take it over? Then our complete digital lives would be under control of people who care less about individual freedom and privacy. As probably no one (especially the US government) wants that to happen, Google will have to stay an American company one way or another – which makes it even more like a proxy for US intelligence.
In a recent case, Google tipped off the National Center for Missing and Exploited Children after scanning the emails of its users, looking for contents related to child pornography. It seems that Google was not asked by a law enforcement agency to monitor the communications of a single user under investigation, or even to scan emails for suspicious contents. Google acted on its own, scanning emails, maybe on a massive scale, to find suspicious activities. Even though going against child exploitation can be seen as a noble endeavor, it seems that Google may be running its own law enforcement operations, scanning its users' data for what it deems illicit. As Google gives little information about the company's operations, it is hard to know what kind of users' activities could be monitored by Google and proactively reported to authorities or others organizations. It is not clear if this proactive reporting only occurs in the United States, or if it may extend to other, less democratic countries.
From an intelligence standpoint, the sheer amount of data that Google collects about individuals and businesses is unrivaled. A single piece of information recorded by Google about a user could be considered innocuous, but the sum of all collected data which can be narrowed down to an individual or an organization gives an intimate picture of its thoughts, intent and activity.
The way Google systematically tries to gain access to new kind of data about its users, whether it's their e-mails, their work files, their personal pictures, their location, or confirmation of their real identity, is propelled by a commercial strategy and a so-called wish to "change the world", making their users' lives easier. However, this "know-it-all" approach facilitates data mining efforts from intelligence services which pursued programs such as "Total Information Awareness" and are conducting large-scale intercepts.*
Of course, this issue is not confined to Google but affects other companies such as Amazon, Apple or Facebook, as well as many other smaller companies. Still, Google owns a special place in the digital world of user data, as it concentrates a wide range of user information, operates phone and email services, develops operating systems and stores users files in the cloud. Google holds a big responsibility to ensure the security and privacy of its users data worldwide, but its ongoing efforts to do so can hardly be considered sufficient.
Google security practices are generally considered state of the art and the company recently announced support for end-to-end encryption in GMail, but the body of messages will remain unencrypted on Google's servers and accessible to the company's bots. In october 2013, Google became aware of a covert network penetration lead by the NSA, targeting communications links connecting the company's data centers, which were not encrypted.* The exact amount of user data which may have been collected by the NSA during the operation is still unclear.
- As a major stakeholder in the worldwide web, Google has to bring more accountability and transparency about what is shared from its users. The user data that could potentially be provided to law enforcement agencies should be clearly and precisely marked as such. It should become clear to all users that some of their data, whether it's personal information, files, e-mails, messages, metadata from network traffic or phone calls, or even recorded communications may become available to intelligence services.
- Also, Google should clarify if this information can be provided only to the law enforcement agencies of the user's country of residence or also to United States government agencies, as Google is an American company with most of its servers and activities in the US.
- American web companies and cloud operators are facing growing critics about their vulnerability to US intelligence operations. Some in Europe advocates for sovereign "national clouds" restricting data retention and traffic between secured servers and users, forbidding access to the American government. During an hearing before the United States Senate in November 2013, Richard Salgado, Google's director for law enforcement and information security, stated that "in the wake of press reports about the so-called "PRISM" program", he was concerned by the trend of "data localization" that could result in the creation of a "splinternet" and the "effective Balkanization of the Internet". Data localization would also probably cost more to Google, and would place the company under the law of each country where the company processes user data. In many cases Google argued that it was established in the United States and therefore was not subjected to the law of European countries, as all data processing occurs in the USA. However in France, Google was imposed a (small) financial penalty as the administrative authority made clear that the company had to comply with the French Data Protection Act.
- Google cannot condone a systematic breach of confidentiality and privacy of its users. A call to reform US government surveillance laws cannot be considered enough. Google must implement proactive measures, reinforcing its network security, offer end-to-end encryption for all of its services, securely distribute users' files hosting in their countries of residence and better inform its users of privacy risks. These measures could be seen as costly, but are necessary to maintain the trust of Google's user base and main source of revenue.
Google has massive technical capabilities for user data retention, metadata collection, telecommunications monitoring, localization, mapping and imaging, all which could allow it to act as an intelligence agency. The main difference is that Google has a different goal (commercial) than an intelligence agency, but this also makes that Google gathers far more data than an intelligence agency is legally allowed to do.
How long is user data kept on Google's servers? What kind of user data is shared with law enforcement agencies or intelligence services around the world? How does Google prevent its employees to access their users personal data or location? How is the data you gave Google secured against hackers or from intelligence services malicious attacks?
Google don't really say, but you have to take their word for it.
On September 15, 2014, Wikileaks-founder Julian Assange told the Italian newspaper L'Espresso that he now wants to warn against Google: "They believe they are doing good, but they are now aligned with US foreign policy. This means that Google can intervene on behalf of US interests, for example, it can end up compromising the privacy of billions of people, it can use its advertising power for propaganda".