Why Google keeps your data forever, tracks you with ads

Sat, 2010-05-08 08:53 by admin

By Nate Anderson

Read the complete article

Not many companies could get away with defending controversial data retention practices by saying that the data is needed to "learn from good guys, fight off bad guys, [and] invent the future." But that's how Google sees itself and its practices—not surprising from a company that would give itself an unofficial motto like "don't be evil."

I had the chance recently to sit down with two of Google's top privacy people: deputy general counsel Nicole Wong and security/privacy engineer Alma Whitten. While the "good guy/bad guy" and "don't be evil" quotes may seem too cute by half to some, Wong and Whitten made a strong pitch for the truth of both slogans. In their view, Google really is fighting the good fight when it comes to your online privacy.

Anonymization and its discontents

Google logs an astonishing amount of data, including the search logs from its flagship product. It keeps this data indefinitely, so searching for a combination of your wife's name and your address and "rat poison in her cereal" is not a particularly smart idea (though search users do this sort of thing anyway).

But the company does "anonymize" this data eventually. The last octet of the IP address is wiped after nine months, which means there are 254 possibilities for the IP address in question (.0 and .255 are reserved addresses). After 18 months, Google anonymizes the unique cookie data stored in these logs.
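To make the arithmetic concrete, here is a minimal Python sketch of last-octet anonymization. It is an illustration only, not Google's actual code: the function names are hypothetical, and it assumes IPv4 addresses and treats the anonymized value as a /24 network, which is what yields the 254 candidates once .0 and .255 are set aside.

```python
import ipaddress
from typing import List


def anonymize_last_octet(ip: str) -> str:
    """Zero the final octet of an IPv4 address (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)
    # Keep the top 24 bits and clear the low 8 bits.
    masked = int(addr) & 0xFFFFFF00
    return str(ipaddress.IPv4Address(masked))


def candidate_addresses(anonymized: str) -> List[str]:
    """List the addresses that could have produced the anonymized value.

    Assumes the anonymized value is read as a /24 network; hosts()
    skips the network (.0) and broadcast (.255) addresses.
    """
    network = ipaddress.IPv4Network(f"{anonymized}/24")
    return [str(host) for host in network.hosts()]


if __name__ == "__main__":
    # 203.0.113.0/24 is a range reserved for documentation.
    logged = anonymize_last_octet("203.0.113.77")
    print(logged)
    print(len(candidate_addresses(logged)))
```

The point of the sketch is how little is hidden: the original address is still one of a few hundred neighbors, usually on the same ISP and in the same area.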

This isn't especially ambitious; Europe's data protection supervisors have called for IP anonymization after six months and competing search engines like Bing do just that (and Bing removes the entire IP address, not just the last octet). Yahoo scrubs its data after 90 days.

But Whitten, who was involved in Google's decisions on such issues, said that Google has done the best it can to keep the retention period to a minimum while still extracting maximum value from that data … and that this "value" isn't just to Google but also to users.

There are "wonderful things that can be done with an abundance of data," she said. When Google's teams began looking at the data retention issue a few years back, they "started with zero" and tried to see if they could make it work. They could not; Google would lose the ability to do too many useful things.

Search data is mined to "learn from the good guys," in Google's parlance, by watching how users correct their own spelling mistakes, how they write in their native language, and what sites they visit after searches. That information has been crucial to Google's famously algorithm-driven approach to problems like spell check, machine language translation, and improving its main search engine. Without the algorithms, Google Translate wouldn't be able to support less-used languages like Catalan and Welsh.

Data is also mined to watch how the "bad guys" run link farms and other Web irritants so that Google can take countermeasures.

Google eventually settled on anonymizing the IP address after nine months, though even here, "we believe that we have lost the ability to do things," said Whitten.

Web users don't mind being tracked?

Instead of cutting the data retention period further, Google is more focused on 1) transparency and 2) keeping the data locked down safely. The company believes that when users know what Google keeps and why it keeps it—and when they have the chance to opt out—users are often happy to let Google do its thing.

Wong points to behavioral advertising, which Google jumped into last year. This sort of advertising relies on a vast ad network across many sites, and the ads record a visitor's unique cookie. Google can collate this data on the back end and compile a list of interest categories associated with a particular user cookie; since most users never clean their cookies, this works well as a general ad targeting mechanism.
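A rough sketch of that back-end collation, in Python, might look like the following. Everything here is assumed for illustration: the site-to-category table, the event format, and the function name are hypothetical, and nothing in it reflects how Google's system is actually built.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from a site in the ad network to an interest category.
SITE_CATEGORIES = {
    "cars.example": "autos",
    "trips.example": "travel",
    "recipes.example": "food",
}


def build_interest_profiles(
    ad_events: Iterable[Tuple[str, str]], top_n: int = 3
) -> Dict[str, List[str]]:
    """Group (cookie_id, site) ad events into per-cookie interest lists.

    Sites missing from SITE_CATEGORIES are ignored. Categories are
    ordered by how often the cookie was seen on matching sites.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cookie_id, site in ad_events:
        category = SITE_CATEGORIES.get(site)
        if category is not None:
            counts[cookie_id][category] += 1
    return {
        cookie_id: [category for category, _ in counter.most_common(top_n)]
        for cookie_id, counter in counts.items()
    }


if __name__ == "__main__":
    events = [
        ("cookie-a", "cars.example"),
        ("cookie-a", "cars.example"),
        ("cookie-a", "trips.example"),
        ("cookie-b", "recipes.example"),
    ]
    print(build_interest_profiles(events))
```

Note that the profile is keyed on the cookie, not on a name or account, which is why clearing cookies resets it.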

When Google rolled out the system in March 2009, VP Susan Wojcicki said the things that advertisers always say on such occasions: this is good for consumers. …

