I'm curious how the author knows that it "does nothing". It seems that the argum...

jacquesm · on May 28, 2014

Let's turn that around for a bit. The default assumption that I have is that advertising companies that deal in user profiles (such as Google) will collect everything they can about you because this benefits their ability to sell advertising. Google's terms of service state what they capture. In other words, no matter what their user interface is telling you, their privacy policy (which I consider to be the leading document in cases like these) tells a totally different story and generalizes to all of google's services, including search (they even use that as the example of what they capture).

The fact that things like cookies are in those logs that they do make (again, according to the privacy policy) makes it trivial to re-construct the data that they ostensibly do not keep. If it is trivial, makes good business sense, enhances the value of the profile and makes more money, then you can bet dollars to donuts that they do it, unless there are strong statements from the company involved that they do not engage in such behaviour.

Privacy policies are generally written in favour of the company writing them, and it would be terribly naive to assume that if one could have been written more strictly but wasn't, this is an accident or oversight. Note how long google fought the EU commission to have any limits set on their permission to retain user data, and how they tried to spin it as a user benefit when they eventually caved in.

So if google re-writes their privacy policy to state explicitly that they do not datamine their logs and that the data is used only in a statistical sense and never in a personally identifiable sense, then I would agree with you (and I would even believe them), but until they do, it is fairly safe to assume that they in fact do use that information.

Of course 'only to enhance your user experience' and never to improve the bottom line for google.

magicalist · on May 28, 2014

That's really just begging the question again, though.

It's also not correct, as their privacy policy doesn't state that they always collect a cookie per log entry, for instance, but that they may. This is an important distinction, because in practice, at least things like doubleclick and analytics requests do not transmit your google account cookie. In a quick test, google fonts and google hosted libraries don't appear to send any cookies at all, though I don't know if that's true under all circumstances.

There's much you could reconstruct from IP addresses and connection patterns if you were sufficiently motivated, but that's a long way from extrapolating from their privacy policy. Regardless, assuming "they can, therefore they will" isn't nearly sufficient here.
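
One plausible reason an account cookie would not travel with requests to other hostnames is ordinary cookie domain scoping. The sketch below is a simplified version of the domain-matching rule browsers apply (it skips the IP-address check), and it assumes, purely for illustration, that the account cookie is scoped to `google.com`. The hostnames are examples, not a claim about what any particular request actually sends; inspecting real requests is still the only way to verify that.

```python
def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified cookie domain matching: exact host or a dot-separated suffix."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# Assumption for illustration only: the account cookie is scoped to google.com.
ASSUMED_COOKIE_DOMAIN = ".google.com"

# Example hostnames, chosen to mirror the services mentioned above.
example_hosts = [
    "accounts.google.com",
    "www.google.com",
    "ad.doubleclick.net",
    "www.google-analytics.com",
    "fonts.googleapis.com",
]

# Map each host to whether a browser would attach the assumed cookie.
would_send = {host: domain_match(host, ASSUMED_COOKIE_DOMAIN) for host in example_hosts}

for host, sent in would_send.items():
    print(f"{host}: {'cookie attached' if sent else 'cookie not attached'}")
```

This only shows that a cookie scoped to one domain is not attached to requests for another; it says nothing about cookies those other domains may set themselves.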

jacquesm · on May 28, 2014

At absolutely zero cost to themselves and good PR as a benefit google could re-write their privacy policy if that were the case. Note that they still have not amended their privacy policy to the effect that they indeed anonymize the log files after 9 months, even though they announced that they would do that years ago.

So no, short of going to work for google or an insider coming up with hard proof there is not much to be done there. But with a privacy policy that details what they do log and a strong financial motive I have little doubt that this is an accurate representation of what's happening.

If it isn't then google is free to contradict it.

magicalist · on May 28, 2014

> At absolutely zero cost to themselves and good PR as a benefit google could re-write their privacy policy if that were the case

I just gave you clear examples where that wasn't the case, examples you can verify yourself by inspecting the requests.

> Note that they still have not amended their privacy policy to the effect that they indeed anonymize the log files after 9 months, even though they announced that they would do that years ago.

here: https://support.google.com/accounts/answer/162743

"We anonymize this log data by removing part of the IP address (after 9 months) and cookie information (after 18 months). If you have Web History enabled, this data may also be stored in your Google Account until you delete the record of your search."

> But with a privacy policy that details what they do log...

I don't understand why you're ignoring the very important distinction between "do log" and logs that "may include".

jacquesm · on May 28, 2014

"may include" in a privacy policy is newspeak for "we will".

That 'may' is there so that if you read it you get a warm fuzzy feeling because no way would google ever do such a thing, and it allows them to point at it when they're caught doing it, saying 'we told you we were doing this all along, see, we gave ourselves just enough leeway there to squeeze through'. Call me jaded, cynical, old, for all I care, but I have yet to see a big company that did not act in the way I just described when it came to covering their asses while pulling the wool over the eyes of their end users.

> We anonymize this log data by removing part of the IP address (after 9 months)

That's not exactly anonymization, is it? You're making it worse. Anonymization is removing all user identifiable information. This is merely stripping some unspecified number of bits of the IP, which has more than likely changed by then and so has lost most of its value, while retaining the cookie, which has more resolution than an IP to begin with.

> and cookie information (after 18 months).

What's the normal lifespan of a google cookie?

More or less than 18 months, on average?
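
To make the IP point concrete, here is a minimal sketch of what "removing part of the IP address" could look like. The quoted statement does not say how many bits are removed, so the 8 bits used here are an assumption, and the log entry, field names, and cookie value are all made up for illustration.

```python
import ipaddress


def truncate_ipv4(addr: str, bits_removed: int = 8) -> str:
    """Zero the lowest `bits_removed` bits of an IPv4 address."""
    # strict=False lets ip_network mask off the host bits for us.
    network = ipaddress.ip_network(f"{addr}/{32 - bits_removed}", strict=False)
    return str(network.network_address)


# Hypothetical log entry; none of these field names come from a real log format.
log_entry = {
    "ip": "203.0.113.77",
    "cookie_id": "hypothetical-cookie-abc123",
    "query": "example search",
}

# "Anonymize" only the IP, as in the nine-month step; the cookie is left alone.
scrubbed = dict(log_entry, ip=truncate_ipv4(log_entry["ip"]))

# Number of addresses that share the truncated prefix (assuming 8 bits removed).
candidate_addresses = 2 ** 8

print(scrubbed)
print(f"addresses sharing this prefix: {candidate_addresses}")
```

Under these assumptions the scrubbed entry still carries the cookie identifier unchanged, so entries can be linked to each other through the cookie regardless of what happened to the IP.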

chestnut-tree · on May 28, 2014

"We anonymize this log data..."

There's an interesting article from CNET (from 2008) on this topic:

"Debunking Google's log anonymization propaganda"

"Company says it will reduce the amount of time that it will keep sensitive, identifying log data on its search engine customers. This is little more than snake oil."

http://www.cnet.com/uk/news/debunking-googles-log-anonymizat...

CaptainZapp · on May 28, 2014

Why on earth was this comment down voted?

You lay out the reasoning for your blog post in a succinct and plausible fashion.

One is free to assume, of course, that you're a tinfoil-wearing paranoiac and disagree with your reasoning, but down voting such a comment, in which you actually expend the effort to explain your reasoning, just seems so wrong, infantile and dumb.

Another trend I have unfortunately observed recently is comments along the lines of: "Don't link to this" (the "this" being a blog entry by Andrew Sullivan, which is hardly controversial), since [paraphrased] it does not reflect the dogma of the HN politburo.

It's crap like that, which really devalues HN and I think that's a shame.

jacquesm · on May 28, 2014

This sort of thing is par for the course. Especially in articles critical of Google or Apple, less so with articles critical of Microsoft. That's likely both a reflection of the number of people working for those companies that also frequent HN and the dedicated group of fanboys each has.

wfjackson · on May 28, 2014

> If it is trivial, makes good business sense, enhances the value of the profile and makes more money, then you can bet dollars to donuts that they do it, unless there are strong statements from the company involved that they do not engage in such behavior.

Even strong statements don't help, as seen in the Google Apps for Education case where filings in federal court admitting scanning of GAE emails by Google were at odds with public statements denying it. It's an interesting read as it shows the confusing mess of privacy statements versus actual practice, and how hard it is even to figure out which privacy statement applies, as Google started claiming that the consumer privacy statement applied to free student email as well, but it wasn't clear earlier.

http://www.edweek.org/ew/articles/2014/03/13/26google.h33.ht...

Google finally stopped the practice, and it seems they were scanning even Google Apps for Business emails to build profiles to show ads on other sites even if the show ads setting was off.

http://blogs.edweek.org/edweek/marketplacek12/2014/04/google...

pessimizer · on May 28, 2014

>if you already have a certainty that Google is pure evil

This is a straw man. Google may not think that doing something that is legal and common is evil, and why are we having theological discussions anyway?

You will have information on what Google is doing with your information when Google volunteers that information. Unless they deny usage in a legally binding sense, there's no reason why you should expect to see any further information about how that data is used internally. What you do know is:

1) that there will be no public relations consequence for breaking users' trust, because what they do with their records is not transparent, and

2) that it's legal and common to use any information gathered about you in just about any way.

You can either assume they'll do what you have indicated that you prefer with the data, even if that likely involves leaving money on the table - or you can assume that they will generally do what they're legally obligated to do, and within that, attempt to maximize profits. IMO, the former position involves imagining that a company has a personality, and doesn't want to take advantage of you. It's a false equivalence to assert that the latter (a company legally maximizing its resources for profit) involves a similar leap of imagination.

You're not going to get information either way about what they DO until Google publicly obligates themselves with a policy document, or somebody leaks.

DanBC · on May 28, 2014

> 2) that it's legal and common to use any information gathered about you in just about any way.

That depends what jurisdiction you're in.

Law is what's on the books and how the regulators and courts interpret what's on the books.

It's possible Google's lawyers are saying that some process is compliant with law when the regulators and courts may have a different interpretation.

higherpurpose · on May 28, 2014

It does nothing just like Google's own DNT option in Chrome does nothing to its own web properties. It's almost like Google is begging for this sort of stuff to turn into regulations against them, because clearly they can't be trusted to do the right thing.