Collecting GA UserID into GA might violate Google Analytics TOS
Isn't that what Google Analytics already does? Yes, but the devil is in the details.
Timeline of facts
Before getting to the heart of the matter, I find it funny how things went after I wrote the article on how to tie the activity of an unidentified user on our website to that person's activity once they log in.
It was posted on October 22nd. The GA team liked it enough to praise and promote it via Twitter and G+ the next day. I didn't notice right away, until a friend told me it had made it to the GA website's home page, as you can see in this screenshot.
They have deleted their tweet and G+ post. Makes sense.
Then, on November 5th, I received this DM from someone on the GA team, sent from a personal account, not GA's official one, to be clear.
Google works in mysterious ways, brothers and sisters. AMEN!
Innocent me. After the previous compliments I thought it was going to be about something nice, but my cultural background played me again. After 5 years living in the upper part of the New World, I've learned that when someone is going to scold or threaten you, they first tell you how nice you are or how much they like you. It's the "OK sweetie, this is not going to hurt, OK?" tactic.
Spaniards don't usually beat around the bush. We prefer the "nothing personal, just business" approach. Neither better nor worse, just different. I should have interpreted the last part of the message correctly. My bad. So, anyway, on December 3rd I finally got an email.
[Action requested] re: Tracking GA users' ID blog post
[bcc XYZ] since he mentioned you're a friend of Google Analytics
Some members of our team have reviewed your article here and note that the technique you've described uses HTTP State Management mechanisms. Using HTTP State Management mechanisms to propagate cookie state is a circumvention of our privacy safeguards. Doing so violates the Google Analytics Terms of Service, even without the use of personally identifiable information.
The appropriate way to understand users in the fashion you describe is with the User ID feature and Session Unification. This would enable the analysis of sessions, across devices, using a unique, persistent, and non-personally identifiable ID-string representing a user. This would normally occur as part of a signed-in user experience (when a user signs into your website) but not in any other way.
We'd ask kindly you to amend your post so that it advocates for usage that is inline with User ID guidelines and, in particular does not use HTTP state management. In the event that you are unwilling to change your content, we plan to leave a comment clarifying that use of this technique may cause your Analytics account to be terminated since it is a violation of our terms of service.
We ask that if you're using this technique in any of your existing accounts that you stop immediately.
Thanks so much and we apologize for any issues caused. Our team is also looking into how we can make this more clear on our privacy and terms of service pages.
Please let us know if there are questions.
XYZ, on behalf of the Google Analytics team
Again the same "not going to hurt, sweetie" approach. Yes, I make a living dealing mostly with GA, and a bunch of my former workmates work for GA, but I don't befriend tools, I befriend people.
How in the world was I supposed to guess there was something wrong after their very own blessing? And of course there were questions! Many.
The heart of the matter. Discussing the issue
I answered right away saying I was going to amend the post to include a warning note and my opinion. I don't want anyone, my clients in particular, to blame me for getting their accounts cancelled. So, as a preventive measure, I switched off the mechanism wherever it was implemented, before talking to the owners of the GA accounts I deal with (and that had it enabled) to discuss the situation.
Several emails went back and forth; I'll summarize them here.
HTTP State Management mechanisms
It was the first time I had heard of that concept, but in plain English it refers to what we know as cookies. I asked, and XYZ's answer was:
In the context of your post, State Management Mechanisms refers to your article advocating a setup that uses local browser storage to persist information copied from the GA cookie. As you allude to in your post, a user clearing their cookies isn't respected in that scenario.
It's true that users are more aware of what cookies are, especially since in Europe you have to warn visitors that your site uses them, and browsers make it quite easy to manage them, while people are not broadly aware that other mechanisms like localStorage can accomplish the same tasks.
In any case, and this is something I have to amend in my article too, the "Clear browsing data" options in browsers nowadays also take care of what is stored in localStorage, so if cookies are not a problem I don't see why localStorage is.
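To make the disagreement concrete, here is a minimal sketch in TypeScript of what "propagating cookie state" means. It is not the script from my original article: the stores are plain objects standing in for the browser's cookie jar and localStorage, and the names `resolveClientId`, `clientIdFromGaCookie` and `ga_client_id_copy` are made up for illustration. The only assumption about GA itself is that the Universal Analytics `_ga` cookie value looks like `GA1.2.<random>.<timestamp>`, with the last two fields forming the client ID.

```typescript
// Plain objects standing in for the browser's cookie jar and localStorage (illustration only).
type Store = Record<string, string>;

// Hypothetical key under which a copy of the client ID is kept.
const CLIENT_ID_KEY = "ga_client_id_copy";

// Take the last two dot-separated fields of a _ga cookie value as the client ID.
function clientIdFromGaCookie(value: string): string | null {
  const parts = value.split(".");
  return parts.length >= 4 ? parts.slice(-2).join(".") : null;
}

// Prefer the cookie; fall back to the locally stored copy when the cookie is missing.
function resolveClientId(cookies: Store, local: Store): string | null {
  const raw = cookies["_ga"];
  const fromCookie = raw !== undefined ? clientIdFromGaCookie(raw) : null;
  if (fromCookie !== null) {
    local[CLIENT_ID_KEY] = fromCookie; // keep a copy outside the cookie
    return fromCookie;
  }
  const copy = local[CLIENT_ID_KEY];
  return copy !== undefined ? copy : null;
}

const cookies: Store = { _ga: "GA1.2.1234567890.1416000000" };
const local: Store = {};

const first = resolveClientId(cookies, local);      // cookie present, copy stored
const afterCookieWipe = resolveClientId({}, local); // cookie cleared, copy still there
const afterFullWipe = resolveClientId({}, {});      // cookie and local copy both cleared
```

Clearing only the cookie leaves the copy behind, which is XYZ's point. Clearing both, as the "Clear browsing data" options do, removes everything, which is mine.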
Where exactly does the TOS say so?
Their main concern here is the use of techniques that propagate cookie state, and I wanted to know where exactly the GA Terms of Service explain that this could be a violation. XYZ pointed to the end of the first paragraph in section 7, Privacy:
You must not circumvent any privacy features (e.g., an opt-out) that are part of the Service.
Again, if you use the Clear browsing data options correctly, the locally stored data coming from the GA cookie is gone, the same way the cookie itself is gone.
No more comments
Several attempts to discuss the TOS violation and privacy further ended in comments like:
My email request is at the request of our product council (read: lawyer), so as I've said, it's a violation of our terms of service. They're aware of privacy policies on the end users site and our opt out mechanism, neither change that determination.
You're always free to write what you'd like, but if it includes recommendations that violate our terms of service we would leave a comment that it's violating our terms of service and grounds for termination of their Google Analytics account.
Ok, I get it, you are not here to discuss anything at all, just to deliver the "cease and desist" message.
The so-called Visitor/User ID
For obvious reasons, Google Analytics collects the Visitor ID from the first-party cookie set in every visit. That's the glue for the rest of the data collected and what makes it a powerful measurement tool, but they don't make it accessible via the GA web interface or the reporting API.
With the new Universal Analytics, Google came up with a way to overwrite their randomly generated user ID for session unification across browsers or devices. Many GA users (including me) thought that using that feature would make the visitor ID available in reports or the API, but no, you are not going to get it in any manner.
Well, yes, there is one way. Using Google Analytics Premium and getting its raw data dumped into a BigQuery project makes the visitor ID in the cookie available to you right away, without having to implement any other trick.
Take a look at the BigQuery Export Schema and you will see that the first field listed is fullVisitorId: "The unique visitor ID (also known as client ID)." In other words, the exact same information I'm able to capture in a custom dimension, as XYZ confirmed.
You pay, you get it.
Contradictions in their literature
There are more authoritative experts than me to discuss the fine print, so I'll let them have their say, but the few I've managed to reach for a quick discussion agree that it does not look like a violation of their terms.
They pointed to this line on the "Security and privacy in Universal Analytics" page, under the "First-party cookie storage is minimized in analytics.js" section. It reads:
Clearing or deleting cookies from a browser does not ensure that subsequent visits to a website will be considered new sessions in Analytics.
So, even if you delete cookies, GA can recognize you as a previous visitor, and it is OK for them to violate their TOS, but not for us! Do we or do we not have an agreement?
Even funnier is the fact that you can find their own Justin Cutroni explaining in a post how to extract a unique ID from the Google Analytics __utma cookie, saying it is not personally identifiable information, and addressing Tim Wilson's privacy concerns in the comments. Let me quote:
I know what you're thinking, "You can't store personally identifiable information in Google Analytics!" But this isn't personally identifiable information. And besides, this is the same as the transaction ID stored in the ecommerce data.
I believe that this type of integration is a completely valid method that lives in a grey area. I believe that the TOS is out of date with the current ecommerce tracking code. But in the end it's up to every organization to evaluate their needs and how to meet them.
But hey, he is not using local storage capabilities! Does that make it 'legal'?
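For reference, pulling that ID out of the classic cookie takes only a few lines. This is a rough sketch, not the code from Justin's post: the function name `visitorIdFromUtma` is hypothetical, and it assumes the usual six-field `__utma` layout (domain hash, visitor ID, first visit, previous visit, current visit, session count) and a cookie string in the `document.cookie` format.

```typescript
// Extract the visitor ID (second field) from a __utma cookie found in a
// "name=value; name=value" cookie string. Returns null when the cookie is
// missing or does not have the expected six dot-separated fields.
function visitorIdFromUtma(cookieString: string): string | null {
  for (const pair of cookieString.split(";")) {
    const trimmed = pair.trim();
    if (trimmed.indexOf("__utma=") !== 0) continue;
    const fields = trimmed.slice("__utma=".length).split(".");
    if (fields.length !== 6) return null;
    const visitorId = fields[1];
    return visitorId !== undefined && visitorId !== "" ? visitorId : null;
  }
  return null;
}

// Made-up sample values, for illustration only.
const sample =
  "foo=bar; __utma=123456789.987654321.1413936000.1415145600.1417564800.3; lang=en";
const visitorId = visitorIdFromUtma(sample); // "987654321"
```

No local storage involved, but the value read is the same random identifier.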
What that technique means in privacy terms
The script offered does not collect any piece of information that could identify a user. On a website where there is an option to sign in, it offers a way to relate a user's pre-sign-in activity to their post-sign-in activity.
- If you sign in, it means you voluntarily gave the company some personal information and authorization to relate how you navigate and what you do on their website to you.
- If you never signed up, collecting that visitor ID from the cookie, even across different sessions, is completely useless for identifying anyone.
I could be wrong, but I don't think that violates privacy. Is there any expert in the room who can corroborate whether what I'm saying is correct?
If it doesn't, why is GA so concerned with their policy? Especially when they defend it so inconsistently.
In the meantime, Google's GA product council retouches their terms here and there to make them explicit and consistent.
If you have a GA Premium account
- Call your GAP rep for a meeting and review the agreement your company signed with them in case you want to use that technique. That is going to require lawyers and they are not cheap.
- Second option: open a BigQuery project and ask your GAP rep to start dumping GA raw data to BQ. It is ridiculously inexpensive, and gives you all the data you are looking for to analyze "signed in" and "pre-signed in" sessions for users who agreed to your privacy statement when they signed up.
If you don't pay for Premium
- Use that technique only if your visitors can log in at any given moment of their visit.
- If they can't, it adds nothing relevant to your data, as explained previously. Avoid data hoarding.
- Use Piwik alongside GA, for Flying Spaghetti Monster's sake!
Worst case scenario, if they catch you 'cheating', your GA account won't be terminated right away. "We wouldn't ever take action without first notifying you," XYZ said.
Again, you are not collecting anything that could identify anyone, and if you can identify them it is because they previously gave you consent. But it's up to you. They asked me to warn you, so I did.
Final note on how GA handles issues with clients
By 'clients' I mean the ones that pay. When you use a free service like the regular Google Analytics one you have no power at all to argue. Take it or leave it. That's it.
This is the first time I have dealt with potential GA TOS violations and I have no idea how they usually handle them, but there is something I totally dislike about this experience. Let me elaborate.
The last thing I want is for any of my clients to suffer ugly consequences due to something I implemented for a measurement problem they asked me to solve. So when the email reached my inbox, the first thing that came to my mind was "better be transparent" upfront with all parties before this escalates.
I told that person speaking on behalf of the Google Analytics team:
- The script was already switched off by the time I was answering back 30 minutes later
- I had implemented it at one of my clients' sites, naming the company not the GA property
- My relationship with that company is contractual, I'm not an employee
- In my opinion they had the moral obligation to contact that company, now they knew, to give proper explanations firsthand
You might think I'm an idiot for mentioning the company, but I have earned the trust of my clientele over the years by being totally honest, and that is why I told him the last point was important.
The answer was "If this method is no longer in use on their site, then we have no further concerns", read: "nope, I'm not going to bother a big company that spends quite a chunk of money on Google services just for that", which I find a totally incorrect attitude. It shows no respect for the client, their client, the one that signed a contract with them and pays them for a service. The obligation is towards them, not me.
For the record, it's just business.
Even if you have warned a third party like me, knowing that the message will be delivered, you should look after the relationship with those you do business with and talk to them. Maybe I'm too old school, but that is how it goes here.
If there is any development worth sharing I'll update this post. Your opinions are welcome. I have amended one article already; I won't have any problem fixing this one if required.
Pic of hedgehog at the top by Narisa