Despite the effort that Google has invested in communicating this point, it is not universally known that Personally Identifiable Information (PII) is not allowed to be captured in Google Analytics. (If you are not familiar with the term, here is a list of what Google defines to be PII.)
The issue is not that GA is incapable of recording PII, but that this practice is against GA’s Terms of Service. This means you should not do it. Being unaware of this could cost you what you value the most.
In most cases, capturing PII happens by accident. We will show you how to find out whether you are collecting personal information, and give you some guidelines on what to do if you find PII in your data.
- Your website may have a feature that remembers customers who have previously visited your site. This may be done by adding an ID, such as the username (e.g. “John.Doe”), as a parameter in the URL. When this URL is captured by GA, the visitor’s username gets recorded as well.
- Perhaps when your site was being built, your cautious developer included a detailed debug message for certain errors. Personal information such as the user’s name, email address, or possibly even more compromising data was included. For convenience this info was appended to the URL. If this debugging routine was not removed when the site went live, the personal details of your customers could get captured by GA.
- Maybe you have a “members only” section of your website, which allows you to personalize the experience for each of your customers. One way you do this is by including a personalized welcome message on the page title, such as, “John Doe’s Home Page – Welcome Back!” GA records this page title like any other, thus capturing the full names of your members.
- What if a third party, even one entirely unrelated to your organization, decides to link to your site? Suppose they include a link to a page on your site in an email or newsletter. For their own purposes, a user’s email address or username gets appended to this link. (Some CRM and email systems add this type of information as part of an ID.) When the user clicks through to your site, these URL parameters are captured by GA. So, through no fault of your own, PII can be captured in your GA account.
These are just a few of the cases we have encountered. As you can see, even if you are not aware of it, you may be capturing PII.
We’ve previously blogged about repeatedly finding PII in GA, which we still do all too frequently.
What could happen, you may ask? (Perhaps you’re not concerned because, after all, you are using the free version of GA. What can you lose if you are not paying anyway?)
You can lose ALL YOUR DATA.
Since there is currently no way to “clean” your existing GA data and remove just the PII instances, the entire view/profile will need to be deleted. There is no way around this: when PII is found in your account, the data MUST be deleted.
See the guidelines below on what to do if you find PII in your data.
HOW TO FIND OUT IF PII IS BEING COLLECTED
There are a few things you can do to find out whether PII is being collected. Here are some ideas:
Sample Queries and Reports
Taking advantage of regular expressions, and considering the most common types of personal information that could be collected (name, username, email, postal code…), you can search for common patterns. You can even save some of these reports, add them to your shortcut list, and visit them on a regular basis.
Places and reports you can include in your search are:
- “Site Content > All Pages” report (make sure to review the Page Title tab)
- Event actions and labels
- Custom variables
- Custom dimensions (for Universal Analytics users)
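The same kind of pattern search can be tried offline on an exported list of page paths, page titles, or event labels. The following is a minimal sketch, not a complete PII detector: the pattern names, the list of parameter names, and the `find_pii` helper are all assumptions made for illustration, and the patterns should be adapted to your own site.

```python
import re

# Hypothetical patterns for illustration; extend them for your own site.
PII_PATTERNS = {
    # An email address, with "@" either literal or URL-encoded as %40.
    "email": re.compile(r"[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # A query parameter whose name suggests it identifies a person.
    "identifying_param": re.compile(
        r"[?&](?:user(?:name)?|name|email|login)=[^&]+", re.IGNORECASE
    ),
}


def find_pii(values):
    """Return (value, label) pairs for every value that matches a pattern."""
    hits = []
    for value in values:
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(value):
                hits.append((value, label))
    return hits


# Example rows, as they might appear in an exported "All Pages" report.
rows = [
    "/account?username=John.Doe",
    "/products/shoes",
    "/welcome?ref=jane%40example.com",
]
for value, label in find_pii(rows):
    print(label, value)
```

Simple patterns like these will miss some cases (a full name in a page title, for instance) and flag some harmless ones, so treat the output as a list of rows to review by hand.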
Creating Custom Alerts
Perhaps a better and more proactive way to proceed is to create “Custom Alerts” (if you have not used them before, you can find them under “Personal Tools & Assets” in the Admin section of GA).
You can even configure these alerts to notify you by email when a potential piece of personal information is being collected. That way you can take immediate action, adjust your filters, and take appropriate corrective measures to prevent further collection of this data.
A View Excluding URL Parameters
A measure that may save at least part of your data is to proactively keep one view (profile) in which all URL parameters are excluded. This may sound a bit extreme. However, if you end up finding PII in your URL parameters and have to delete that data, you will at least have this view to fill some of the gaps left by the data that was lost.
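To illustrate what such a view ends up recording, here is a minimal sketch of the effect on individual URLs. It is not how GA implements the exclusion; `strip_query` is a hypothetical helper that only shows which part of each URL would survive.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# The path is kept; everything after "?" (where the PII sits) is dropped.
print(strip_query("/account?username=John.Doe"))
print(strip_query("https://www.example.com/offers?email=jane%40example.com#top"))
```

Note that this approach does nothing for PII carried in the path itself or in page titles, so it complements the other checks rather than replacing them.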
It may be true that Google has not yet publicly enforced or announced repercussions against all accounts that are capturing personal information. This is likely because such an action would require a great deal of effort from Google, given the huge number of GA users (and the potentially large number of them who are infringing).
However, if and when this occurs, you don’t want to be the one caught infringing and end up losing everything, do you?
YOU FOUND PII IN YOUR DATA. WHAT TO DO NOW?
So you checked your data, ran some reports, and discovered that PII is being collected in your GA account. What do you do now? Here are some suggestions:
- The first step is to create filters, when possible, to ensure that personal information is no longer captured from this point forward.
- Create brand new views/profiles to start collecting data without the identified personal information. They can be copies of your old profiles. In case your old profiles need to be deleted, you can use these new ones that are free from PII.
- Assess the extent of your PII issue by determining for how long your GA account has been collecting personal information. This will provide you with a better idea of the impact and possibly provide insight to address the source of the problem.
- Back up the data you may need for your reports and analysis from your old profiles. Then you must delete your old views/profiles.
- Address the issues that are causing PII to be sent to GA. Put checks in place to prevent this from happening in the future.
- You may need to consult the legal counsel in your organization in order to make the right decision. You should also be aware that there are privacy implications when this information is exposed in such an unsecured way in the page URL (yes, a whole new post could be written on that topic alone…).
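One possible form for the checks mentioned above is to remove only the offending parameters from a URL while keeping the rest, such as campaign tags. The sketch below assumes you already know which parameter names carry personal data. The `SENSITIVE_PARAMS` set and the `redact_url` helper are hypothetical names for illustration, and where such a step runs (server side, in a tag manager, or elsewhere) depends on your setup.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list; replace it with the parameter names found in your own data.
SENSITIVE_PARAMS = {"username", "user", "name", "email", "login"}


def redact_url(url, sensitive=SENSITIVE_PARAMS):
    """Drop query parameters whose names are in `sensitive`; keep the rest."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key.lower() not in sensitive]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# The campaign tag survives; the email address does not.
print(redact_url("/offers?utm_source=newsletter&email=jane%40example.com"))
```

A name-based list like this only catches parameters you already know about, so it works best alongside a regular pattern search of the collected data.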
If you are not sure how to proceed or you need some advice to address this issue, you can always contact us, or any Google Analytics Certified Partner, to guide and assist you through the process.
BUT, WHY SHOULD I LOSE MY DATA?
If you have read this post up to this point, you may be wondering: If in so many situations PII can be captured by accident, shouldn’t Google provide an alternative to solve this problem without having to delete all your data?
We agree. So let us ask you the following question:
What features would you like to see to prevent capturing PII and to remedy situations in which data collection got out of control?
If you have any suggestions or questions, please feel free to leave a comment below or contact us.