Why You Could Lose ALL Your Google Analytics Data

Despite Google's efforts to communicate this point, not everyone knows that capturing Personally Identifiable Information (PII) in Google Analytics is not allowed. (If you are not familiar with the term, Google provides a list of what it defines as PII.)

GA is perfectly capable of recording PII. The problem is that doing so violates GA's Terms of Service, so you should not do it. Being unaware of this rule could cost you what you value most.

Don’t panic! We will show you how to find out whether you are collecting personal information, and give you some guidelines on what to do if you find PII in your data.

While some people may intentionally store this type of information in GA, in most cases PII is captured by accident. By accident? How can that happen? Here are a few examples:

  • Your website may have a feature that remembers customers who have previously visited your site. This may be done by adding an ID, such as the username (e.g. “John.Doe”), as a parameter in the URL. When GA captures this URL, it records the visitor’s username as well.
  • Perhaps when your site was being built, your cautious developer included a detailed debug message for certain errors. This message contained personal information such as the user’s name, email address, or even more compromising data, and for convenience it was appended to the URL. If this debugging routine was not removed when the site went live, GA could capture the personal details of your customers.
  • Maybe you have a “members only” section of your website that lets you personalize the experience for each of your customers. One way you do this is by including a personalized welcome message in the page title, such as “John Doe’s Home Page – Welcome Back!” GA records this page title like any other, thus capturing the full names of your members.
  • What if a third party, even one with no relation at all to your organization, links to your site? Suppose they include a link to one of your pages in an email or newsletter and, for their own purposes, append the recipient’s email address or username to the link (some CRM and email systems add this type of information as part of an ID). When the user clicks through to your site, GA captures these URL parameters. So through no fault of your own, PII can end up in your GA account.

These are just a few of the cases we have encountered. As you can see, even if you are not aware of it, you may be capturing PII.

We’ve blogged before about how often we find PII in GA accounts, and we still find it all too frequently.

What could happen, you may ask? (Perhaps you’re not concerned because, after all, you are using the free version of GA. What can you lose if you are not paying anyway?)

You can lose ALL YOUR DATA.  

Since there is currently no way to “clean” your existing GA data and remove just the PII instances, the entire view/profile has to be deleted. There is no way around this: when PII is found in your account, the data MUST be deleted.


HOW TO FIND OUT IF PII IS BEING COLLECTED

There are a few things you can do to find out whether potential PII is being collected. Here are some ideas:

Sample Queries and Reports

Using regular expressions, you can search your reports for patterns that match the most common types of personal information that could be collected (name, username, email, postal code…). You can even save some of these reports, add them to your shortcut list, and check them on a regular basis.

Places and reports you can include in your search are:

  • “Site Content > All Pages” report (make sure to review Page Title tab)
  • Event actions and labels
  • Custom variables
  • Custom dimensions (for Universal Analytics users)
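To make the regular-expression search a little more concrete, here is a minimal Python sketch that scans values you have exported from these reports (page paths, page titles, event labels and so on) for a few common PII shapes. The pattern names, the list of parameter names and the choice of the Canadian postal-code format are assumptions made for illustration. Expect both misses and false positives, and adapt the patterns to your own site before relying on them.

```python
import re

# Illustrative patterns only (assumptions, not an official list): they will
# miss some PII and flag some harmless values, so treat hits as leads to review.
PII_PATTERNS = {
    # Email addresses, including the URL-encoded form "%40" of "@"
    "email": re.compile(r"[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Canadian postal codes such as "M5V 2T6"
    "postal_code": re.compile(r"\b[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d\b"),
    # Query parameters whose names suggest personal data (hypothetical names)
    "pii_parameter": re.compile(
        r"[?&](?:email|e-mail|username|user|name|fname|lname)=", re.IGNORECASE
    ),
}


def find_pii(values):
    """Return (value, pattern_name) pairs for every value that matches a pattern."""
    hits = []
    for value in values:
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(value):
                hits.append((value, name))
    return hits


if __name__ == "__main__":
    # Hypothetical page paths, e.g. copied from an exported "All Pages" report
    pages = [
        "/products/widgets",
        "/account?user=John.Doe",
        "/contact?email=john.doe%40example.com",
        "/store-locator?pc=M5V 2T6",
    ]
    for value, kind in find_pii(pages):
        print(f"{kind:14} {value}")
```

Treat any flagged value as a lead to investigate, not as proof. You can adapt similar patterns for the regular-expression filter in the reports themselves, although GA's regular-expression support may not match Python's exactly.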

Creating Custom Alerts

Perhaps a better, more proactive approach is to create “Custom Alerts” (if you have not used them before, you can find them under “Personal Tools & Assets” in the Admin section of GA).

You can even configure these alerts to notify you by email when a potential piece of personal information is being collected. That way you can act immediately, adjust your filters, and take corrective measures to prevent further collection of this data.

Figure: Sample GA Custom Alert to Identify Potential PII

 

A View Excluding URL Parameters

One measure that may save at least part of your data is to set up, in advance, one view/profile that excludes all URL parameters. This may sound a bit extreme. However, if you find PII in your URL parameters and have to delete that data, this view will at least fill in some of the gaps left by the lost data.
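Inside GA, you would build such a view with a view filter rather than code. One option is a Search and Replace filter on the Request URI field that replaces “\?.*” with an empty string. The Python sketch below only mimics that filter's effect, to show what the view would end up recording. The function name is hypothetical.

```python
import re

# Mimics a view filter that removes the whole query string from the page URI.
# In GA this would be configured in the admin interface (for example as a
# "Search and Replace" filter on the Request URI); this is only an illustration.
QUERY_STRING = re.compile(r"\?.*$")


def strip_url_parameters(request_uri: str) -> str:
    """Return the request URI with everything from the first '?' removed."""
    return QUERY_STRING.sub("", request_uri)


if __name__ == "__main__":
    print(strip_url_parameters("/account?user=John.Doe&ref=newsletter"))  # /account
    print(strip_url_parameters("/products/widgets"))  # /products/widgets
```

The trade-off is that every parameter disappears from this view, including harmless ones your reports may rely on. That is why this view works best as a backup alongside your main view rather than as a replacement for it.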

Google may not yet have publicly enforced, or announced, repercussions against every account that captures personal information. This is probably because such action would require a great deal of effort from Google, given the huge number of GA users (and the potentially large number of them who are infringing).

However, if and when this occurs, you don’t want to be the one caught infringing and end up losing everything, do you?


YOU FOUND PII IN YOUR DATA. WHAT TO DO NOW?

So you checked your data, ran some reports, and discovered that PII is being collected in your GA account. What do you do now? Here are some suggestions:

  • The first step is to create filters, when possible, to ensure that personal information is no longer captured from this point forward.
  • Create brand new views/profiles that collect data without the identified personal information. They can be copies of your old profiles. If your old profiles have to be deleted, you can keep using these new, PII-free ones.
  • Assess the extent of your PII issue by determining for how long your GA account has been collecting personal information. This will provide you with a better idea of the impact and possibly provide insight to address the source of the problem.
  • Back up the data you may need for your reports and analysis from your old profiles. Then you must delete your old views/profiles.
  • Address the issues that are causing PII to be sent to GA. Put checks in place to prevent this from happening in the future.
  • You may need to consult the legal counsel in your organization in order to make the right decision. You should also be aware of the privacy implications of exposing this information in such an unsecured way in the page URL (yes, an entire new post could be written on that topic alone…).
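The first and fifth steps above often come down to the same idea: removing the offending values from the URL. The sketch below is a minimal regular-expression search and replace for that. It assumes the PII travels in query parameters with known names; the names in PII_PARAMS and the REDACTED placeholder are hypothetical. In GA the equivalent would be a view filter, which only cleans data from that point forward. Running the same kind of logic on your own site stops the data from being sent in the first place. Not putting PII in URLs at all is better still.

```python
import re

# Hypothetical parameter names that carried PII on your site; adjust to your URLs.
PII_PARAMS = ("email", "username", "user", "name")

# "?" or "&", then one of the names, "=", then the value up to the next "&" or "#".
PII_PARAM_RE = re.compile(
    r"([?&])(" + "|".join(map(re.escape, PII_PARAMS)) + r")=[^&#]*",
    re.IGNORECASE,
)


def redact_pii_params(uri: str) -> str:
    """Replace the values of known PII parameters with a placeholder, keeping the rest."""
    return PII_PARAM_RE.sub(r"\1\2=REDACTED", uri)


if __name__ == "__main__":
    print(redact_pii_params("/account?user=John.Doe&ref=newsletter"))
    print(redact_pii_params("/contact?ref=mail&Email=john.doe%40example.com"))
```

This keeps the parameter name and replaces only its value. Other parameters, and the structure of the URL, stay intact for your reports.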

If you are not sure how to proceed or you need some advice to address this issue, you can always contact us, or any Google Analytics Certified Partner, to guide and assist you through the process.

BUT, WHY SHOULD I LOSE MY DATA?

If you have read this post up to this point, you may be wondering:  If in so many situations PII can be captured by accident, shouldn’t Google provide an alternative to solve this problem without having to delete all your data?

We agree. So let us ask you the following question:

What features would you like to see to prevent capturing PII and to remedy situations in which data collection got out of control?

If you have any suggestions or questions, please feel free to leave a comment below or contact us.


5 thoughts on “Why You Could Lose ALL Your Google Analytics Data”

  1. Hi,
    Several thoughts come to mind that may be too “simple” to work. It would be great if the offending PII data could simply be chosen and completely deleted from your GA data set when it is found, but I suspect that is “impossible” for some complex programming or legal reason.
    I have seen others speaking about using the GA tag manager feature to create filters in common places where PII happens, like an email address appearing in a page name, etc. So why not have GA give you a rigorous set of filter tags already in place (or an option to turn them on) designed to catch PII where it most commonly appears? It may not stop everything, but it would give average Joe User the ability to be proactive about the problem. The potentially offending data could be blocked altogether or quarantined for review and possible deletion without corrupting your main data.
    This problem seems too rampant and the solutions too high level for any common user to be expected to maintain. Some greater built-in PII protection, or greater leniency for inadvertent offenses, would be good considerations in my opinion.
    I would like to implement PII protection myself, but I am not computer literate enough to realistically accomplish this without some trailblazers to streamline the process. In the meantime, perhaps I’ll simply delete all my GA data once a month and use all my analytics for brief snapshot data?
    Thanks for the article, it was helpful!

    • Hello Josh,

      Glad you found the article helpful and thanks for your feedback!

      Agreed, it would be nice to be able to selectively delete PII data to avoid losing a whole View of data. Automated filters would also be useful, although, as you’ve said, it would be difficult to capture every variation of PII, and such filters may lead to false positives.

      Even if we filter PII from Google Analytics, the root issue is we should not be sending it from our websites because it puts the privacy of our website visitors at risk. Passing PII unencrypted is particularly alarming because anyone using a public computer or network could have their information stolen from their browser history or by a hacker sniffing the network.

      When building a new website or making changes to an existing one, one way to approach this proactively is to ensure proper security measures are taken to protect the personal information of website visitors.

      We discuss this in more depth, under the “Preventing PII Leakage” section in this blog post: http://www.clickinsight.ca/about/blog/leaking-customers-personal-information
