Personally Identifiable Information (PII) is information that can be used to identify you as an individual. This includes your name, email address, mailing address, username, phone number, or some combination of these. In other words, if it allows someone else to find out exactly who you are, then it’s PII.
If you have a website, you are likely collecting and storing PII from your visitors: email addresses to sign up for a newsletter, usernames to log in, or billing information to make a purchase. In return for doing business with you, your customers are trusting you to safeguard their data.
Despite all your efforts to ensure the security of this sensitive information, are you sure you’re not leaking it into cyberspace? And what if the leaked information is being captured by Google Analytics?
If you use Google Analytics, you should be aware that sending PII, or any other private information like credit card info or passwords, to GA is against the Terms of Service you agreed to abide by just by using Google Analytics. How do you know it isn’t happening right now? What can you do to ensure it doesn’t happen?
[Edit, May 15, 2018: See Google’s recent article, “Understanding PII in Google’s contracts and policies”, for more information on what Google considers PII.]
Leaking PII into Google Analytics is rarely done intentionally and can go unnoticed for long periods of time, especially if you aren’t sure where to look or aren’t looking. However, once the PII leak happens, it’s too late. The data has been processed and is permanently in your reports.
You may be thinking, “Can’t I just select that portion of my data and delete it?”
Nice try! We wish. But at this time, it isn’t possible to selectively delete data within GA. Hence, you will have to delete the contaminated Views (previously known as Profiles) entirely to avoid being in violation of GA’s Terms of Service (See Section 7. Privacy).
So… are you leaking PII without knowing? How can you find out?
Three At-Risk Areas Where You May Find PII in Google Analytics:
1. On-site Form Submissions
One of the most common cases of PII in Google Analytics is the result of form submissions that pass information via the URL. If the submitted information contains PII, it will be automatically recorded by GA when the page loads.
This often happens in sign-up and registration forms. It could be as seemingly harmless as an email address or password captured from a newsletter sign-up or login portal. In other cases it may be more severe, for example if an ecommerce site passes credit card numbers or full billing information via the URL.
Aside from GA, passing PII unencrypted via the URL is particularly alarming because anyone using a public computer or network could have their information stolen from their browser history or by a hacker sniffing the network.
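To make this concrete, here is a minimal Python sketch of how you might check a landing or confirmation URL for PII-looking query parameters. The function name `find_pii_in_url`, the parameter names in `SUSPECT_KEYS`, and both patterns are assumptions made for illustration; they are deliberately loose and would need tuning to the parameters your own forms actually use.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Assumed, deliberately loose patterns; tune them to your own data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

# Hypothetical parameter names that often carry personal data.
SUSPECT_KEYS = {"email", "password", "name", "phone", "address"}


def find_pii_in_url(url: str) -> list[tuple[str, str]]:
    """Return (parameter, reason) pairs for query parameters that look like PII."""
    findings = []
    query = urlsplit(url).query
    # parse_qsl percent-decodes values, so "%40" is seen as "@".
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key.lower() in SUSPECT_KEYS:
            findings.append((key, "suspect parameter name"))
        elif EMAIL_RE.search(value):
            findings.append((key, "value looks like an email address"))
        elif CARD_RE.search(value):
            findings.append((key, "value looks like a card number"))
    return findings


if __name__ == "__main__":
    for hit in find_pii_in_url("https://example.com/thanks?u=jane%40example.com&plan=basic"):
        print(hit)
```

A check like this only flags candidates for a human to review; it does not prove a URL is clean.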
2. Site Search / Help Center Traffic
Often people will mistake the site search for a login box and type in their username or password.
Typically, both of these sit in the top right-hand corner of a webpage. We recommend reassessing your site search from a usability perspective to make sure it isn’t misinterpreted. Consider adjusting the watermark to something like “search this site” rather than something more ambiguous like a blank space.
Similarly, if you track searches in your Help Center and send them to Google Analytics, you would be surprised how often frustrated people will write something like: “my new email address is firstname.lastname@example.org…my old email address was cancelled so how do I login to my account? I cannot remember username or password!”
So, if you have an on-site search tool, beware of collecting names, email addresses, usernames, or passwords in your site search reports. Passwords cannot be used to identify people but are considered private information.
3. Inbound Traffic
Bloggers may use trackbacks/linkbacks/pingbacks to communicate between your website and theirs by requesting that you add their comment to your blog.
Be careful not to accept any content that includes PII. It may be hidden in the URL or Full Referrer string.
In the past, we have seen a website that received trackbacks from another company. That company was emailing its customers with a link to the website’s blog. Somehow, the email addresses from the contact list were being appended to the URL and were thus picked up in Google Analytics.
Preventing PII Leakage
You may or may not have fallen victim to the above leakages. Regardless, be proactive and limit your risk by continuously monitoring for PII leakage and adjusting your testing to detect leakages before you launch your digital assets, whether that’s a website or an app. With analytics becoming such a fundamental part of business decisions, can you afford to risk being forced to delete your data?
Make PII data integrity a business requirement during the design and upgrades of your website and/or app(s). Ensure your agencies and vendors adhere to the level of data protection your company has committed to, not only to meet the Google Analytics Terms of Service but also to meet the level of privacy you promise your customers.
Test for PII before launch.
- Host a team brainstorming session with your developers to determine areas at risk of leaking PII.
- Send new page, form, or feature data to a staging view in Google Analytics first for testing. Look for PII in the pages and events reports, but also in traffic sources, site search, and any custom dimensions. When testing is complete and all issues are rectified, be sure to copy the view rather than rename it if you would like to continue using it, to ensure no leaked data carries over.
- Test for PII during the initial build and during upgrades to your website to assess what’s being transmitted to the server in the header, using a tool such as Fiddler. Fix the problems before going live. Be as rigorous with testing during upgrades as during the initial build, because upgrades have been known to cause unexpected leaks.
- Test failure scenarios. Purposefully fill in information improperly to generate error responses and take note of the information being sent as a result. Error cases often behave differently from valid submissions, so be sure to test them. It’s not uncommon for us to find leaks that only show up in these error scenarios.
Schedule ongoing PII Audits. Create a view containing all traffic and no filters. Look for PII in the areas mentioned above to identify leakages and fix minor problems before they become a much bigger issue.
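One way to make such an audit repeatable is to export a report and scan it with a script. The sketch below is only an illustration: it assumes you have a CSV export with column headers such as `Page` and `Search Term` (hypothetical names; use whatever your export contains), and it only looks for email-like strings, which is one kind of PII among several.

```python
import csv
import io
import re
from urllib.parse import unquote

# Assumed, deliberately loose email pattern.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def audit_rows(csv_text: str, columns: list[str]) -> list[tuple[int, str]]:
    """Return (row_number, column) for each cell containing an email-like string."""
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column in columns:
            # Percent-decode first so "jane%40example.com" is also caught.
            cell = unquote(row.get(column) or "")
            if EMAIL_RE.search(cell):
                hits.append((row_number, column))
    return hits


if __name__ == "__main__":
    sample = (
        "Page,Search Term\n"
        "/thanks?u=jane%40example.com,shoes\n"
        "/home,my email is bob@example.org\n"
        "/about,pricing\n"
    )
    print(audit_rows(sample, ["Page", "Search Term"]))
```

Row numbers count data rows, not the header line, so they can be matched back to the export.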
Create a Fallback View. Configure the filters in a designated backup (fallback) View to remove all query strings from URLs and referral domains and to not collect on-site search terms. This will limit the scope of your analysis (thankfully, it will not strip out your marketing campaign tagging), but you will at least have one View for high-level user, session, and page metrics that you can hang on to, even if PII corrupts other Views, which will then have to be deleted.
Here’s how you can delete all query strings from all URLs:
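The filter itself is configured in the GA admin interface. As a rough illustration of what it does to each recorded URL, here is a minimal Python sketch. `strip_query` is a hypothetical helper name, and this is not GA’s own filter logic, only the same transformation applied to a string.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string (and fragment) from a URL or page path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    print(strip_query("/thanks?email=jane%40example.com&plan=basic"))  # /thanks
    print(strip_query("https://example.com/search?q=my+password#top"))  # https://example.com/search
```

Everything after the `?` is discarded, which is why this view is safe from query-string leaks but also less detailed.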
I Found PII. What now?
Alert your technical team to identify the root cause of the problem.
Contact your Privacy Officer or corporate legal team.
Use filters to block existing forms of PII and create new, clean Views. This is a band-aid approach: it stops the data from entering GA, but the data is still being sent from your website. Ultimately, your tech team will need to fix the issue on the back end to truly safeguard your user data.
Finding email addresses seems to be a common occurrence. One filter we’ve found particularly handy is a ‘Search and Replace’ filter using this RegEx:
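As a hedged illustration of the idea, the Python snippet below replaces anything that looks like an email address with a placeholder. The pattern is an assumption made for this sketch, not necessarily the exact expression used in the filter, and the `redact_emails` name and placeholder text are hypothetical. It also matches addresses whose `@` has been percent-encoded as `%40`, which is common in URLs.

```python
import re

# Assumed pattern: local part, then "@" or its URL-encoded form "%40", then a domain.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
)


def redact_emails(text: str, placeholder: str = "[email-redacted]") -> str:
    """Replace every email-like substring with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


if __name__ == "__main__":
    print(redact_emails("/thanks?email=jane.doe@example.com&plan=basic"))
    # /thanks?email=[email-redacted]&plan=basic
    print(redact_emails("/thanks?u=jane%40example.com"))
    # /thanks?u=[email-redacted]
```

Regular-expression syntax differs slightly between tools, so test any pattern against real examples from your own reports before relying on it.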
Exclude likely PII offenders. Under each view there is a section called “Exclude URL Query Parameters”. This can be used to exclude the parameters you’ve identified as issues, as well as any other query parameters you expect might carry PII.
- For example, we may add parameters such as:
The exclusions need to be an exact match (case-sensitive). What you choose to include will depend on the query parameters used by the developers who created or upgraded the website/app.
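To see why the exact, case-sensitive match matters, here is a small Python sketch that mimics the behaviour described above. The `exclude_params` helper and the parameter names are hypothetical, chosen only for illustration; this is not GA’s implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def exclude_params(url: str, excluded: set[str]) -> str:
    """Remove query parameters whose names exactly match an excluded name."""
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        # Exact, case-sensitive comparison on the parameter name.
        if pair and pair.split("=", 1)[0] not in excluded
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), parts.fragment)
    )


if __name__ == "__main__":
    url = "/signup?email=a%40b.com&plan=basic&Email=c%40d.com"
    print(exclude_params(url, {"email"}))
    # /signup?plan=basic&Email=c%40d.com
```

In the example, `Email` survives because only lowercase `email` was listed, so every capitalisation your developers use needs its own entry.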
Download any high-level data you want to preserve.
Delete the contaminated Google Analytics Views.
Google also provides some best practices to avoid sending PII to any Google product.
*Please Note: This information is not intended to be legal advice. Please speak to your company’s legal team or representative to assess your PII compliance needs.