More Precise Long Tail Keyword Analysis: Dealing with Sampling in Google Analytics Reports


If you’re actively slicing and dicing your Google Analytics data and using Advanced Segments to focus on your most valuable visitors, you’ve likely seen a yellow box pop up in the top right corner that looks like this.

Google Analytics Sampling Message

And looking down your columns of data, you may have noticed some rather strange number patterns, especially in your keyword reports.

Google Analytics Keyword Report, with Sampling

Notice in this image from a Google Analytics organic keyword report that some rows have identical numbers – what are the chances of this happening?

Well, this isn’t a fluky phenomenon – this Google Analytics ad-hoc report, created using an Advanced Segment, is presenting you with sampled data.

Unless you have a lot of data, your Google Analytics standard reports will not be sampled.

But once you create an ad-hoc report by adding an Advanced Segment, Google samples data if more than 250,000 visits are being analyzed for the date range you’ve selected.  In the above example:

  • For the date range analyzed, there were close to 5,000,000 visits.
  • 250,000 is about 5% of 5,000,000.  Hence, the yellow box tells us that only 5.15% of all visits were used (i.e. sampled) to build this report.
  • Once Google’s used 5.15% of the visits to build the report, it then scales the numbers up by multiplying by 1/5.15% or a factor of 19.4.
  • The process of scaling the numbers up causes some rows to have the same number.
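To make the arithmetic above concrete, here is a minimal sketch in Python. The 250,000-visit cap and the 5.15% rate come from the example; the function names, the sample counts and the use of simple rounding are our own assumptions, since Google does not spell out exactly how it rounds scaled numbers.

```python
SAMPLE_CAP = 250_000  # visits used at default precision, per the example above


def sampling_rate(total_visits, cap=SAMPLE_CAP):
    """Fraction of visits used to build the report (1.0 means unsampled)."""
    return min(1.0, cap / total_visits)


def scaled_count(sampled_visits, rate):
    """Scale a count seen in the sample up to an estimate for all visits.

    Rounding to the nearest whole visit is an assumption for illustration.
    """
    return round(sampled_visits / rate)


rate = 0.0515      # the 5.15% shown in the yellow box
factor = 1 / rate  # roughly 19.4

# Hypothetical long tail keywords seen 1, 1, 2, 2 and 3 times in the sample.
sampled = [1, 1, 2, 2, 3]
reported = [scaled_count(n, rate) for n in sampled]
print(reported)
```

Because a long tail keyword can only appear a small whole number of times in the sample, every keyword seen once is reported with the same scaled-up figure, every keyword seen twice with another, and so on. That is why identical numbers repeat down the column.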

You can increase report precision to 500,000 visits by clicking on the Sampling Icon and dragging the slider to the right.

Increase Precision in Google Analytics Sampled Reports

However, in the example above, this raises the sample to slightly over 10%, meaning that only about 1 in 10 visits is being used to create the report.

But if you’re mining the long tail to find new high impact keyword phrases, 10% isn’t precise enough to be useful.  You could be missing some very profitable keywords because of sampling.

What are some tactics to get around this?

  1. Shorten your date range and repeat the analysis several times, once for each shorter range.
  2. Extract into a spreadsheet and filter there.

Both are tedious and time-consuming, and I’ve known analysts to crash Excel trying the extract method.
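As a rough illustration of the first tactic, the sketch below splits a long date range into consecutive windows that each stay under the 250,000-visit threshold. It assumes you already have a table of visits per day (the `daily_visits` dictionary is hypothetical), and the function name and greedy approach are ours, not anything built into Google Analytics.

```python
from datetime import date, timedelta

VISIT_CAP = 250_000  # sampling threshold mentioned in the article


def split_date_range(start, end, daily_visits, cap=VISIT_CAP):
    """Greedily split [start, end] into windows whose total visits stay <= cap.

    daily_visits maps a date to its visit count (days not listed count as 0).
    A single day that exceeds the cap still gets its own window and would
    remain sampled.
    """
    windows = []
    window_start = start
    running = 0
    day = start
    while day <= end:
        visits = daily_visits.get(day, 0)
        # Close the current window before it would cross the cap.
        if running and running + visits > cap:
            windows.append((window_start, day - timedelta(days=1)))
            window_start = day
            running = 0
        running += visits
        day += timedelta(days=1)
    windows.append((window_start, end))
    return windows


# Hypothetical site with 100,000 visits a day for ten days.
start, end = date(2013, 6, 1), date(2013, 6, 10)
daily_visits = {start + timedelta(days=i): 100_000 for i in range(10)}
windows = split_date_range(start, end, daily_visits)
for first, last in windows:
    print(first, "to", last)
```

At 100,000 visits a day, each window covers only two days, so ten days of data means five separate reports to pull and stitch together. That gives a sense of how quickly this tactic becomes tedious.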

Analysts’ time is better spent analyzing, not pulling data.

So here’s an alternative we suggest trying.  Create separate organic search profiles, each with a dataset narrow enough that you can rely on the standard reports for that profile.  Sampling occurs at the Web Property level, not the profile level, so very high traffic sites may find this tactic does not work for them.  (Read the GA Help Center article on sampling.)

Remember that profiles are not retroactive. They only start collecting data from the day you create them.  Therefore, before you jump into creating profiles, do a bit of analysis and think about the types of actions you’ll be taking as a result of keyword analysis:

  • Identify Negative Keywords: Long tail keyword analysis is typically done to find new, generic (unbranded) keyword phrases that drive conversions or highly engaged prospects who do not already know about your product, service or organization.  You should also know what does not work or is off target.

Action: Make a list of the keyword phrases and spelling variations that do not work, i.e., your Negative Keywords.  For example, if you are targeting buyers, you may want to exclude job hunters.  Your negative keywords might be jobs, career, opportunities, vacancies, positions, and various job titles.

Action: Remember to exclude this keyword phrase … not provided.

  • Identify Branded Keywords: If you’re looking for keywords that drive new prospects, you’ll want to exclude any keywords that indicate the visitor already knows your organization.

Action: Make a list of all variations of your company name, product name and possibly people in your company.

  • Identify Target Geography:  Are your most viable prospects from a limited number of states and provinces?  How will you take action?  How are your campaigns organized?

Action: Make a list of the countries, states and provinces of most importance to you.  If only Ontario, Ohio and New York State are important to you, focus on these 3 regions only.

There may be other dimensions that make sense for you – Mobile-only or just tablets and larger devices perhaps? Just new visits?  For a full list of available dimensions, browse this list of filter dimensions.

Whatever the combination of keywords, geography and other dimensions you are most interested in, next create* inclusion and exclusion filters for your new profiles**.  At minimum, you’ll need:

  • Include filter: Campaign Medium is organic
  • Exclude filter: Campaign Term (the parameter for keywords) excludes not provided
  • Exclude filter: Campaign Term excludes a regular expression (RegEx) combination of your branded keywords
  • Exclude filter: Campaign Term excludes regular expressions of your negative keywords.
    • You may want to use multiple exclude filters to keep things organized, such as separate filters to exclude job hunters, internal names, educational, etc, etc.
  • Include filter: Either Country, City or Region.
    Warning: You can only use one dimension for location and cannot mix & match because this is an include filter.  So if you want all of Canada plus New York State, you’ll have to use the Region dimension and a single RegEx that includes all the provinces/territories and New York.
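Before typing filter patterns into the admin screen, it can help to build and test them offline. The sketch below is one way to do that in Python, not a Google Analytics API. The branded terms ("acme") are hypothetical, the region names are our guess at how they appear in the Region dimension, and we assume case-insensitive matching.

```python
import re

# Characters with special meaning in a regular expression.
_META = re.compile(r"([.^$*+?()\[\]{}|\\])")


def escape_term(term):
    """Backslash-escape regex metacharacters so the term matches literally."""
    return _META.sub(r"\\\1", term)


def any_of(terms):
    """Join literal terms into one alternation: term1|term2|..."""
    return "|".join(escape_term(t) for t in terms)


# Negative keywords from the job hunter example; branded terms are hypothetical.
NEGATIVE = ["jobs", "career", "opportunities", "vacancies", "positions"]
BRANDED = ["acme", "acme widgets"]
# Region names as we assume they appear in the Region dimension.
REGIONS = [
    "Alberta", "British Columbia", "Manitoba", "New Brunswick",
    "Newfoundland and Labrador", "Northwest Territories", "Nova Scotia",
    "Nunavut", "Ontario", "Prince Edward Island", "Quebec", "Saskatchewan",
    "Yukon", "New York",
]

# One pattern per exclude filter, mirroring the list above.
exclude_patterns = [escape_term("(not provided)"), any_of(BRANDED), any_of(NEGATIVE)]
# A single include pattern covering every region of interest.
include_region = "^(" + any_of(REGIONS) + ")$"


def passes_filters(keyword, region):
    """Simulate the filters: True if a visit would remain in the new profile."""
    excluded = any(re.search(p, keyword, re.IGNORECASE) for p in exclude_patterns)
    in_region = re.search(include_region, region, re.IGNORECASE) is not None
    return in_region and not excluded


print(include_region)
print(passes_filters("blue widget reviews", "Ontario"))
print(passes_filters("widget careers", "Ontario"))
```

Running a sample of your existing keywords through a check like this is a quick way to spot patterns that are too greedy. For instance, "positions" would also match "compositions", so you may want to tighten a pattern before it starts discarding data you cannot get back.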

Try it out.  Using this new profile, can you now analyze your long tail keywords just using the standard reports, without triggering sampling?  If you find you still need to create Advanced Segments, analyze why, and consider creating an even narrower profile.  The great thing is that you can copy a profile instead of building from scratch and just add or swap out the filter that’s too general.

We’d love to hear if this tactic is useful and works for you to lessen the pain of sampling and give you more precise and useful data about your long-tail keywords.


* Note:  With the new enhanced admin interface and enhanced user access controls, Google is in the process of removing the word ‘profile’ from its online documentation, replacing ‘profiles‘ with ‘views‘.

** Time-Saver Tip: If you want the same goals and other filters (e.g. exclude internal traffic) in your new profile,  copy your existing Google Analytics profile and then apply the new filters you have created to narrow the profile.  This not only saves time but reduces the chance of errors.

July 9th, 2013 | 4 Comments
Categories: Google Analytics


  1. searchengineman July 14, 2013 at 2:47 pm

    Does the paid version of Google Analytics remove sampling?

    • June Li July 15, 2013 at 8:06 am

      Hi Stanley,
      That’s correct. One of the benefits of Google Analytics Premium is removal of sampling.

  2. Fred November 5, 2013 at 11:25 am

    Thank you for these very useful tips! I will definitely try them. Sampling is a problem in some cases, for websites with really big traffic volumes, or here when we try to analyse the long tail. I like this article that reveals the “myth of virtuous sampling”, you may like it too:

    • June Li November 7, 2013 at 9:00 am

      Thanks Fred for the link to AT Internet’s article.

      The degree of sampling should be adjustable by the site owner, and the site owner should have access to the unsampled data, although it may be laborious to extract from the system.

      With Google Analytics, this is the case. Unless your total traffic volume is ridiculously high, you will be able to access unsampled data, but it could take a lot of time and effort. Tools like Analytics Canvas and Next Analytics can help you extract unsampled data in a more automated way.

      If you have any additional tips to share, we’d love to hear them.
