5 Calculated Fields for Google Data Studio

  • Calculated Metrics

5 Calculated Fields for Google Data Studio

Calculated fields are one of the most powerful features of Google Data Studio. Based on the existing fields in your data source, calculated fields allow you to create new dimensions and metrics to augment your reporting and analysis.

Data Studio offers a variety of formulas that can be used to calculate ratios and rates, extract and reformat text, and apply logical comparisons.

Here are 5 useful examples to get you started with calculated fields in Data Studio:

Tip: Copy & paste any of the formulas below into the calculated field editor in Data Studio. Even though the editor is single-line, you can paste in multi-line expressions as well.

1. Combine hostname and page path

If you have multiple domains or subdomains within your website, it can be helpful to see the full URL of the page in your reports. A simple concatenation function will let you combine the Hostname and Page dimensions:

CONCAT(Hostname, Page)

2. Extract query parameter from URL

Query parameters are often used to trigger dynamic content, label traffic sources, or identify unique users. If you haven’t already moved these parameters into custom dimensions via your analytics setup, you can create a new dimension in Data Studio with a regular expression extract:

REGEXP_EXTRACT(Page, 'mode=([^&]+)')

Replace mode with the name of the desired query parameter.

3. Clean up inconsistent campaign naming

Ideally, you should establish consistent campaign tagging across your organization. Using common naming conventions between marketing teams and agency partners will make the reporting and analysis process much more efficient. However, if you need to clean up historical data or a rogue campaign, Data Studio will be your friend:

The most common inconsistency is usually the use of different case, e.g. “email”, “Email”, and “EMAIL”. This is easily fixed by forcing all campaign dimensions to lowercase.

LOWER(Medium)

Similarly, you can use LOWER(Campaign) and LOWER (Source).

When you use your calculated dimension, Data Studio will then aggregate the inconsistent values for you.

4. Create custom geographic regions

Google Analytics provides you with the standard geo dimensions of City, Region, and Country. But what if your organization uses your own regional grouping? Perhaps your sales team splits the country into West, Central, and East, for example.

Here’s a conditional function that you can use to define your own custom regions, based on the standard dimensions:

CASE 
WHEN REGEXP_MATCH(Region, "British Columbia|Alberta|Saskatchewan|Manitoba") THEN "West" 
WHEN REGEXP_MATCH(Region, "Ontario|Quebec") THEN "Central"
WHEN REGEXP_MATCH(Region, "Nova Scotia|New Brunswick|Prince Edward Island|Newfoundland and Labrador") THEN "East"
ELSE "Other" 
END

Note that Data Studio doesn’t support IF statements, so you must use the CASE function instead (which ends up being cleaner than nested IFs).

The general format is: CASE WHEN X THEN Y1 ELSE Y2 END, where X is a boolean expression and Y1 and Y2 are text, number, or boolean values. Refer to this CASE statement article in the Data Studio help centre for more details.

5. Create a custom channel grouping

Any custom channel groupings that you create in Google Analytics won’t be available in Data Studio. However, you can recreate your channel definitions in Data Studio using a CASE construct with a series of WHEN clauses.

Here is the Google Analytics default channel grouping represented as a CASE statement. Use this as a base for creating your own custom channel definitions:

CASE 
WHEN ((Source="direct" AND Medium="(not set)") OR Medium="(none)") THEN "Direct" 
WHEN Medium="organic" THEN "Organic Search" 
WHEN (Social Source Referral="Yes" OR REGEXP_MATCH(Medium,"^(social|social-network|social-media|sm|social network|social media)$")) THEN "Social" 
WHEN Medium="email" THEN "Email" 
WHEN Medium="affiliate" THEN "Affiliates" 
WHEN Medium="referral" THEN "Referral" 
WHEN (REGEXP_MATCH(Medium,"^(cpc|ppc|paidsearch)$") AND Ad Distribution Network!="Content") THEN "Paid Search" 
WHEN REGEXP_MATCH(Medium," ^(cpv|cpa|cpp|content-text)$") THEN "Other Advertising" 
WHEN (REGEXP_MATCH(Medium,"^(display|cpm|banner)$") OR Ad Distribution Network="Content") THEN "Display" 
ELSE "(Other)" 
END

Using calculated fields is a skill that every Data Studio user should master. It will make your data cleaner, make your reporting more relevant, and ultimately make your life easier.

Interested in learning more about Data Studio? Check out 7 Reasons to Use Data Studio with Google Analytics.

Let us know if there are any other specific aspects of Data Studio you would be interested in learning more about!

——

Looking for hands-on training in Data Studio? Come to our 1-day Data Studio course!

Related post: 7 Reasons to Use Data Studio with Google Analytics

 

By |2023-01-26T09:34:23-05:00February 14, 2017|33 Comments
Categories: Data Studio

33 Comments

  1. Skyler 2017-12-05 at 12:07 - Reply

    I am tracking scroll depth (25%, 50%, 75%, 90%) and want to create an aggregate view of total events per URL. I also want to create a % of the total events per URL at each scroll depth interval. For example, I want to know the % of the total events for a URL at each interval. Is this possible in GDS?

    • Marc Soares 2017-12-07 at 15:18 - Reply

      Hi Skyler,

      You can definitely accomplish this, but not with calculated fields. Data Studio will aggregate data based on the dimensions and metrics that you include in your chart. So, if you want to see an aggregate of total scroll depth events per URL, you could create a table with Page and Total Events and filter for scroll events only.

      To get a “percent of total” metric, you can edit a metric in your report and apply the Percent of Total analytical function. Take a look at the help centre page here: https://support.google.com/datastudio/answer/7512288.

      Hope that helps!

  2. Ondra 2018-02-20 at 05:22 - Reply

    Hi, thanks for the article. You have wrong function name in 2. Extract query parameter from URL part it is REGEXP_EXTRACT. Have a nice day

    • Jonathan Iudica 2018-02-20 at 09:12 - Reply

      Hello Ondra,

      Thanks for letting us know! We have updated the post.

      Jonathan

  3. Joan 2018-03-28 at 05:03 - Reply

    Very useful, Marc,

    But I’m trying to filtering/categorize with a custom field with an example like this:


    WHEN REGEXP_MATCH(Campaign, “BLU”) THEN “Colour Blue”
    WHEN REGEXP_MATCH(Campaign, “GRE”) THEN “Colour Green”
    WHEN REGEXP_MATCH(Campaign, “CAR”) THEN “Product Car”
    WHEN REGEXP_MATCH(Campaign, “MOT”) THEN “Product Motorbike”

    And works fine for campaigns with only one “tag”, like MOT or GRE, for example.

    I have also campaigns with more than one tag, for example CAR+GRE, and the REGEXP only returns the first true condition, in this case, CAR. But I’m not able to categorize both, CAR and GRE.

    Does anybody know how to achieve this?

    Many thanks!

    • Marc Soares 2018-03-28 at 09:20 - Reply

      Hi Joan,

      The CASE statement in your calculated field can only return one value. It will return the result of the first WHEN condition that evaluates to true.

      When you are creating custom groupings in this manner, each dimension value (in this case, each campaign) can only be placed in one group. It won’t be possible to have a “CAR+GRE” campaign appear in both the CAR group and the GRE group. This is the same way that channel groupings and content groupings work in Google Analytics, i.e. it’s a one-to-one mapping.

      If you have different types of tags (e.g. Product and Colour), and each campaign can have at most one value of each type (i.e. you could have CAR+GRE and MOT+BLU, but never BLU+GRE or CAR+MOT), then I suggest creating two separate groupings. You can create one calculated field that groups the campaigns based on colour and a second calculated field that groups based on product.

      Hope that helps!

      Marc

  4. Laz 2018-04-11 at 12:51 - Reply

    This is very helpful, however I am wondering how to split comma separated data into metrics?

    For example having columns with following data: cat, dog, horse, bird.

    Each matched text should be aggregated for summary how many times textbook occurred in the dataset.

    Any ideas how to achieve this?

    • Marc Soares 2018-04-11 at 13:39 - Reply

      Hi Laz,

      You can create a conditional count with a CASE statement. For example, if you wanted to count instances of “cat”, you can use the following:

      CASE WHEN REGEXP_MATCH(Dimension, ".*cat.*") THEN 1 ELSE 0 END

      The same could be done to count dog, horse, bird, or any other value in a comma-separated dimension value.

      Hope that helps!

  5. Arron 2018-05-10 at 16:03 - Reply

    Hi – do NOT statements work? Example: I want a rule created that reflects my segment. Source = “paid_search” and Campaign [does not contain] “gmail”. How is a statement like this properly done?

    • Marc Soares 2018-05-14 at 10:19 - Reply

      Hi Arron,

      You can use a NOT operator as follows:

      WHEN REGEXP_MATCH(Source, "Paid Search") AND NOT REGEXP_MATCH(Campaign,"gmail") THEN "Paid Search"

      Hope that works out for you!

  6. Joe 2018-05-20 at 19:22 - Reply

    Hi – I’m wondering if you can help me display a metric in Data Studio for percentage of overall traffic?
    I want to show what my organic sessions from social networks are as a percentage of my total traffic. And then to drill down further and show that for each social network.
    e.g If I had 1000 sessions total in a month, and 60 of those came from organic Facebook, then I want to show that figure (6%). and I can’t work out how to.
    Sounds like it should be quite straightforward, I’m fairly new to Data Studio!
    Cheers!

    • Marc Soares 2018-05-22 at 18:11 - Reply

      Hi Joe. A percent of total metric cannot be created as a calculated field since it is not a row-level calculation. You can however create a Percent of Total calculation in a table. See our other blog post on Calculating Percent of Total. You won’t be able to get a single scorecard metric, but you can get a table with the total and a breakdown.

  7. VAMSIKRISHNA IMMADISETTY 2018-06-11 at 17:05 - Reply

    Hi Soares, can you please throw some light to accomplish the below requirement.

    We wanted to create a metric like [Last 25 days page views]/[Current Month]? I try to get the current month numeric number but failed and I was not able to create the Last 25 days page views. Please guide me the steps to accomplish this request.

    Sincerely,
    Vamsikrishna.

    • Marc Soares 2018-06-13 at 09:58 - Reply

      Hi Vamsikrishna,

      To get the number of Pageviews for a specific date range, I suggest using a Scorecard chart and filtering by date.

      If I understand correctly, you wish to create a calculated field for [Pageviews in the last 25 days] / [Pageviews for the current month]. Unfortunately, this type of aggregation is not supported in Data Studio right now. You would need to pre-process your data so that you have separate metrics for [PVs Last 25 Days] and [PVs for Current Month]. Then you will be able to calculate the ratio in Data Studio.

      Marc

  8. Bruna 2019-06-05 at 13:43 - Reply

    Hello Marc!

    I am trying to have a table like this:
    Channel | Page = ‘/’ (users) | Page =’/Plans’ (users) | Page=’/Cart’ (users) | Page=’Purchased’ (users)

    I was wondering if that is possible with custom metrics. Please help!
    Thank you,
    Bruna

    • Marc Soares 2019-06-05 at 15:57 - Reply

      Hi Bruna,

      You can accomplish this using Data Blending:
      1) Create a separate table for each metric that you want to include. In each table, add Users as the metric and apply a filter for the desired page.
      2) Select all 5 charts, right click and select “Blend data”. In the blended chart, you can then have a column for each metric.

      This is similar to the method described in this post, except you’ll need to use filters instead of segments.

      Marc

  9. Catherine Hogan 2019-07-25 at 01:47 - Reply

    I used the code for custom channel grouping to include only Organic Search, Paid Search, Referral and Direct but when I enter the code and apply it another set of data values appear on my time series as an unnamed dimension. Do you know why this would happen?

    • Marc Soares 2019-07-25 at 09:31 - Reply

      Hi Catherine,

      Did you include an ELSE condition at the end of your CASE statement? If not, any traffic that doesn’t match your WHEN conditions will be undefined. I suggest including the ELSE "(Other)" condition as shown above. If desired, you can then exclude (Other) from your time series.

      Marc

  10. Kate 2019-09-30 at 14:20 - Reply

    I’m trying to get a count of how many “share” events happen on a page.

    I’ve worked out:
    Facebook share = CASE WHEN (Event Category=”addthis” AND Event Action= “facebook”) THEN 1.0 ELSE 0 END

    But when I try to create another field:
    Count Facebook Shares = Sum (Facebook event)

    Adding “Count of facebook shares” to my table with Page URLS- It doesn’t calculate correctly. In fact my whole “Page” Table gets totally out of whack and removes rows somehow. It’s not a filter, it’s a new column; any insights?

    • Marc Soares 2019-09-30 at 18:16 - Reply

      Hi Kate,

      You shouldn’t need to create a new calculated field to sum your Facebook Share metric. Data Studio will automatically total the metric if you set the aggregation type to SUM. Try creating a table with Page and the Facebook Share metric.

      Marc

  11. Chris 2019-10-31 at 18:20 - Reply

    Hi,

    I’m trying to add a new custom field with just REGEXP_EXTRACT(Page, ‘mode=([^&]+)’) to get something working, and it tells me it doesn’t know what Page is. Do I need to enable something? What am I missing

    • Marc Soares 2019-11-01 at 17:23 - Reply

      Hi Chris,

      The formula above assumes that Page is the name of a field in your data source. If you don’t have a “Page” field, then you will need to replace it in the formula with an existing field that you wish to extract from.

      Marc

  12. Fred 2019-12-27 at 14:20 - Reply

    Hi!

    Very good blog! I love your posts.

    Guys, I have a doubt. I currently have two events, with action values ​​of “start form” and “send form”. I would like to create a metric that allows calculating the rate of shipments between users who initiated the form and others who finally sent. How could I do it with a calculated metric and what kind of conditions would I have to use?

    Thank you,

    Fred

  13. Steve W 2020-04-30 at 14:41 - Reply

    Hi Marc,

    I’m trying to replicate the dimension “User-ID Status” within GDS which is unfortunately otherwise only available in GA when creating a segment. I’d hoped to do so with a calculated field. I created a new field with the following CASE statement because all our UserIDs are prefixed with “uid-“, however I only ended up with “Assigned” values. What happens with all the sessions where no UserID is assigned? Why are they not available in GDS as “(not set)”? Do you know of any GDS workaround without having to create more GA segments?

    CASE WHEN CONTAINS_TEXT(UserID, “uid-“) THEN “Assigned” ELSE “Unassigned” END

    • Marc Soares 2020-05-04 at 11:07 - Reply

      Hi Steve,

      Thanks for the question. Whenever you query a custom dimension from Google Analytics, the resulting report only contains rows that have a defined value for that custom dimension. This happens within the GA interface, as well as in external queries through the Reporting API or Data Studio.

      So in this example, your CASE statement is not a problem. The reason you don’t see “Unassigned” is that rows without a UserID value are not in the dataset to begin with.

      One way to deal with this behaviour of Google Analytics is to explicitly send “not set” or another default value in custom dimensions when they don’t have a set value. If you don’t need a row-level report, then a filter or segment that excludes sessions with a UserID value is probably the best approach.

      Hope that helps!
      Marc

      • Steve W 2020-05-11 at 16:51 - Reply

        Thank you Marc for your reply. It’s most helpful to know that this is a consistent issue that applies to all custom dimensions. I have considered the idea of sending a “not set” value, however we have a few questions/concerns:

        (a) Currently, we only send a UserID value when a user completes a successful login. This means we’d need to figure out when it would be appropriate to reliably send a “not set” value.

        (b) I’m not entirely sure what happens if a user has visited over many sessions over a long period of time without an account (i.e. sending “not set” to GA) and then the user creates an account on our website and logs in (i.e. we now send a UserID value to GA). Since we have session unification enabled, will that mean that all those sessions with a value of “not set” will now be “overwritten” with the new UserID?

        (c) Similarly, for users who clear their cookies, if we initially send “not set” to GA, will this value be replaced by the accurate UserID value the next time they login.

        (d) In cases where a previously known user returns to our website (i.e. we’ve previously sent a UserID value to GA), but they choose to not login, I don’t want to overwrite that user’s valid UserID stored at GA with a “not set” value.

        About the only way I can see to overcome the above would be to set a more persistent cookie with the UserID value, independent of the user’s login status. That way, regardless of whether a user was actually logged in or not, we’d have a UserID value we could send to GA. However, for this approach, we’d need to run this by our Privacy colleagues.

        In general, I’m able to use segments to either include or exclude traffic from known users. But each segment operates independently, so while I can hack it so that it looks like I have a table with both, I can’t calculate a percentage of total. So I can’t create a simple table in GDS that replicates the “User-ID Status” values otherwise only available in GA segments. A simple table that shows:

        User-ID Status | Sessions | % Total Sessions
        —-
        Assigned | 2500 | 16.7%
        Unassigned | 12500 | 83.3%
        It also means that I can’t build a pie chart of all traffic, broken down by this User-ID Status.

      • Steve W 2020-05-11 at 17:01 - Reply

        And I forgot to mention the most important factor:
        (e) The value sent to GA as the UserID is used to group all sessions as the same “user”. If we send simply “not set”, then all activity for every user who doesn’t login will be grouped together, somewhat compromising the concept of a “user”.

        The only idea I can think of to overcome the above point would be to craft unique variations of “not set”, e.g. by appending the GA Client ID. This would still allow a GDS custom field to roll these sessions into an “Unassigned” status, and it still allows us to create segments in GA for all sessions that otherwise would have been equivalent to an User-ID Status of Unassigned.

        • Marc Soares 2020-05-26 at 17:00 - Reply

          Hi Steve,

          Yes, these are common challenges, and as you’ve described very well, it’s difficult to resolve them easily with GA alone.

          It would be feasible to persist a user ID in a cookie, even when the user isn’t logged in. Of course, this has privacy implications as you mention. Session unification only stitches together prior hits in the same session; It will not attach prior sessions to the user ID.

          It would also be feasible to use the Client ID to create a proxy user ID when the user isn’t authenticated.

          Without making any tracking changes, I think the best way to create the chart you’re looking for is to pre-process the data before connecting to Data Studio. You could use Google Sheets (with the GA Spreadsheet Add-on) or an ETL tool to extract and summarize your data into the Assigned/Unassigned status table.

          There are rarely perfect solutions in our field. Hopefully this provides you with some direction.

          Marc

  14. Chad 2020-05-07 at 21:33 - Reply

    For some of my data studio dashboards my regional CASE works when i use it as a filter control, but some times it does not. For example, when i want to narrow down the most common document downloads (event labels) for a region – the customer field filter does not work. The rub is, if i create individual filters for Country and Region, the downloads will show… So I can’t understand why my regional custom field case will not work.

    Any help would be appreciated.

    • Marc Soares 2020-05-11 at 11:44 - Reply

      Hi Chad,

      I’m not sure exactly what the issue may be based on your description. When you say that the custom field filter does not work, what are you seeing when the filter is applied? Does the data not change at all, does it get filtered in an unexpected way, or do you see an error?

      Marc

  15. Pooja 2020-05-11 at 11:17 - Reply

    How to calculate % in data studio .
    for eq:
    we have column
    a1 a1%
    1 1/10 * 100 = 10
    2 2/10 * 100 = 20
    3 3/10 * 100 = 30
    4 4/10 * 100 = 40

    after this i have to calculate % difference.
    a1% – b1% .

    • Marc Soares 2020-05-11 at 11:53 - Reply

      Hi Pooja,

      I’m not clear on exactly what you are looking to calculate. If you already have the percentage in a field in your data source, then you can calculate the difference between two percentage fields as a simple subtraction. If however you are applying a “percent of total” or “percent change” calculation in your chart, you will not be able to use these metrics in a calculated field. The in-chart comparison calculations are not available to be used in calculated fields.

      Another way to create a calculation between two percentages is to use scorecards and data blending. We have a blog post about this here: Calculating Ratios with Filtered Metrics in Data Studio.

      Marc

Leave A Comment