Tips for Using RegEx in Data Studio

  • RegEx in DS Feature Image

Tips for Using RegEx in Data Studio

Regular Expressions (RegEx) are a language of characters and symbols that can be used to match patterns of text. For example, an expression like United (States|Kingdom) matches both “United States” and “United Kingdom”. If you are a user of Google Analytics, regex can more practically be used to create filters or goals that include a set of pages or events. In Google Tag Manager, regex can be used to define trigger conditions or custom variables. Regex is also commonly used in JavaScript and other programming languages to search, match, or extract portions of a text string.

In Google Data Studio, you can use regex to make your filters and calculated fields more efficient and concise. The available regex functions also allow you to clean, organize, and transform your data in ways that otherwise wouldn’t be possible. However, if you are familiar with using regex in GA or GTM, be aware that Data Studio expects a slightly different syntax for certain types of regex patterns. If you haven’t used regex in Data Studio yet, don’t worry, we’re here to give you the ins-and-outs.

1. Case Sensitivity

In Data Studio, you must indicate whether your regular expressions should be treated as case sensitive or case insensitive. Case insensitive means that the case of a regex pattern is ignored. For example, email would match “email”, “Email”, “EMAIL”, or even “EmAiL”.

By default, a regex pattern will be treated as case sensitive. To force Data Studio to ignore the case of your regex, insert (?i) at the beginning of the expression.

For example, use (?i)email to match any all case variations of “email”.

2. Escaping Special Characters

In the language of regular expressions, certain characters have special meaning (i.e. | $ ^ . + * ? ( ) [ ] { } /). If you ever want to use one of these characters literally—treating a period as a period, for example—then you need to “escape” the special character. This is usually done by adding a backslash \ before the character you wish to escape.

In Data Studio, you need to double escape special characters with two backslashes. For example, matching “facebook.com” will require you to use facebook\\.com .

3. Matching Partial Strings

In most cases, Data Studio will attempt to match the full text of the field. This means that using the expression Search will not match a field containing “Organic Search” or “Paid Search”; You would need to use either (Organic|Paid) Search or .*Search.

In general, if you wish to do a partial match, add .* before and after your regex. For example,  .*search.* will match “search” wherever it appears in the field.

The one exception is in filters, where Data Studio gives you the option to select Regex Contains as a match type. This is similar to how GA & GTM treat regex. By selecting Regex Contains here, the partial match is built-in, so you don’t need to surround your regex with any additional characters.

4. Creating Custom Groupings

Data Studio offers a REGEXP_MATCH function that can be used in calculated fields. This function returns true or false if your regex pattern matches the selected field. This is extremely useful for creating custom aggregations based on existing dimensions, including custom groupings of pages, geographic regions, or traffic sources.

For example, if the sales territories of your company are organized by West, Central, and East, you can use REGEXP_MATCH to create these groupings based on provinces or states:

CASE 
WHEN REGEXP_MATCH(Region, "California|Oregon|Washington|Nevada|Idaho|Utah|Arizona|Montana|Alaska|Hawaii") THEN "West"
WHEN REGEXP_MATCH(Region, "Wyoming|Colorado|New Mexico|North Dakota|South Dakota|Nebraska|Kansas|Oklahoma|Texas|Minnesota|Iowa|Missouri|Arkansas|Louisiana|Wisconsin|Illinois|Mississippi") THEN "Central"
WHEN REGEXP_MATCH(Region, "Michigan|Indiana|Kentucky|Tennessee|Alabama|Ohio|Georgia|Florida|South Carolina|North Carolina|Virginia|West Virginia|Delaware|Maryland|New Jersey|Pennsylvania|New York|Connecticut|Rhode Island|Massachusetts|Vermont|New Hampshire|Maine") THEN "East"
ELSE "Other" 
END

5. Cleaning Up Dimension Values

There are also REGEXP_EXTRACT and REGEXP_REPLACE functions in Data Studio, which can be useful for cleaning up existing dimension values in calculated fields.

Regex Extract allows you to retrieve a portion of a field based on a regex pattern. If you have custom query parameters in your page URLs, you can use a regex extract to retrieve the value of a specific parameter. For example, the Google Merchandise Store has a “myaccount” section in which the pages have a “mode” parameter in the URL. Creating a calculated field with REGEXP_EXTRACT(Page, 'mode=([^&]+)') will return the value of the mode parameter.

Regex Replace will substitute all occurrences that match the provided regex pattern. If you have fragmented data due to inconsistent naming conventions (of traffic sources, for instance), you can use Regex replace to consolidate all variations of the same value. A calculated field with REGEXP_REPLACE(Source, '(?i)((l|m|lm)\\.)?facebook\\.com', 'facebook') can be used to standardize variations of Facebook referral sources.

If you need help with creating or validating your regular expressions, check out RegExr.com.

If you’re looking for more hands-on Data Studio learning, check out our upcoming in-class course!

Related posts in the Data Studio Series:
1. Tracking Your Data Studio Dashboards in Google Analytics
2. How to Share a Data Studio Report
3. Google Analytics Dashboard Template for Data Studio
4. Calculating Goal Flow Conversion Rate in Data Studio
5. Adding Images to Your Tables in Data Studio
6. Embedding Links in Data Studio
7. Simple Visualizations in Data Studio
8. Custom Channel Groupings in Data Studio
9. Calculating Percent of Total in Data Studio
10. Data Studio Connectors from a Galaxy Far, Far Away
11. Audit Your Google Analytics Implementation with Data Studio

By |2017-12-18T14:59:32-05:00December 14th, 2017|6 Comments

6 Comments

  1. Adil September 22, 2018 at 10:53 am - Reply

    Hi,
    I’m trying to create a custom field in DS [using Google Ads data in Excel] to classify the match types as:
    BMM, Exact, Phrase or Broad
    I’m able to work the formula till BMM, Exact and Broad but when I try to create another WHEN – Searching for “, I get this error: Syntax error: Unexpected “.”. Any help with this? Thanks.

    CASE

    WHEN REGEXP_MATCH(Search keyword, “((?i).*\\+).*”)
    THEN “BMM”

    WHEN REGEXP_MATCH(Search keyword, “((?i).*\\[).*”)
    THEN “Exact”

    WHEN REGEXP_MATCH(Search keyword, “((?i).*\\”).*”)
    THEN “Phrase”

    ELSE “Broad Match”

    END

    • Marc Soares September 27, 2018 at 2:39 pm - Reply

      Hi Adil,

      I think the error is the result of your use of quotes. If your regular expression includes a double-quote (“), then you should wrap your expression with single quotes instead, i.e. ‘((?i).*”).*’

      Marc

  2. vijay September 27, 2018 at 6:16 am - Reply

    Hi

    I have connected my Facebook data on Data Studio using Supemetrics and I am trying to extract only Product Id rather than product id with name followed by it.

    Example:
    1000, xxxx

    I need to extract only 1000 from the field as calcuated metric. Please help out.

    • Marc Soares September 27, 2018 at 2:43 pm - Reply

      Hi Vijay,

      You can try using REGEXP_EXTRACT(Product Id, ‘(\d+),.*’) to extract only the digits preceding the comma.

      Marc

  3. Keith November 22, 2019 at 9:52 pm - Reply

    Scenario below:

    CASE
    WHEN REGEXP_MATCH(Product Category (Enhanced Ecommerce),”.*addon.*”) THEN “1. ADD ON”
    WHEN REGEXP_MATCH(Product Category (Enhanced Ecommerce),”.*reload.*”) THEN “2. RELOAD” ELSE “OTHERS” END

    What I want is that when it doesn’t match .*addon.* or .*reload.*, then do not show at all, instead of “others”

    Is that a function where I can put something like “ELSE Do not show”

    • Marc Soares November 26, 2019 at 3:05 pm - Reply

      Hi Keith,

      The ELSE is not required for the CASE statement. If you leave out the ELSE “OTHERS” and just close the statement with END, this field will be empty when the first two statements are not matched.

      Marc

Leave A Comment