As the popularity of analytics has grown within professional hockey, so has the volume and availability of data. Although stats have always been a part of the hockey experience—goals, assists, save percentage, etc.—in recent years we have seen the introduction of play-by-play tracking that produces a log of every on-ice event. Analogous to “clickstream” data from a website, this produces a stream of every shot, save, goal, hit, and faceoff.
This abundance of data presents some interesting opportunities for reporting, visualization, and analysis. Of course, it requires some work to extract, clean, and present the data in a useful form. As analysts, we enjoy making sense of complex data and telling a story through visualization. We decided that building interactive “shot maps” would be a worthy challenge.
The National Hockey League’s play-by-play event stream contains the on-ice coordinates for every shot, including recorded shots, missed shots, blocked shots, and goals. Plotting all the shots for a player or team can provide insight into their most productive (or least productive) shooting locations. Although we came across several different shot maps like this from other sources, each had its own drawbacks, from requiring manual data refreshes to providing limited visualization options. We thought we could make improvements in both the efficiency of data processing and the effectiveness of the visualization.
Our tools of choice were Google Apps Script for extracting and transforming the data, BigQuery for storage, and Data Studio for visualization.
If you’re just here for the shot maps, you can view the final viz in Data Studio here. If you’re interested in how we built this interactive report, continue reading!
In the rest of this post, we outline our process from API extract to custom visualization along with some of the challenges we encountered and decisions made along the way.
Selecting the Data
Although the NHL’s API is publicly accessible, there is no official documentation. Based on the unofficial NHL Stats API Documentation, we determined the best source for retrieving shot data was the live game feed. The live game feed returns a stream of all events for a single game, including game starts, stoppages, shots, goals, penalties, and hits. In particular, each shot event has a corresponding on-ice location recorded as an (x,y) coordinate, which is perfect for mapping!
Here is an example of the game feed from the first game of the 2019-20 regular season: https://statsapi.web.nhl.com/api/v1/game/2019020001/feed/live. The feed starts with game and team information followed by a liveData object, in which all of the game events are listed.
Since the game feed returns one game at a time based on a game ID, we use the schedule endpoint to list all games played on a given day. We then retrieve the game feed for each game on the schedule. This allows us to process daily updates throughout the course of a season.
Retrieving the Data
The first part of our Apps Script project makes a request to the game feed endpoint. The only required input for this endpoint is a unique game ID, which identifies the season, game type, and game number, as detailed in the documentation. The API request itself is executed with the fetch() method of Apps Script’s UrlFetchApp service. For example:
var response = UrlFetchApp.fetch("https://statsapi.web.nhl.com/api/v1/game/2019020001/feed/live");
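The game ID in that URL, 2019020001, packs three fields together. As a minimal sketch, assuming the layout described in the unofficial documentation (four digits for the season's starting year, two for the game type, four for the game number), helpers to build and split an ID might look like the following. The function names and the game-type table are hypothetical and are not part of the API.

```javascript
// Game-type codes as listed in the unofficial documentation (assumed here).
var GAME_TYPES = { preseason: '01', regular: '02', playoffs: '03' };

// Left-pad a value with zeros to a fixed width.
function zeroPad(value, width) {
  var s = String(value);
  while (s.length < width) s = '0' + s;
  return s;
}

// Build a 10-digit game ID such as "2019020001".
function buildGameId(seasonStartYear, gameType, gameNumber) {
  return String(seasonStartYear) + GAME_TYPES[gameType] + zeroPad(gameNumber, 4);
}

// Split a game ID back into its three parts.
function parseGameId(gameId) {
  var id = String(gameId);
  return {
    seasonStartYear: Number(id.slice(0, 4)),
    gameTypeCode: id.slice(4, 6),
    gameNumber: Number(id.slice(6))
  };
}

// Build the live feed URL for a game (same URL pattern as the example above).
function feedUrl(gameId) {
  return 'https://statsapi.web.nhl.com/api/v1/game/' + gameId + '/feed/live';
}
```

For instance, buildGameId(2019, 'regular', 1) reproduces the ID used in the request above.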
We similarly make a request to the schedule endpoint to retrieve the list of game IDs from the previous day. Our script loops through the game IDs and retrieves the event feed for each game.
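The loop described above can be sketched as follows. This assumes the schedule response nests games under dates[].games[] with the game ID in a gamePk field, as described in the unofficial documentation. The function names and the fetchJson parameter are hypothetical.

```javascript
// Collect every game ID from a parsed schedule response.
// Assumed shape: { dates: [ { games: [ { gamePk: 2019020001 }, ... ] } ] }
function extractGameIds(schedule) {
  var ids = [];
  (schedule.dates || []).forEach(function (day) {
    (day.games || []).forEach(function (game) {
      ids.push(game.gamePk);
    });
  });
  return ids;
}

// Retrieve the feed for each game on the schedule.
// fetchJson is passed in so the loop can be exercised without network access.
function fetchFeeds(schedule, fetchJson) {
  return extractGameIds(schedule).map(function (gameId) {
    var url = 'https://statsapi.web.nhl.com/api/v1/game/' + gameId + '/feed/live';
    return fetchJson(url);
  });
}
```

In Apps Script, fetchJson could be a small wrapper that calls UrlFetchApp.fetch(url) and passes getContentText() to JSON.parse().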
Parsing the Data
We use the response’s getContentText() method together with JSON.parse() to convert the response into a JavaScript object:
var json = JSON.parse(response.getContentText());
We can now access the event stream array using json.liveData.plays.allPlays. Since the allPlays array contains all in-game events, we need to extract only the events that are useful for a shot map visualization, i.e. shots and shot attempts.
Below is an example of a “shot” event object from the allPlays array. The type of event is indicated by the result.event value. In order to include all shots and shot attempts, we filter for result.event equal to “Goal”, “Shot”, “Missed Shot”, or “Blocked Shot”.
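The filter itself is short. Here is a minimal sketch of it, using the four event names listed above. The constant and function names are hypothetical.

```javascript
// Event names that count as a shot or shot attempt.
var SHOT_EVENTS = ['Goal', 'Shot', 'Missed Shot', 'Blocked Shot'];

// Keep only the plays whose result.event is one of the shot events.
function filterShotEvents(allPlays) {
  return allPlays.filter(function (play) {
    return Boolean(play.result) && SHOT_EVENTS.indexOf(play.result.event) !== -1;
  });
}
```

Calling filterShotEvents(json.liveData.plays.allPlays) would return only the shot and shot-attempt events for that game.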
The event object includes many parameters that describe the event. We selected the fields that would be most useful to our visualization, such as coordinates. Our script then iterates through the array of shot events, extracts our selected fields, and constructs a tabular dataset.
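As a rough sketch of that flattening step: only result.event and coordinates are named in this post, so the other fields below (period, team, shooter, shot type) and the output column names are assumptions about the feed's layout, chosen for illustration.

```javascript
// Convert one shot event into a flat row object.
// Fields other than result.event and coordinates are assumed for illustration.
function toRow(gameId, play) {
  var coords = play.coordinates || {};
  // Assumed: the shooter appears in play.players with playerType
  // "Shooter" (or "Scorer" for goals).
  var shooter = (play.players || []).filter(function (p) {
    return p.playerType === 'Shooter' || p.playerType === 'Scorer';
  })[0];
  return {
    game_id: gameId,
    event: play.result.event,
    shot_type: play.result.secondaryType || null,
    period: play.about ? play.about.period : null,
    team: play.team ? play.team.name : null,
    player: shooter ? shooter.player.fullName : null,
    x: typeof coords.x === 'number' ? coords.x : null,
    y: typeof coords.y === 'number' ? coords.y : null
  };
}

// Convert an array of shot events into an array of rows.
function toRows(gameId, shotEvents) {
  return shotEvents.map(function (play) {
    return toRow(gameId, play);
  });
}
```

Missing fields become null rather than undefined so that every row has the same set of columns.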
Here’s an excerpt of what our final dataset looks like:
Storing the Data
With 1,271 games in a regular season and over 100 entries per game, the dataset would likely surpass 130,000 rows over a full season. In addition to storing data for the current season, we wanted to include data from playoffs, past seasons, and future seasons. For scalability, we decided to load the data into BigQuery through the BigQuery Service in Apps Script. Storing data in BigQuery also allows us to use the built-in connector to Google Data Studio for visualization.
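One possible way to load rows through the BigQuery Service is the streaming tabledata.insertAll method, whose request body wraps each row in a json field. The sketch below only builds that request body and splits rows into batches. The helper names, the insertId scheme, and the batch size are assumptions, and the exact load method used in this project is not specified here.

```javascript
// Split an array into batches of at most `size` items.
function chunk(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build a tabledata.insertAll request body from flat row objects.
// insertId lets BigQuery de-duplicate retried rows on a best-effort basis;
// the "gameId-index" scheme is a hypothetical choice.
function buildInsertAllRequest(gameId, rows) {
  return {
    rows: rows.map(function (row, index) {
      return { insertId: gameId + '-' + index, json: row };
    })
  };
}
```

With the BigQuery advanced service enabled in the Apps Script project, each request could then be passed to BigQuery.Tabledata.insertAll(request, projectId, datasetId, tableId), using your own project, dataset, and table IDs.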
Automating the Process
One of our objectives for this project was to build an automated process that avoids manually extracting and loading data on a regular basis. This is where Apps Script really shines, since automation is as simple as scheduling the script project to run on a trigger. We configured a time-based trigger that executes our script daily, updating the dataset in BigQuery with game data from the previous day.
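A daily run like this needs two pieces: the previous day's date for the schedule request, and the trigger itself. The sketch below shows one way to set up both. The function names, the 'dailyUpdate' handler name, the 6 a.m. hour, and the use of UTC are assumptions, as is the date query parameter, which follows the unofficial documentation.

```javascript
// Format the day before `now` as YYYY-MM-DD, using UTC for determinism.
// (An Apps Script project would more likely use its own time zone.)
function previousDay(now) {
  var d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() - 1));
  return d.toISOString().slice(0, 10);
}

// Build the schedule URL for the previous day's games.
function scheduleUrl(now) {
  return 'https://statsapi.web.nhl.com/api/v1/schedule?date=' + previousDay(now);
}

// Install a daily time-based trigger. This only runs inside Apps Script,
// where ScriptApp is available; 'dailyUpdate' is a hypothetical function name.
function installDailyTrigger() {
  ScriptApp.newTrigger('dailyUpdate')
    .timeBased()
    .everyDays(1)
    .atHour(6)
    .create();
}
```

installDailyTrigger() only needs to be run once. The same trigger can also be created from the Apps Script editor's trigger settings without writing any code.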
Visualizing the Data
We used the standard BigQuery connector to bring our data into Data Studio. From there, we needed to decide on the best way to visualize the data. Since our vision was to map shot location using the recorded on-ice coordinates, the obvious starting point was a scatterplot. The standard scatterplot in Data Studio, however, has some limitations. Most significantly, you can plot a maximum of 1,000 data points, you cannot fully control the size of points, and you cannot set point opacity. With our dataset, the scatterplot could show only a subset of all recorded shots, and even then the overlapping points obscured the true distribution.
What we really wanted was a heatmap-style chart that would highlight locations with a higher density of shots. While this type of chart is not a built-in option, Data Studio’s community visualizations feature allows you to build essentially any custom chart type that you want. Drawing on our experience developing custom visualizations for Data Studio, we built a 2-dimensional density heatmap chart using D3.js.
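To illustrate the idea behind a density chart, here is a minimal sketch that counts shots in a fixed grid of cells. This is plain 2D binning, a simpler stand-in for whatever density estimate the D3.js chart computes. The rink extents (x from -100 to 100, y from -42.5 to 42.5) and the cell size are assumptions.

```javascript
// Count shots per grid cell. Returns a 2D array indexed as grid[row][col].
function binShots(shots, cellSize) {
  var X_MIN = -100, X_MAX = 100, Y_MIN = -42.5, Y_MAX = 42.5; // assumed extents
  var cols = Math.ceil((X_MAX - X_MIN) / cellSize);
  var rows = Math.ceil((Y_MAX - Y_MIN) / cellSize);
  var grid = [];
  for (var r = 0; r < rows; r++) {
    grid.push(new Array(cols).fill(0));
  }
  shots.forEach(function (shot) {
    // Skip events without recorded coordinates.
    if (typeof shot.x !== 'number' || typeof shot.y !== 'number') return;
    // Clamp so points on or beyond the edges fall into the nearest cell.
    var col = Math.min(cols - 1, Math.max(0, Math.floor((shot.x - X_MIN) / cellSize)));
    var row = Math.min(rows - 1, Math.max(0, Math.floor((shot.y - Y_MIN) / cellSize)));
    grid[row][col] += 1;
  });
  return grid;
}
```

Each cell's count can then be mapped to a color scale, so that overlapping shots add intensity instead of hiding one another as they do on a scatterplot.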
Here’s a side-by-side comparison of the same data plotted on a standard scatterplot and our custom density chart:
After much iteration and refinement, we realized our vision for an automatically updated interactive shot map visualization. The final tool shown below allows you to filter by game type (regular season or playoffs), team, player, and shot outcome (shot, goal, missed, or blocked). You can also filter by shot type by selecting individual bars in the bar chart. The dataset contains the entire (abbreviated) 2019-20 regular season and will be automatically updated daily through to the end of the playoffs. Click to view the full interactive version in Data Studio.
Even if you are not a hockey fan, we hope that you can appreciate the underlying process demonstrated here: automating data extraction with Apps Script, storing data in BigQuery, and visualizing in Data Studio. We hope this project opens your mind to the potential of these tools and inspires you to pursue your own creative applications!