The Benefits of Using BigQuery with Google Analytics Data
This post was originally written in May 2017 and was updated in June 2021.
BigQuery is the cloud data warehouse component of Google Cloud Platform. It's one of the top data warehouses on the market and was named a Leader in the category by a 2021 Forrester Wave report. Cloud data warehouses provide advantages over on-premises servers when tackling large data sets. They allow data to be accessed in real time, enabling faster analysis for marketers and analysts. Scaling up to accommodate more storage is easy and cost-effective, too.
Our Analytics team makes extensive use of BigQuery when working with Google Analytics (GA) data. BigQuery helps us to combine, calculate, and analyze digital metrics in a new way.
Importing Google Analytics Data Into BigQuery
BigQuery is perhaps the most empowering platform in all of web analytics. It combines what analysts know and love about Google Analytics with the power and flexibility of a streaming, event-based analytics platform (think tools like Snowplow, Heap, etc.). All Google Analytics accounts using the latest property type can now integrate natively with BigQuery. This is a big deal, as this connectivity was previously only available to enterprise accounts. Data teams get the tried-and-true Google Analytics data model, with the ability to break it down to the atomic level and build something new with only a little bit of SQL knowledge.
In addition, this hit-based goldmine sits on the cloud-based infrastructure of Google Cloud Platform, an enterprise cloud computing platform with endless applications from hosting to machine learning. Think of BigQuery as one store in an entire mall of possibilities. Once a company's data is imported, it's easy to take it across the hall and do predictive modeling or share it with a CRM.
The Google Analytics-to-BigQuery integration serves three primary purposes:
- Querying raw Google Analytics data;
- Connecting with other first-party data;
- Exporting data for visualization.
Each capability builds off of the last, and each feature further extends all that Google Analytics can do.
Connecting with Additional Data Sources
Google Analytics data is just a portion of the digital marketing ecosystem. Because BigQuery is a fully-fledged data warehouse, organizations can store data from their CRM, commerce platform, Display & Video 360, Google Ads, third-party advertising platforms, Facebook, YouTube, and more. This unlocks the ability to perform powerful analysis and predictive analytics across business data.
As the marketing tech stack becomes more complex, we can look at BigQuery as a hub for this data. More importantly, it can be architected for maximum connectivity. Whether companies are bringing data in or out, BigQuery presents opportunities to:
- Join GA and CRM data to understand traffic sources associated with qualified leads.
- Add spend data for various ad channels to understand the overall cost per user.
- Import back-end commerce data for a clearer picture of product performance.
- Connect GA and other vendor tracking at the page level to audit accuracy.
CRM Data Joins
Joining CRM data to Google Analytics is probably the most common use case for bringing data into BigQuery. Analysts can configure their website to pass the Google Analytics client ID (CID) into their CRM when a customer record is created. This identifier is unique to every visitor on a website. Once the client ID is in the CRM, it can be exported into BigQuery and used as the "key" joining one data source to the other. The result is customer information sitting alongside web analytics and marketing data in BigQuery.
This matters because analysts can import personally identifiable information (PII) into BigQuery, which is not possible in the Google Analytics interface. Want to understand how top customers first found your organization's website? Once the data sets have been integrated, analysts can write a query that reports which marketing sources produced which customers, combining CRM data with the behavioral and acquisition information from Google Analytics. This also helps teams get a better sense of true customer lifetime value (CLV).
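As a rough illustration, here's a minimal sketch of that join, assuming a GA4 BigQuery export (daily events_* tables in a dataset like analytics_123456) and a hypothetical crm.customers table that stores the captured client ID in a ga_client_id column:

```sql
-- A minimal sketch of a CRM join, assuming a GA4 BigQuery export
-- (daily events_* tables) and a hypothetical crm.customers table that
-- stores the captured client ID in a ga_client_id column.
WITH first_touch AS (
  SELECT
    user_pseudo_id,
    ARRAY_AGG(
      STRUCT(traffic_source.source AS source, traffic_source.medium AS medium)
      ORDER BY event_timestamp ASC
      LIMIT 1
    )[OFFSET(0)] AS first_source
  FROM `my-project.analytics_123456.events_*`
  GROUP BY user_pseudo_id
)
SELECT
  c.customer_id,
  c.lifetime_value,
  f.first_source.source AS first_source,
  f.first_source.medium AS first_medium
FROM `my-project.crm.customers` AS c
JOIN first_touch AS f
  ON f.user_pseudo_id = c.ga_client_id;
```

In a GA4 web stream, user_pseudo_id generally corresponds to the client ID, but the right join key ultimately depends on how the CID was captured and stored in the CRM.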
Export BigQuery Data To Your Organization's System Of Choice
Perhaps more commonly, organizations can export data from BigQuery to another system, and BigQuery's own API and export tooling make this straightforward. With the aforementioned client ID import, we can send data from BigQuery into any system that accepts a CSV import. For example, we might want to summarize web behavior for each customer in Salesforce: we could easily obtain the original source/medium information from BigQuery and export it to Salesforce.
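As a sketch, BigQuery's EXPORT DATA statement can write a query result straight to a CSV in Cloud Storage for import elsewhere; the project, dataset, bucket, and summary fields below are illustrative assumptions:

```sql
-- A hedged sketch: summarize web behavior per user from a GA4 export and
-- write it to a CSV in Cloud Storage for import into a CRM. The project,
-- dataset, and bucket names are placeholders.
EXPORT DATA OPTIONS (
  uri = 'gs://my-export-bucket/ga_user_summary_*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT
  user_pseudo_id,
  -- traffic_source in the GA4 export holds the user's first-touch source/medium
  ANY_VALUE(traffic_source.source) AS first_source,
  ANY_VALUE(traffic_source.medium) AS first_medium,
  COUNT(DISTINCT event_date) AS active_days,
  COUNTIF(event_name = 'purchase') AS purchases
FROM `my-project.analytics_123456.events_*`
GROUP BY user_pseudo_id;
```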
Beyond just the raw data, bringing in external data supports one of the most valuable uses of BigQuery: audience generation. Following a detailed data science project, we might identify an audience or "persona" in BigQuery that we'd like to pursue. Connecting with external data either gives teams more information for building audience segments or helps us take those audiences and activate them outside of BigQuery.
Querying Raw Data
BigQuery is a database, hosted in the cloud. Once the connection to Google Analytics is set up, Analytics data is exported, hit by hit, into BigQuery, where it can be queried just like any SQL database. The data that arrives in BigQuery is raw, hit-level data.
By comparison, inside the Google Analytics interface, the data analysts work with is session-based and aggregated. That's fine for tackling simple marketing questions. For example, in Google Analytics it's easy to count the number of sessions that came from a mobile device. But counting the number of video play events by a particular user, across multiple sessions, would be much more difficult.
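A hedged example of that second question against a GA4 export, assuming the enhanced-measurement video_start event (or a comparable custom event) and placeholder table names:

```sql
-- A user-scoped question that's hard in the GA interface: video plays per
-- user across all sessions. Assumes the enhanced-measurement video_start
-- event and a placeholder project/dataset.
WITH plays AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `my-project.analytics_123456.events_*`
  WHERE event_name = 'video_start'
)
SELECT
  user_pseudo_id,
  COUNT(*) AS video_plays,
  COUNT(DISTINCT ga_session_id) AS sessions_with_plays
FROM plays
GROUP BY user_pseudo_id
ORDER BY video_plays DESC;
```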
The Google Analytics interface is relatively easy to use and has a number of tools to make it easy to perform on-the-fly analysis. In order to keep the interface as fast as possible, there are certain limitations in the ways analysts access data and how much the interface can be customized.
This is where BigQuery shines. Analysts are using the same underlying data as Google Analytics, but don't have the same limitations. Our team has compiled several sample BigQuery data queries for companies to experiment with in our comprehensive Guide to Querying Google Analytics 4 Properties or Firebase Data in BigQuery. Let's look at some of the basic data limitations that BigQuery helps organizations overcome.
Sampling, What Sampling?
One of the most noticeable limitations within the Google Analytics interface is sampling, which will kick in when running a complicated or customized report, or selecting a large date range. When this happens, Google extrapolates the data by counting only some of the data points and modeling the rest. The result is that the data is, in some cases, merely an approximation.
With the power of a petabyte-scale cloud database, there's no need to worry about sampling. Every hit is exported to BigQuery daily and available to query, and even queries over long date ranges often return results in a matter of seconds.
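For instance, a year-long query simply selects the daily export tables by _TABLE_SUFFIX and scans every event in the range; nothing is modeled or extrapolated (the table name below is a placeholder):

```sql
-- Unsampled by design: a full year of daily export tables, selected with
-- _TABLE_SUFFIX, with every event counted (table name is a placeholder).
SELECT
  device.category AS device_category,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNTIF(event_name = 'purchase') AS purchases
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20200101' AND '20201231'
GROUP BY device_category;
```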
Focusing On Users Instead of Sessions
In the Google Analytics interface, goals and goal funnels are session-based: if a person takes multiple visits to complete a task, the interface can't stitch those steps together into a single journey. As Google Analytics 4 and other event-based analytics tools downplay the concept of a session to show different views of attribution, it's important to know BigQuery has always given us the ability to analyze at the user level. The power of SQL allows us to attribute actions across any duration, from one hit prior to conversion to as far back as our historical data goes.
This also has a big impact on commerce-focused companies. If analysts use the Enhanced Ecommerce reports inside Google Analytics' interface, they know that those reports are session-based. But in many real-world scenarios, a person might add an item to their cart in one visit and wait to complete the purchase in another visit. BigQuery allows teams to see purchasing behavior from users who take more than one session to pull the trigger.
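A rough sketch of that cross-session question, assuming the standard GA4 add_to_cart and purchase events and placeholder table names:

```sql
-- A rough sketch: users who added to cart in one session and purchased in a
-- later, different session. Assumes the standard GA4 add_to_cart and
-- purchase events and a placeholder project/dataset.
WITH events AS (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `my-project.analytics_123456.events_*`
  WHERE event_name IN ('add_to_cart', 'purchase')
)
SELECT
  COUNT(DISTINCT p.user_pseudo_id) AS cross_session_purchasers
FROM events AS a
JOIN events AS p
  ON p.user_pseudo_id = a.user_pseudo_id
WHERE a.event_name = 'add_to_cart'
  AND p.event_name = 'purchase'
  AND p.ga_session_id != a.ga_session_id
  AND p.event_timestamp > a.event_timestamp;
```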
The best part is that these limitless lookback windows and aggregations apply to all scopes and fields, so they even enable the creation of user-based segments without the pesky 90-day lookback window that applies in the GA interface.
Combining Dimensions
The word limitless also applies to the number of dimensions and metrics in a single query. When dealing with database tables in an enterprise data storage platform like BigQuery, teams can query as many columns as they would like. That means there's no need to worry about interface limits like two dimensions in a standard report or five dimensions in a custom report. If your team wants 35 dimensions in a report, you got 'em.
Analysts can also combine scopes that are difficult to analyze in the interface. Ecommerce data, for example, can be combined with user-level data to see product interactions by user type.
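For example, a sketch like the following mixes device- and geo-scoped fields with item-scoped ecommerce data in a single result, something the standard reports make awkward (table and field choices are illustrative):

```sql
-- A sketch that mixes user/device-scoped and item-scoped fields in one
-- result, which the standard reports make awkward (names are illustrative).
SELECT
  device.category AS device_category,
  geo.country AS country,
  item.item_name AS item_name,
  COUNTIF(event_name = 'view_item') AS item_views,
  COUNTIF(event_name = 'purchase') AS items_purchased
FROM `my-project.analytics_123456.events_*`,
  UNNEST(items) AS item
WHERE event_name IN ('view_item', 'purchase')
GROUP BY device_category, country, item_name;
```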
This ability to mix any dimension and scope comes with one warning: check your work. Breaking down the GA data model and building it back up can (1) introduce errors or miscalculations, or (2) produce metrics whose definitions differ from those in the standard data model. (I find this latter case to be a benefit; I would much rather have confidence in how I calculate a value than guess at how some prescriptive data model calculates it.) Just make sure to understand whether the metrics are different, and if so, why.
Going Forward By Going Backwards
Google Analytics has a very specific way that it processes data and configuration settings, like Goals and Filters. As data is collected from a company's site or app, GA will apply those settings and store the finished, altered data to be used in reports later on. Because of this, it's not possible to go backward and change data in a Google Analytics account.
However, with BigQuery teams can essentially rewrite history! If an analyst team made a tracking error in the past, and they'd like to filter out or modify data (page paths, events, entire sessions), they can dynamically adjust queries in BigQuery to account for those issues.
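As one hedged example, a query can exclude a staging hostname and normalize an inconsistently tracked page path at query time, without touching the stored data; the hostname and path pattern below are placeholders:

```sql
-- A hedged example of "rewriting history" at query time: drop a staging
-- hostname and normalize an inconsistently tracked page path. The hostname
-- and path pattern are placeholders.
WITH pages AS (
  SELECT
    event_date,
    (SELECT value.string_value FROM UNNEST(event_params)
     WHERE key = 'page_location') AS page_location
  FROM `my-project.analytics_123456.events_*`
  WHERE event_name = 'page_view'
)
SELECT
  event_date,
  REGEXP_REPLACE(page_location, r'/index\.html$', '/') AS clean_page,
  COUNT(*) AS page_views
FROM pages
WHERE page_location NOT LIKE '%staging.example.com%'
GROUP BY event_date, clean_page;
```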
The Fun Stuff
Once data makes it to BigQuery, we can easily send it to any other platform in raw form. One common application for exporting data: predictive modeling and advanced analytics. We can, for example, use statistical models to forecast how many purchases we might expect to have in the coming month, estimate how many leads might result from a new email blast, or look at projecting return on ad spend.
Hit-level granularity opens up a world of possibilities, from propensity modeling and cluster analysis to user journey mapping and other advanced analytics solutions. Many data science platforms have direct integrations with BigQuery, or teams can use the suite of tools already built into Google Cloud Platform. BigQuery even has a built-in SQL machine learning tool called BigQuery ML, which allows teams to create and deploy several types of models with just a few lines of SQL.
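A minimal BigQuery ML sketch, assuming a GA4 export and an illustrative model name and feature set, might look like this:

```sql
-- A minimal BigQuery ML sketch: a logistic-regression purchase-propensity
-- model trained on simple user-level features. The model name, dataset, and
-- feature choices are illustrative assumptions.
CREATE OR REPLACE MODEL `my-project.analytics_models.purchase_propensity`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['purchased']) AS
SELECT
  COUNT(DISTINCT event_date) AS active_days,
  COUNTIF(event_name = 'view_item') AS item_views,
  COUNTIF(event_name = 'add_to_cart') AS add_to_carts,
  IF(COUNTIF(event_name = 'purchase') > 0, 1, 0) AS purchased
FROM `my-project.analytics_123456.events_*`
GROUP BY user_pseudo_id;
```

Once a model like this is trained, ML.PREDICT can score users with another short query.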
If your team has data scientists, they can leverage BigQuery for this kind of model development and deployment. Our advanced analytics team at Bounteous does this kind of work every day. Check out some of our new Google BigQuery Recipes, ready-made queries that teams can copy and paste to immediately begin exploring data in BigQuery.
Activating BigQuery-Calculated Audiences in Other Channels
In addition to identifying the initial audiences, with BigQuery we can also generate dynamic audiences for remarketing, programmatic media, A/B testing, and Google Analytics reporting. There are several ways this is possible.
For example, we might choose to combine our Google Analytics data from BigQuery with email addresses or related emails from a third-party system. We might want to send this list of email addresses to a mailing service like MailChimp, or a marketing automation platform like HubSpot or Marketo. These custom audiences can also be sent to platforms like Facebook.
Using BigQuery Data with Google Marketing Platform
At the same time, we might choose to see these audiences back inside the Google Analytics interface. The easy way to do this would be to approximate the characteristics we are looking for—creating a segment that closely matches, or mimics, the BigQuery audiences that we created. There are more complex ways to do this as well, for example using the measurement protocol or data import to send these client IDs back into Google Analytics and define an audience.
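One hedged sketch of the first step is labeling an audience in BigQuery so the client-ID-to-audience mapping can then be pushed back to Google Analytics (thresholds and names are illustrative):

```sql
-- A hedged first step: label an audience in BigQuery so the client-ID-to-
-- audience mapping can be sent back to GA via Data Import or the
-- Measurement Protocol. Thresholds and names are illustrative.
SELECT
  user_pseudo_id AS client_id,
  'high_intent_non_purchasers' AS audience_name
FROM `my-project.analytics_123456.events_*`
GROUP BY user_pseudo_id
HAVING COUNTIF(event_name = 'add_to_cart') >= 2
   AND COUNTIF(event_name = 'purchase') = 0;
```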
Once an audience has been built inside or imported to Google Analytics, teams can conduct advanced reporting in the interface, or use the audience for remarketing in a platform like Display & Video 360. And with Optimize 360, another tremendously powerful Google Analytics 360 Suite product, teams can actually target these audiences for A/B testing and personalized offers and experiences on an organization's website.
BigQuery Use Cases
Bounteous has been using BigQuery for client data projects for years. Here are a few sample use cases:
When partnering with Domino's Pizza of Canada, we imported customer loyalty and purchase data from offline sources, merged it with Google Analytics data in BigQuery, and then developed purchase propensity models to score audiences for marketing campaigns, driving revenue increases.
With Leading Hotels of the World, we used BigQuery to develop statistical models to understand which travel amenities customers may be interested in, to deliver more personalized booking experiences.
For PBS, we needed a solution to analyze a lot of Google Analytics data, more than 330 million sessions, 800 million pageviews, and 17.5 million video episode plays per year. BigQuery enabled our team to develop machine learning models that analyzed user trends and created audience clusters to inform new content decisions.
Exporting Data for Visualization
BigQuery also has native integrations with the most popular BI tools. This means you can connect tables to Data Studio, Looker, Power BI, Tableau, and other visualization tools with a few clicks.
Along with that simplicity comes the value of unified reporting. If we are joining various data sources to enrich the view of our marketing programs, we can use those sources to drive more intelligent reporting. Many organizations still rely on dashboards where data from one source merely sits next to data from another. BigQuery gives us the power to join web analytics, ad channel spend, and CRM metrics in one table, enabling lightning-fast insight into metrics like cost per qualified lead or overall campaign ROI.
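A sketch of that kind of unified table, assuming a GA4 export plus a hypothetical marketing.ad_spend table keyed by date, source, and medium:

```sql
-- A sketch of unified reporting: daily GA4 lead counts joined to a
-- hypothetical marketing.ad_spend table (spend_date, source, medium, spend)
-- to compute cost per lead. Note that traffic_source in the GA4 export is
-- the user's first-touch source, a simplification here.
WITH ga_daily AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS report_date,
    traffic_source.source AS source,
    traffic_source.medium AS medium,
    COUNTIF(event_name = 'generate_lead') AS leads
  FROM `my-project.analytics_123456.events_*`
  GROUP BY report_date, source, medium
)
SELECT
  g.report_date,
  g.source,
  g.medium,
  s.spend,
  g.leads,
  SAFE_DIVIDE(s.spend, g.leads) AS cost_per_lead
FROM ga_daily AS g
LEFT JOIN `my-project.marketing.ad_spend` AS s
  ON s.spend_date = g.report_date
 AND s.source = g.source
 AND s.medium = g.medium;
```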
BigQuery: Cost, Value, & Impact
There's no direct cost for connecting Google Analytics 360 to BigQuery. When an organization starts with Google Analytics 360, they may import the last 13 months or 10 billion hits (whichever is smaller) of Google Analytics data into BigQuery, so that teams can start writing advanced queries for year-over-year reports immediately.
Costs in BigQuery can be a concern for organizations venturing into data warehousing for the first time. But it's important to note that BigQuery is designed as a managed, serverless data warehouse. That means teams pay only for the data and features they use, when they use them. Those costs generally come in two categories: charges for storing data and charges for querying data.
- Storage: Storage costs are pennies per gigabyte. To put that in perspective, a site with roughly 10 million hits per month may see a monthly charge of $3 to store a year's worth of data, which is well worth it for hit-level analysis.
- Querying: Querying costs are also relatively trivial for most users. Users pay per terabyte of data queried from the database. Most organizations don't use more than a terabyte of processing in a month, keeping querying costs in the single digits.
For many organizations using BigQuery, that ends up totaling well under $100 per month, and that includes storing a year's worth of data from multiple sources and driving rich reporting in connected BI tools.
The terms "managed" and "serverless" also create value far beyond cost implications.
BigQuery will scale to meet your organization's needs, churning through terabytes of data in seconds with no need to worry about instances, clusters, or virtual machines; it just happens (managed). There's also no need to worry about securing server space or a catastrophic database failure (serverless). This results in serious time savings, but our team has found that it also empowers a whole new class of analysts to use data in BigQuery to uncover critical insights. A lower barrier to entry means teams don't need to be skilled database admins to drive real results; they just need to know a little SQL.