Use case #2: Identifying GA3 vs GA4 data discrepancies

Category: Performance | Solution: DataMa Compare, DataMa Pivot | Type: Ad hoc | Client: Web travel | Extension: None
Tags: #Occupancy #Stock #Conversion

Context

The client is a major hospitality player in France. 

With the upcoming switch from GA3 (Universal Analytics) to GA4, the data analytics team has started to transition their web reporting to GA4. In the process, they noticed discrepancies in their main KPIs, such as Sessions, Transactions, and Revenue. They needed a systematic way to identify every difference between the two versions and to check whether specific dimensions were driving these discrepancies.

In other words, they needed a quick way to see differences between GA3 and GA4 and possible causes.

Approach

Market equation

Option 1

While a market equation may not seem like the most obvious way to solve this problem, we can picture a waterfall graph in DataMa Compare with GA3 on the left side, GA4 on the right, and one KPI per step in between.

This basic market equation calculates the differences in the main KPIs between GA3 and GA4 and expresses them as variations in Revenue.
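As a rough sketch of how such a waterfall can be computed, assume the market equation is Revenue = Sessions × Conversion rate × Average basket. This factorization is an assumption, as are the totals below, which are invented for illustration. The factors are swapped one at a time from their GA3 value to their GA4 value, and each swap gives one step of the waterfall.

```python
# Hypothetical totals for illustration only; these are not the client's figures.
ga3 = {"sessions": 100_000, "transactions": 2_000, "revenue": 300_000.0}
ga4 = {"sessions": 84_000, "transactions": 1_980, "revenue": 298_000.0}

# Assumed market equation: Revenue = Sessions x Conversion rate x Average basket.
FACTORS = ["sessions", "conversion_rate", "average_basket"]


def kpis(totals):
    """Break revenue into the three factors of the assumed market equation."""
    return {
        "sessions": totals["sessions"],
        "conversion_rate": totals["transactions"] / totals["sessions"],
        "average_basket": totals["revenue"] / totals["transactions"],
    }


def revenue(factors):
    """Recompute revenue from the three factors."""
    return factors["sessions"] * factors["conversion_rate"] * factors["average_basket"]


def waterfall_steps(start, end):
    """Swap factors one at a time from start to end; record the revenue change of each swap."""
    current = dict(start)
    steps = {}
    for name in FACTORS:
        before = revenue(current)
        current[name] = end[name]
        steps[name] = revenue(current) - before
    return steps


steps = waterfall_steps(kpis(ga3), kpis(ga4))
for name, delta in steps.items():
    print(f"{name:>16}: {delta:+,.0f}")
```

The steps always add up to the total revenue gap between the two sources, but the size of each step depends on the order in which the factors are swapped.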

Option 2

Once we have identified which KPIs have the biggest differences, it is in fact (and for once in DataMa!) more interesting to look at the ratio of GA4 values to GA3 values for each problematic metric. The market equation is then just the ratio between the two metrics, for example Sessions GA4 / Sessions GA3.

Dataset

Option 1 (for DataMa Compare)

In addition to the metrics Sessions, Transactions, and Revenue, we include comparable dimensions, such as Date, Device, or Browser, to see if any of them contributes to the difference.

In this use case, the data comes from Google Analytics 3 and 4, extracted through the reporting APIs using DataMa Prep. 

We have to create a column in the data to indicate whether the source is ‘GA3’ or ‘GA4’. This can be easily done and automated in DataMa Prep.

An example of the extracted dataset can be found here.
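Outside DataMa Prep, the same tagging step could look like the minimal sketch below. The rows, column names, and the tag_source helper are hypothetical and only illustrate the shape of the stacked dataset.

```python
# Hypothetical extracts; column names and values are illustrative assumptions.
ga3_rows = [
    {"date": "2023-01-01", "device": "desktop", "browser": "Chrome",
     "sessions": 1200, "transactions": 30, "revenue": 4500.0},
    {"date": "2023-01-01", "device": "desktop", "browser": "Edge",
     "sessions": 300, "transactions": 7, "revenue": 1050.0},
]
ga4_rows = [
    {"date": "2023-01-01", "device": "desktop", "browser": "Chrome",
     "sessions": 1010, "transactions": 30, "revenue": 4480.0},
    {"date": "2023-01-01", "device": "desktop", "browser": "Edge",
     "sessions": 150, "transactions": 7, "revenue": 1050.0},
]


def tag_source(rows, source):
    """Return copies of the rows with a 'source' column added in front."""
    return [{"source": source, **row} for row in rows]


# Stack both extracts into one dataset, with one row per source and segment.
dataset = tag_source(ga3_rows, "GA3") + tag_source(ga4_rows, "GA4")

for row in dataset:
    print(row["source"], row["browser"], row["sessions"])
```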

Option 2 (for DataMa Pivot & Impact)

For option 2, we just need to unpivot the previous dataset in order to get two columns for a given metric, one with GA3 data and the other with GA4 data. This can be easily done in DataMa Prep from the previous dataset using an unpivot block.

An example of the transformed data can be found here.
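For readers who want to reproduce the reshaping outside DataMa Prep, here is a minimal sketch. The input rows, the column naming scheme (sessions_ga3, sessions_ga4), and the side_by_side helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stacked dataset with a 'source' column; values are illustrative.
long_rows = [
    {"source": "GA3", "date": "2023-01-01", "browser": "Chrome", "sessions": 1200},
    {"source": "GA4", "date": "2023-01-01", "browser": "Chrome", "sessions": 1010},
    {"source": "GA3", "date": "2023-01-01", "browser": "Edge", "sessions": 300},
    {"source": "GA4", "date": "2023-01-01", "browser": "Edge", "sessions": 150},
]


def side_by_side(rows, dimensions, metric):
    """Produce one row per dimension combination, with one metric column per source."""
    wide = defaultdict(dict)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        column = f"{metric}_{row['source'].lower()}"
        # Sum in case several input rows share the same key and source.
        wide[key][column] = wide[key].get(column, 0) + row[metric]
    return [dict(zip(dimensions, key), **values) for key, values in wide.items()]


wide_rows = side_by_side(long_rows, ["date", "browser"], "sessions")
for row in wide_rows:
    print(row)
```

Each output row then carries both values of the metric, which makes the GA4 / GA3 ratio a simple division per row.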

Takeaways

The waterfall graph in DataMa Compare, built on dataset #1, shows differences in each of the KPIs. The largest difference is in Sessions, because sessions are defined differently in the two tools.

Transactions and Revenue seem to be properly tracked in both tools, as their values are quite close. As a result, the conversion metric also differs, but only because Sessions differ.

We therefore used dataset #2 in DataMa Pivot to understand where the gap in the ratio of sessions comes from, GA4 sessions being 16% lower than GA3.

Using DataMa Pivot, we are able to quickly explore which segments are significantly above or below the average. 

Mousing over each bubble shows specifics about the outliers. In this instance, we can see that the ratio of GA4 to GA3 sessions is particularly low for Edge, which might reflect an issue in the way the data is collected.
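The comparison behind this view can be approximated by hand: compute the GA4 / GA3 ratio per segment and compare it with the overall ratio. The sketch below uses invented figures and an arbitrary threshold of 0.10, and it is not the method DataMa Pivot uses to assess significance.

```python
# Hypothetical side-by-side rows; the figures are illustrative, not the client's.
rows = [
    {"browser": "Chrome", "sessions_ga3": 6000, "sessions_ga4": 5200},
    {"browser": "Safari", "sessions_ga3": 3000, "sessions_ga4": 2600},
    {"browser": "Edge", "sessions_ga3": 1000, "sessions_ga4": 600},
]


def segment_ratios(rows, dimension, numerator, denominator):
    """Aggregate both metrics per segment, then return numerator / denominator."""
    totals = {}
    for row in rows:
        num, den = totals.get(row[dimension], (0, 0))
        totals[row[dimension]] = (num + row[numerator], den + row[denominator])
    return {segment: num / den for segment, (num, den) in totals.items() if den}


overall = sum(r["sessions_ga4"] for r in rows) / sum(r["sessions_ga3"] for r in rows)
ratios = segment_ratios(rows, "browser", "sessions_ga4", "sessions_ga3")

# Assumed threshold: flag segments more than 0.10 away from the overall ratio.
THRESHOLD = 0.10
flagged = [seg for seg, ratio in ratios.items() if abs(ratio - overall) > THRESHOLD]

print(f"overall ratio: {overall:.2f}")
for segment, ratio in ratios.items():
    print(f"{segment:>8}: {ratio:.2f}")
print("flagged:", flagged)
```

A fixed threshold ignores segment size, so a small segment can be flagged on very few sessions. Weighting by volume would be a natural refinement.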

This use case relies on only a few dimensions. The more dimensions you include in the dataset, the easier it is to find the drivers of the gap.

Furthermore, we were able to set up an anomaly detection use case in DataMa Impact to monitor how that ratio (Sessions GA4 / Sessions GA3) evolves over time and to get notified if it goes outside “normal values”. During the “double tracking period” of 2022-2023, this allows the client to gain confidence in the new platform's numbers and to check data consistency before going into further analysis.
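To give an idea of what monitoring such a ratio involves, here is a minimal sketch based on a trailing window and a standard deviation band. It is a generic technique rather than the algorithm DataMa Impact uses, and the daily values, the window size, and the multiplier k are assumptions.

```python
from statistics import mean, stdev

# Hypothetical daily values of Sessions GA4 / Sessions GA3; index 8 is a simulated drop.
daily_ratio = [0.84, 0.85, 0.83, 0.84, 0.86, 0.84, 0.83, 0.85, 0.70, 0.84]


def flag_anomalies(values, window=7, k=3.0):
    """Return indices whose value is more than k standard deviations
    away from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center, spread = mean(history), stdev(history)
        if abs(values[i] - center) > k * spread:
            flagged.append(i)
    return flagged


anomalies = flag_anomalies(daily_ratio)
print("anomalous days (0-based index):", anomalies)
```

Because the window includes past anomalies, a large drop widens the band for the following days. Excluding flagged points from the history is one possible refinement.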

Outcomes

Using DataMa Compare connected to GA4 and GA3 through DataMa Prep, the client was able to quickly assess the main differences in their KPIs between the two sources, and understand the reasons for these differences based on segment analysis in DataMa Pivot. They are now able to monitor this gap over time and get notified of any changes in order to find tracking issues or tool changes. 

This analysis layer helped in the difficult process of web analytics tool migration by saving time on analysis and automating monitoring and alerting.