June 30 2023

Google Analytics 4 (GA4) vs Universal Analytics (UA): Data Discrepancies

Google Analytics 4 (GA4) holds great potential for unlocking valuable insights, but it also presents unique challenges when it comes to data accuracy. If you're struggling to reconcile your GA4 figures with other data sources, it helps to understand where the differences come from.

First, let's look at the key changes and challenges that come with the transition to GA4. GA4 is the latest version of Google Analytics, and it's designed to be more privacy-focused and future-proof than its predecessor, Universal Analytics (UA). However, several factors can lead to discrepancies in the data, and these have raised concerns and reduced confidence in the numbers at many companies. Let's explore the main reasons behind these discrepancies:

Sessions Estimation: In GA4, session counts are estimated rather than counted exactly, in both the user interface (UI) and the API. The estimation reduces processing effort, especially for large datasets, so figures may not match exactly, particularly for high-traffic properties. However, the estimate should generally be accurate to within about 2%.
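Google's documentation describes GA4 approximating some unique counts, including sessions, with the HyperLogLog++ (HLL++) algorithm. The sketch below is a plain HyperLogLog estimator, not Google's implementation, included only to show how this kind of estimate trades a small error for much less processing. The class name, the SHA-256 hash and the precision `p=12` (4,096 registers) are assumptions for illustration. With 4,096 registers, plain HyperLogLog has a typical relative error of about 1.04/√4096 ≈ 1.6%, the same order of magnitude as the roughly 2% figure mentioned for GA4 session estimates.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Map a string to a 64-bit integer using the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HyperLogLog:
    """Plain HyperLogLog cardinality estimator (illustrative, not Google's HLL++)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # number of index bits (assumed value)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value: str) -> None:
        x = hash64(value)
        width = 64 - self.p
        index = x >> width                      # top p bits select a register
        rest = x & ((1 << width) - 1)           # remaining bits of the hash
        rank = width - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-cardinality correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical session IDs; each is added twice to show duplicates are ignored.
hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"session-{i}")
    hll.add(f"session-{i}")

estimate = hll.estimate()
print(round(estimate), f"{abs(estimate - 100_000) / 100_000:.2%}")
```

The estimator keeps only 4,096 small integers no matter how many sessions it sees, which is why this family of algorithms is attractive for large properties, at the cost of a small, bounded error.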

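Sampling: GA4 handles data sampling differently from UA. In UA, ad hoc reports (for example, those using segments or secondary dimensions) were sampled at the session level once a query exceeded the session threshold, so not all the collected data was included. In GA4, the standard reports are built from pre-processed data and are therefore unsampled, although explorations can still be sampled when a query exceeds its quota.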
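To see why sampled figures can drift from unsampled ones, it helps to remember that a sampled report measures a metric on a subset of sessions and then scales it up. The sketch below shows only that scaling step, using a hypothetical function name and hypothetical numbers; it is a simplification, not a reproduction of UA's actual sampling procedure.

```python
def extrapolate(sampled_value: float, sampled_sessions: int, total_sessions: int) -> float:
    """Scale a metric observed in a session sample up to the full population of sessions."""
    if sampled_sessions <= 0:
        raise ValueError("sampled_sessions must be positive")
    return sampled_value * total_sessions / sampled_sessions


# Hypothetical figures: 1,234 conversions observed in a 500,000-session sample
# drawn from 2,000,000 sessions.
print(extrapolate(1_234, 500_000, 2_000_000))  # 4936.0
```

Any difference between the sample and the full dataset is multiplied by the same scaling factor (here 4), which is one way a sampled report and an unsampled one can disagree.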

Thresholding: Google applies a mechanism called thresholding to protect user privacy. It withholds data when its algorithm detects a risk that an individual could be identified from their demographics or interests. Thresholding is typically triggered when user counts are low or the date range is short, and the threshold tends to fall in the range of 35–40 users or events.
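The effect of thresholding on a report can be shown with a toy example. This is not Google's algorithm, which is not published in detail: `THRESHOLD`, `apply_threshold` and the country figures are all hypothetical, and the threshold value is simply picked from the 35–40 range mentioned above.

```python
# Toy illustration of thresholding: rows whose user count falls below a
# threshold are withheld. THRESHOLD is a hypothetical value; Google does not
# publish the exact rule it applies.
THRESHOLD = 40


def apply_threshold(rows: dict[str, int], threshold: int = THRESHOLD):
    """Split report rows into those shown and those withheld for privacy."""
    shown = {k: v for k, v in rows.items() if v >= threshold}
    withheld = sorted(k for k, v in rows.items() if v < threshold)
    return shown, withheld


# Hypothetical report: users by country over a short date range
rows = {"United Kingdom": 1200, "Ireland": 85, "Iceland": 12}
shown, withheld = apply_threshold(rows)
print(shown)     # {'United Kingdom': 1200, 'Ireland': 85}
print(withheld)  # ['Iceland']
print(sum(rows.values()) - sum(shown.values()))  # 12
```

The visible rows now add up to 12 users fewer than the underlying data, which is the kind of gap that appears when a thresholded UI report is compared with another source.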

Event Tracking: GA4 uses an event-based data model, whereas UA's model was built around sessions. This means that some interactions are counted differently in GA4 compared to UA. For instance, a UA goal is counted at most once per session, whereas by default a GA4 conversion event is counted every time it occurs, so a conversion triggered twice in one session counts once in UA but twice in GA4. Additionally, GA4 counts a session in every period in which the session had events, whereas UA counted a session in the period in which it started. As a result, monthly or yearly session totals from GA4 and UA might not match.
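The two session-counting rules described for GA4 and UA can be modelled in a few lines. This is a simplified sketch with hypothetical session data and function names; it ignores other differences, such as UA starting a new session at midnight, and only compares "count in every month with events" against "count in the month the session started".

```python
from collections import Counter
from datetime import date

# Each hypothetical session is represented by the dates on which it had events.
sessions = {
    "s1": [date(2023, 1, 31), date(2023, 2, 1)],  # active on both sides of a month boundary
    "s2": [date(2023, 1, 15)],
    "s3": [date(2023, 2, 10)],
}


def month_key(d: date) -> str:
    return f"{d.year}-{d.month:02d}"


def ga4_style_counts(sessions: dict[str, list[date]]) -> Counter:
    """Count a session in every month in which it had at least one event."""
    counts = Counter()
    for event_dates in sessions.values():
        for month in {month_key(d) for d in event_dates}:
            counts[month] += 1
    return counts


def start_based_counts(sessions: dict[str, list[date]]) -> Counter:
    """Count a session only in the month in which it started."""
    return Counter(month_key(min(dates)) for dates in sessions.values())


ga4 = ga4_style_counts(sessions)
ua = start_based_counts(sessions)
print(sorted(ga4.items()))  # [('2023-01', 2), ('2023-02', 2)]
print(sorted(ua.items()))   # [('2023-01', 2), ('2023-02', 1)]
print(sum(ga4.values()), len(sessions))  # 4 3
```

Under the GA4-style rule, session `s1` appears in both months, so summing monthly totals gives 4 even though only 3 distinct sessions exist.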

Time Zone Tracking: Differences in time zone settings can also shift data between days. GA4 reports use the reporting time zone set on the property (while the BigQuery export stores event timestamps in UTC), and UA used the time zone set on each view. If these settings differ, or if you compare UI reports with raw exported timestamps, an event that occurs close to midnight can be recorded against a different date in each source.
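Whichever time zone a report applies, the same instant can land on different calendar dates. The GA4 BigQuery export, for example, stores `event_timestamp` as microseconds since the Unix epoch in UTC. The helper below (a hypothetical name) converts such a timestamp into a `YYYYMMDD` date under a given UTC offset. Fixed offsets are used to keep the sketch dependency-free; real reporting should use a time zone database such as Python's `zoneinfo` so daylight saving changes are handled.

```python
from datetime import datetime, timedelta, timezone


def local_event_date(event_timestamp_us: int, utc_offset_hours: float) -> str:
    """Return the YYYYMMDD date of a UTC microsecond timestamp at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    dt = datetime.fromtimestamp(event_timestamp_us / 1_000_000, tz=tz)
    return dt.strftime("%Y%m%d")


# 2023-06-30 23:30:00 UTC, expressed in microseconds since the Unix epoch
ts = int(datetime(2023, 6, 30, 23, 30, tzinfo=timezone.utc).timestamp()) * 1_000_000

# Offsets valid in late June (British Summer Time and US Eastern Daylight Time)
for label, offset in [("UTC", 0), ("London (UTC+1)", 1), ("New York (UTC-4)", -4)]:
    print(label, local_event_date(ts, offset))
```

The same event is dated 30 June in UTC and New York but 1 July in London, so daily totals built with different time zones will not line up.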

Data Privacy: GA4 has an updated privacy model compared with UA, which can affect the amount of data collected. Note that "Enhanced Measurement" is a GA4 feature that automatically tracks interactions such as scrolls, outbound clicks and file downloads; it was not a UA feature, and it does not collect data from users who have opted out of tracking. Google Signals does not do this either: it only adds data from users who are signed in to their Google accounts and have turned on Ads Personalization.

Google Signals: Google Signals is a feature that deduplicates user counts across devices. Its aim is to simplify remarketing list creation and make demographic information accessible. However, this deduplication can cause user counts and event counts per user to differ between the UI and the API or BigQuery data, introducing discrepancies in reporting. Enabling Google Signals can also make thresholding more likely.

At Mediaworks, we've implemented effective strategies to accommodate GA4 within our processes. By working directly with the GA4 API, we ensure the highest level of accuracy in reporting while minimising the impact of sampling and thresholding.

First, we vary the date ranges used in our analysis to capture a more comprehensive view of the data. This helps us reduce distortions caused by specific time periods.

Additionally, we run a rolling three-day data refresh to account for late hits, which are events that can arrive up to 72 hours after they take place. You can learn more about late hits and their implications in this helpful resource: https://support.google.com/analytics/answer/9964640?hl=en
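A rolling refresh of this kind can be sketched as "re-fetch the last few days on every run and overwrite what is stored". The sketch below is a minimal illustration under stated assumptions, not Mediaworks' actual pipeline: `dates_to_refresh`, `rolling_refresh`, `REFRESH_DAYS` and the in-memory `warehouse` dictionary are hypothetical, and `fake_fetch` stands in for whatever call pulls a day's data (for example, a GA4 Data API report request).

```python
from datetime import date, timedelta
from typing import Callable

REFRESH_DAYS = 3  # assumption: mirrors the 72-hour late-hit window


def dates_to_refresh(today: date, days: int = REFRESH_DAYS) -> list[date]:
    """Return the `days` complete days before `today`, oldest first."""
    return [today - timedelta(days=n) for n in range(days, 0, -1)]


def rolling_refresh(
    warehouse: dict[date, dict],
    fetch_day: Callable[[date], dict],
    today: date,
) -> None:
    """Re-fetch the most recent days and overwrite stored rows so late hits are picked up."""
    for day in dates_to_refresh(today):
        # Upsert: replace the whole day rather than appending, to avoid double counting.
        warehouse[day] = fetch_day(day)


# Usage with hypothetical data: 28 June was first loaded before late hits arrived.
warehouse = {date(2023, 6, 28): {"events": 90}}


def fake_fetch(day: date) -> dict:
    """Stand-in for a real API call; returns the same hypothetical figure for any day."""
    return {"events": 100}


rolling_refresh(warehouse, fake_fetch, today=date(2023, 6, 30))
print(sorted(warehouse))  # [datetime.date(2023, 6, 27), datetime.date(2023, 6, 28), datetime.date(2023, 6, 29)]
print(warehouse[date(2023, 6, 28)])  # {'events': 100}
```

Overwriting whole days keeps each refresh idempotent: running the job twice on the same day leaves the stored figures unchanged, which matters when late hits are being folded in.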

Furthermore, we optimise our data aggregation process within our data warehouse, ensuring that we consolidate and process the information accurately and efficiently.

Despite these efforts, it’s important to acknowledge that discrepancies may still arise between the user interface (UI) and the data presented in some reports. The variations can depend on factors such as the selected date range, chosen metrics and dimensions, as well as the level of thresholding applied.

We strive to provide you with the most reliable insights possible. If you encounter any discrepancies or have further questions, our team is here to help, so get in touch.
