I’ve recently spent a lot of time doing various forms of business analytics in BigQuery. As discussed in a previous post, I’ve been using BigQuery as the data integration environment for several business systems. I’ve found integration at the data level via an ETL/ELT/IPaaS pipeline to be a lot more stable than system-level integrations that involve chaining together dependencies on fairly volatile SaaS APIs.
The original premise was fairly straightforward: Given a table of user-level statistics over time, identify only those points in time where one or more of the statistics changed value. In our case, we had several million rows of user-level data captured on a daily cadence. Manually inspecting this data for changes in individual values by customer was simply not a viable plan. The BigQuery LAG function came to the rescue.