Web to Unified
This migration guide is for the Snowplow Web (legacy dbt package) users who would like to upgrade to the Snowplow Unified Digital package which is our most supported and feature-rich dbt package you can use to model web and/or mobile event data.
Simplified Upgrade Overview
Breaking changes between the packages
- the
page_views
table has been renamed toviews
- the
domain_userid
field has been renamed touser_identifier
, as you now have the ability to define your own custom logic / alternative user id field for the aggregation - the
domain_sessionid
has been renamed tosession_identifier
, as you now have the ability to define your own custom logic / alternative session id field for the aggregation - apart from these, fields generated by the the out-of-the-box entities and enrichments have been added a prefix to indicate where they are coming from e.g.
device_version
has been renamed toyauaa__device_version
- these out-of-the-box entities and enrichments can optionally be enabled or disabled for the model (previously these values were there as NULLs) through variables e.g.
snowplow__enable_yauaa: true
which are then flattened automatically within thebase_events_this_run
table and can be referred to from there directly for custom models, if needed - some package variables have been renamed (as well as new ones added)
Things to bear in mind (no 100% accuracy!)
Due to the above mentioned breaking changes, ideally it is best to start from scratch and run the new package on your whole historic events dataset. However, in certain scenarios (e.g. in case of large data volumes) to save time, effort and cost it may be useful to try and upgrade instead using the existing modeled derived dataset.
Please bear in mind that the there is no 100% accurate upgrade solution but with the below steps and sql scripts we may be able to get you to roughly 95% of the way there:
Incomplete / inaccurate fields for historic data: The unified package contains a list of new fields, some of which we can compute on a row-by-row basis, in which case we provided update scripts to compute them for your existing derived data. There are, however, a few other new fields that are a result of aggregate or other computation that we cannot calculate without a full refresh. E.g.
absolute_time_in_s
for users therefore they will stay NULL. Then when a new event comes to update that record it may either correct it or it will just be inaccurate.Fields that are not available in the out-of-the-box derived tables: There are a few fields that we have dropped as they were incorporated into a new field based on different grouping. There may also be custom fields you have added to the derived tables. The below passthrough variables come in handy in this case:
snowplow__view_passthroughs: []
snowplow__session_passthroughs: []
snowplow__user_first_passthroughs: []
snowplow__user_last_passthroughs: []
For more on this, have a look at the passthrough fields section.
Upgrade steps
Step 1: Migrate to the latest version of the web package
Once you decided you would like to go ahead with the upgrade process you will first need to make sure your version of the web model is on par with the latest ( v.1.0.1). The Web model migration guides will help you decide what other changes you need to take into consideration and if the changes apply to your warehouse, you may need to execute the upgrade sql scripts to bridge the gap.
Step 2: Execute SQL scripts to create the new tables
You will then need to execute a list of sql scripts we provide below, which will first create the new tables based on your existing derived tables created by the web package, then make the changes (e.g. renaming, adding, dropping and updating columns wherever possible). The manifest tables will also be altered making sure that you will then have everything ready for a new run in the unified package as if nothing happened.
Execute the below sql scripts with your database IDE to create your new derived and manifest tables at once without having to reprocess your event data from scratch. Make sure to update (your_schema)_derived
to match your derived schema name beforehand. Please also be aware you might need to adjust the data varchar
data type to string
depending on your warehouse (Bigquery, Databricks users mainly) or limit it to the maximum in case there are limitations (potentially Redshift).
SQL scripts
Step 3: Setting up a new dbt project for unified
Due to a package conflict resulting from the last web and latest unified requiring a different version of the snowplow-utils package it is probably best to create a new dbt project for running the package.
3.1 Packages.yml
packages:
- package: snowplow/snowplow_unified
version: 0.4.1
3.2 Project.yml
In the project.yml file make sure to disable the mobile events:
vars:
snowplow_unified:
snowplow__enable_mobile: false
In case you changed any of the default variables in your web project, add them here as well. Bear in mind that while 90% of the variables stayed the same, there are some notable changes, deletions, additions and default modifications, please have a look below or at the detailed configuration page and adjust accordingly:
Removed variables:
snowplow__limit_page_views_to_session
Renamed variables:
snowplow__page_view_stitching
tosnowplow__view_stitching
snowplow__consent_cmp_visible
tosnowplow__cmp_visible_events
(Redshift only)snowplow__consent_preferences
tosnowplow__consent_preferences_events
(Redshift only)snowplow__cwv_context
tosnowplow__cwv_events
(Redshift only)
Added variables:
snowplow__conversion_stitching
snowplow__conversion_passthroughs
snowplow__enable_browser_context
snowplow__enable_app_errors
snowplow__enable_conversions
snowplow__use_refr_if_mkt_null
snowplow__browser_context
: com_snowplowanalytics_snowplow_browser_context_1 (Redshift only)snowplow__session_context
: com_snowplowanalytics_snowplow_client_session_1 (Redshift only)snowplow__mobile_context
: com_snowplowanalytics_snowplow_mobile_context_1 (Redshift only)snowplow__geolocation_context
: com_snowplowanalytics_snowplow_geolocation_context_1 (Redshift only)snowplow__application_context
: com_snowplowanalytics_mobile_application_1 (Redshift only)snowplow__screen_context
: com_snowplowanalytics_mobile_screen_1 (Redshift only)snowplow__application_error_events
: com_snowplowanalytics_snowplow_application_error_1 (Redshift only)snowplow__screen_view_events
: com_snowplowanalytics_mobile_screen_view_1 (Redshift only)snowplow__deep_link_context
: com_snowplowanalytics_mobile_deep_link_1 (Redshift only)snowplow__screen_summary_context
: com_snowplowanalytics_mobile_screen_summary_1 (Redshift only)
Variables with changed defaults:
snowplow__session_identifiers
:[{"schema" : "atomic", "field" : "domain_sessionid"}]
snowplow__user_identifiers
:[{"schema": "atomic", "field" : "domain_userid"}]
to nonesnowplow__databricks_catalog
: 'hive_metastore'
Step 4: Verifying the new derived datasets
At this stage you could potentially run both packages simultaneously until you make sure you are happy to stop the old jobs that updated the web package.
We have also provided views which replicate the original derived tables to help you maintain your existing reporting (in case you use the original derived tables as a source for your BI tool or for downstream data models).