Snowflake's lateral flatten function on variant data type
Snowflake is a really interesting new data warehouse built on top of AWS. I like their architecture because they had the interesting idea to separate data storage (backed by small files on S3) and compute to run queries (EC2 instances running their API).
I inherited a project where we would store complex JSON in a string in a field as varchar(64000). Then we would use regex patterns to get the values we wanted from them. Sometimes these regexes would get really involved, yuck.
The Variant data type
Snowflake offers mechanism to store semi structured data in field that is easy to parse - the Variant data type.
In snowflake I have a table variant_demo with a single Variant field named json_data as shown below:
If we were to click on the field, we could see the JSON elements as shown below:
If I wanted to get the value of the "membership" field, I could write the following SQL:
1
2
SELECT json_data:membership::string as membership
FROM variant_demo;
Now what if I wanted to pull out the list associated with "products" and join them to our products table?
Lateral Flatten Function
Snowflake has this really cool function that allow us to normalize a row from a list from JSON attribute from variant field. The SQL below uses lateral flatten to take the items in the list from json_data:products make them their own dataset:
1
2
3
4
5
6
7
8
9
10
WITH p as (
SELECT
json_data
FROM
variant_demo
)
SELECT
b.value::string as product_style
FROM p,
lateral flatten(input=> p.json_data:products) b
Once you have a nice clean record set, we can join on data with other tables:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
WITH p as (
SELECT
json_data
FROM
variant_demo
)
SELECT
b.value::string as product_style
, designer
, list_price
FROM p,
lateral flatten(input=> p.json_data:products) b
INNER JOIN products prod
ON v.value::string=prod.style