This QuickStart is designed as primer to materialization and the questions and issues that surround it.
We will define it, provide some guidance on the why/when to use it and who sets it up.
Lastly, we will step through using Sigma and Snowflake to materialize some data so that the workflow in Sigma is understood.
Sigma administrators who are interested in improving performance when working with complex datasets or just generally interest in learning more about materialization.
This QuickStart discussed the features and benefits of using materialization in Sigma and also how to configure and schedule materializations to improve the speed and performance of your complex reports.
The fundamental concept of storing pre-calculated data for performance optimization purposes (ie: caching) has been around for several decades, with advancements and optimizations occurring over time.
Materialization is a type of caching, where query results are written into a table in a data warehouse, and then refreshed at regular intervals (often daily).
This pre-calculated or pre-aggregated data can be accessed more efficiently and quickly than recomputing the results every time the query is executed.
In general, the main advantages of materialization are:
Materialization is a common strategy for improving performance of interactive queries, and it is most effective in certain situations.
You should consider materializing data in situations where it can provide tangible benefits in terms of query performance, data analysis, and resource management.
At Sigma, we see customers immediately benefiting from materialization in these use cases:
There is a broad range of potential use cases that benefit from materializing. Here are the most common, should you want to know more:
Help on Today() function
In the software market, there are several options available for caching/materialization, depending on your specific needs and the technology stack you are working with.
Here are some common options for materialization:
The availability and specific features of materialization options may vary depending on the software you choose. It's important to evaluate the capabilities, scalability, ease of use, and integration possibilities of these solutions based on your requirements, existing infrastructure, and the technology stack you are using in your organization.
Given all the available choices, this can feel like a complex decision but there are some points that will help make it easier to decide.
Some clients prefer a well governed, centralized materialization strategy, using a single, dedicated tool of their choice and that is fine too.
For some situations, both solutions can work together to save time and get projects completed faster. some clients choose to use Sigma-based materialization to support prototyping and rapid development cycles, because they almost no development time and can be done by a larger team.
For example, you can allow materialization using Sigma, during early development phases, then, later on, move some of the most used materialization logic into the warehouse. That will allow you to have a rapid development cycle, but still having a well governed environment for production usage.
There are some clear benefits to using Sigma to materialize.
The benefits of materializing from Sigma:
If you have chosen to use Sigma to materialize, it is important to control who in Sigma is able to materialize. By default, only Sigma Administrators are able to materialize. You can also create a custom account type "Creators who can Materialize" and grant them the materialization permission.
The ability to materialize in Sigma should be governed in some way - you typically don't want a large number of creators to materialize hundreds or thousands of potentially duplicate objects. It is common to grant the ability to materialize to a certain, smaller number of creators, using Sigma's RBAC system.
Lets start to go deeper into using materialization with Sigma.
Materializations allows you to write datasets and workbook elements back to your warehouse as tables which can reduce compute costs. Materialization enhances query performance by allowing your data warehouse to avoid recomputing the dataset when it's used by an element or a in descendant Sigma analysis.
Materializations are stored in your warehouse and saved in scratch workspace schema automatically managed by Sigma.
Sigma's query compiler automatically and transparently uses the latest materialization.
All data displayed in Sigma is queried directly from the warehouse; Snowflake in this case.
The complexity of the base query will carry through in all subsequent queries associated with sorting, filtering, and paginating the data. To solve this, Sigma datasets can be materialized back to Snowflake.
This is different from the dataset view object in Snowflake.
In Sigma's implementation, a
CREATE TABLE AS statement is used to store the result set of the SQL query generated by the dataset.
It is possible to materialize a Sigma table that contains multiple grouping levels by select a grouping level to materialize.
Materializations can be set on a schedule or triggered via the Sigma API.
By materializing a dataset an extremely complex query can be flattened down into a simple
SELECT meaning that all downstream queries are able to run under far less strain.
First, we need to create a database and schema in Snowflake that we can use when materializing data with Sigma.
Login to your Snowflake trial account as
Create a new
SQL Worksheet and copy/paste the following code into the worksheet:
-- SELECT WAREHOUSE use WAREHOUSE COMPUTE_WH; -- CREATE DATABASE AND GRANT USAGE CREATE DATABASE if not exists SIGMA_WRITEDB; use DATABASE SIGMA_WRITEDB; grant usage on database SIGMA_WRITEDB to role ACCOUNTADMIN; -- SCHEMA AND GRANT USAGE CREATE SCHEMA QUICKSTART; grant usage, create table, create view, create stage on schema QUICKSTART to role ACCOUNTADMIN
Click the drop arrow in the upper-right corner and select
It should return as
Statement executed successfully.
Login to Sigma and navigate to
scroll down and add information related to
When done, click
Create. The connection will be tested and if it passes, a success message will appear.
For more information on enabling write access in Sigma, click here.
Previously we discussed materialization in general and now we will discuss how it is implemented in Sigma, at a high-level.
A materialization is created through the act of scheduling a materialization. The materialization schedule will impact the data freshness. Long running queries that do not display in Sigma can still be materialized.
Materializations can be paused (manually or after a user-specified period of non-use) and deleted anytime.
Materialization isn't available for datasets that use parameters or system functions, although there are a few exceptions to this. For example, the system function Today() does work. These datasets are expected to return different values when their parameters change. Materialized tables, on the other hand, always return the same value - the fixed output of the dataset at the time the materialization was run. As a result, using the materialized versions of datasets that use parameters or system functions would produce unexpected results.
Materialization is incompatible with row-level security. The materialization will error if row-level security (user attributes) functions are referenced.
Datasets referencing other datasets, by means of duplication or joins, can typically be materialized. However, this isn't true if any underlying dataset(s) cannot be materialized.
For this exercise, we will leverage the sample database provided by Snowflake called
TPCH_SF10 which has an
ORDERS table consisting of 9 columns and 15M rows.
It is typical to use Sigma to join tables, add groupings and calculated columns when materializing. As previously discussed, flat tables will not benefit from materialization.
We will assume that our data has some joins and calculated columns, but skip that work in this section, as it is assumed to be understood how to do that in Sigma. We want to focus on how to materialize in Sigma first.
We will do a more complex, grouped table example later.
In Sigma, click the
Connections and select
Snowflake Trial from the connection list:
Expand the database called
SNOWFLAKE_SAMPLE_DATA and select the
ORDERS table. Then click
Orders table is now open in an unsaved Workbook. All columns are selected and there are 15M rows available:
Save this workbook as
hamburger icon (the 3-dots) and select
On the Schedules screen, we can create one or more times when this materialization will run.
Since we only have one element in our Workbook (Page 1 - Orders) and no groupings, that is what is selected.
Configure the schedule to be once a day, at 3am and your timezone:
The materialization will run now and once done, will show a
Status of "Success". The UI will also show the next scheduled runtime.
Close the open schedule window.
Navigate back to
Materializations where we can see the list of our current jobs:
Here we are able to see the status of the last run and how long it took; in this case, 15M rows in 11 secs using a Snowflake X-Small warehouse.
Click on the
Orders (under number 3) to go to the
Clicking on the
View materialization icon will show the latest run information and links to
Materialize now and
View Schedule pages.
We are able to see that Sigma has created a new table in Snowflake, based on our 9 column, 15M row configuration but the column data is not in a readable format. This "scratch schema" is managed by Sigma and is transparent to the Sigma user.
Materialization enhances query performance by allowing your data warehouse to avoid recomputing the data when it's used by an element or in descendant Sigma analysis.
To access this data from other applications, see Sigma's Dataset Warehouse Views feature.
Workbook materialization is similar to datasets, but have some major advantages, depending on the use case needs.
The easiest way to demonstrate this is to use a Sigma Template as our "Workbook".
Templates and select the
Plugs Electronics Sales Performance template.
Dismiss the pop-up asking if we want to use our own data.
Save As in the upper right corner and give the Workbook any name.
If we click to drop the menu on the
Quarterly Sales gauge we see that the underlying data that is driving the gauge can be materialized.
Visualizations tend to have grouped data underlying them so this is our first example of materializing grouped data. In this case, Sigma will just handle the complexity for you.
It is possible to design the dashboard first, getting it "just right" and then decide what portions to materialize based on a schedule later.
Aggregate navigation is an advanced BI design, whereby we create several materialized tables of different grain behind the scenes, then seamlessly swap the data sources between these tables, as the user changes their grain of analysis. The benefit of aggregate navigation is fast performance at the high grain of analysis, combined with the depth of the analysis at the low grain, all transparent and invisible to the end-user.
Aggregate navigation is fairly costly to create, using conventional BI tools, because you often need to manually create and populate several aggregated tables, then add an advanced, hard to write calculation to auto-navigate between them.
By contrast, aggregate navigation in Sigma is easy.
Aggregation functions or operators are applied to groups of data to calculate a consolidated value. Some common aggregation functions include sum, count, average, minimum, maximum, and median. These functions can be applied to numerical data, such as sales figures or temperature readings, as well as categorical data, such as counting the number of occurrences of different categories.
Aggregations are often used to generate reports, analyze trends, or derive meaningful statistics from datasets. For example, in a sales database, you could use aggregation to calculate the total sales for each product category, find the average revenue per customer, or determine the highest-selling region.
Aggregation can be performed on different levels, such as the entire dataset, specific groups or categories, or even across multiple dimensions. The choice of aggregation depends on the specific analysis goals and the structure of the data.
Tables that leverage aggregate navigation are solid candidates for materialization.
In this section, we will demonstrate building a table that consists of four joined tables and has 60M rows. You can build or just follow along; working with larger trial data may be a little slower due to the x-small Snowflake warehouse size.
Some general rules:
If we have five grouping levels in a table element, and we materialize at the third level, it creates three separate materializations - one for the top level, another for the second level and one for the third. The fourth, fifth and the base grouping levels will not be materialized.
This is important because there is a trade off here; performance vs. storage cost for grouping levels that are in-frequently used.
If levels four and five have hundreds of millions of records that can't be materialized, Sigma allows materialization of levels one thru three, with great performance and depth of detail when needed, albeit slower.
We will demonstrate by building on the Workbook we already have, but extend the data.
Users who are viewing Sigma content always want to see sales data rolled up from the
Region/Nation/Customer level with summary sales totals and order counts broken out for each group. They rarely explore the line item detail, but want the option to do so from time to time.
First, let's pause the existing materialization schedule so it no longer will run.
Click and open the
View schedule option:
Open the "hamburger menu" and select
Pause and then click
The ORDERS schedule Status will switch to "Paused". Close this dialogue.
Now lets make a duplicate of the
Orders table and move it to a new page:
It is also possible to copy/paste the table instead
Rename the new table
ORDERS by Customer and move it to a new page:
We now need to join the
CUSTOMER table to
Click as shown to select
Navigate the UI to locate the
Snowflake Trial connection and expand the tree to select the
TPCH_SF10 / CUSTOMER table. Click it and then click
We want to set the
Join Keys on
O Custkey = C Custkey:
The result will be some customers with no orders and some with multiple.
Preview Output and
Sigma presents the "lineage", which is a visual representation of how the data is mapped.
This is an opportunity to remove any unnecessary columns. We will take them all for this exercise, but best practice is to remove what is not needed to gain on performance and compute/storage costs. It also makes the user interface a bit cleaner for the users.
Repeat the join exercise to add the
Line Item table:
Looking at the lineage, we will materialize at the
Region level and consider the
LineItem level as our "base" or lessor used data that we do not want to materialize.
We now have about 60M rows of order detail by customer.
To look at the row counts of each table, we created this graphic. It is typical to have the row counts increase in size such that the "detail" or "base" level is the lowest level and not grouped or materialize due to size and frequency of use:
Create the following three groups in Sigma and add calculations for
Total Sales and
Order Counts as shown (the formulas are provided below for convenience):
CALCULATION: FORMULA: (note: they are all the same, using the groupings) Region Sales Sum([O Totalprice]) Region Order Count CountDistinct([O Orderkey]) Nation Sales Sum([O Totalprice]) Nation Order Count CountDistinct([O Orderkey]) Customer Sales Sum([O Totalprice]) Customer Order Count CountDistinct([O Orderkey])
Now that we have the table we want, we can materialize again, but this time we have the additional ability to select a grouping level for the table.
Schedule materialization UI again:
Create a schedule (once per day) based on the
Customer grouping level:
Once the materialization is completed, we can see that three new tables have been created in Snowflake. These tables are managed by Sigma and in normal operations, there is no need to look at them, but we want to show the results and make a few important points.
Upon inspection of our
Sigma WriteDB we see that three tables have been created (notice that the last character in it's name is "_0., _1 and _3". For the last table (our highest group level) we see three columns that correspond to our first group in Sigma.
The column names are also not "friendly names". This is by design:
It is possible to access this data from other applications using Sigma's Dataset Warehouse Views feature.
Once the materialized tables have been accessed, the exist in the warehouse cache (based on Snowflake caching rules) which also helps performance. We can see that Sigma has automatically read from warehouse cache (as shown below), by looking at the "behind the scenes" Sigma Query History.
Note how we are reading from the Sigma_WriteDB.QuickStart materialized tables, as the query builds each level of the query:
Now our most commonly used order data is performant:
For those interested, please refer to the Sigma on Snowflake Best Practices guide, located here:
This QuickStart was designed as primer to materialization and the questions and issues that surround it.
We defined it, provided guidance on the why/when to use it and who might set it up.
Lastly, we stepped through using Sigma and Snowflake to materialize data using a Sigma, including grouping data.
Additional Resource Links
Help Center Home