This QuickStart is designed as primer to materialization and the questions and issues that surround it.
We will define it, provide some guidance on the why/when to use it and who sets it up.
Lastly, we will step through using Sigma and Snowflake to materialize some data so that the workflow in Sigma is understood.
For more information on Sigma's product release strategy, see Sigma product releases
If something is not working as you expect, here's how to contact Sigma support
Sigma administrators who are interested in improving performance when working with complex datasets or just generally interest in learning more about materialization.
The concept of storing pre-calculated data for performance optimization (similar to caching) has been around for decades, with many advancements and refinements over time.
Materialization is a form of caching where query results are written into a table in a data warehouse and refreshed at regular intervals (often daily).
This pre-calculated or pre-aggregated data can be accessed more quickly and efficiently than recomputing results every time a query is executed.
This pre-calculated or pre-aggregated data can be accessed more efficiently and quickly than recomputing the results every time the query is executed.
Materialization is a common strategy for improving the performance of interactive queries, but it is most effective in specific situations.
You should consider materializing data when it can provide tangible benefits in query performance, data analysis, or resource management.
At Sigma, we see customers immediately benefit from materialization in use cases such as:
Complex and Resource-Intensive Queries
Queries involving multiple tables, joins, aggregations, or calculations benefit significantly from materialization. By precomputing and storing results, subsequent queries avoid expensive operations and return results faster.
Data Analysis and Reporting
Reporting often involves aggregations, summaries, and transformations. Materializing these results allows analysts to access data instantly, enabling faster insights and smoother reporting workflows.
Time-Dependent Data
For data that changes infrequently or on a predictable schedule, materialization avoids unnecessary recomputation. Materialized tables can be refreshed at regular intervals or triggered by events to balance freshness and efficiency.
Resource Management
Materialization can optimize compute usage by offloading expensive calculations to precomputed tables, freeing live system resources for other workloads.
Matching Grain to Need
If source data contains millions of transactions but reports only need daily, weekly, or monthly summaries, materialization can reduce the dataset to a smaller, aggregated table. This is often referred to as reducing the grain or reducing cardinality.
Frequently Accessed Data
Datasets or views accessed repeatedly by many users are good candidates for materialization. By serving queries from precomputed tables, the load on source systems is reduced.
In the software market, there are several options available for caching and materialization, depending on your specific needs and the technology stack you are working with.
Data Warehousing Solutions (and most RDBMS)
Data warehousing platforms such as Amazon Redshift, Google BigQuery, and Snowflake are designed to handle large-scale analytics workloads. They often provide optimized features for materialization, including materialized views, caching mechanisms, and built-in query optimization techniques to improve performance.
In-Memory Databases
In-memory databases like SAP HANA, Redis, and Apache Ignite store data in memory rather than on disk, which can significantly improve query speed. These platforms often include built-in mechanisms for materialization, such as columnar storage, data replication, and preloading frequently accessed data.
Caching Systems
Caching systems such as Memcached and Redis can be used to materialize frequently accessed data. By storing query results or precomputed data in memory, these systems allow for quick retrieval and reduce repetitive computations.
Custom Data Pipelines and ETL Processes
In some cases, custom materialization solutions may be required using data pipelines or ETL processes. Tools like dbt, Apache Airflow, Apache Spark, or custom scripts can be used to schedule and automate the materialization process, ensuring that materialized data remains up to date.
The availability and specific features of these options will vary depending on the software you choose. It's important to evaluate capabilities, scalability, ease of use, and integration with your existing infrastructure and technology stack before selecting an approach.
With so many available options, deciding where to materialize data can feel complex—but there are some guidelines that can simplify the choice.
Some clients prefer a well-governed, centralized materialization strategy, using a single dedicated tool of their choice. That approach works well in many environments.
In other situations, both approaches can complement each other. For example, some clients use Sigma-based materialization for prototyping and rapid development, since it requires little to no engineering effort and can be done directly by analysts or business teams. Later, once workflows mature, materialization logic can be migrated into the warehouse for long-term production use.
This hybrid approach provides the speed and agility of Sigma during development while ensuring a governed, centralized strategy in production. There are some clear benefits to using Sigma to materialize.
If you choose to use Sigma for materialization, it's important to control who has permission to materialize.
Sigma's RBAC system provides the flexibility to strike the right balance between agility and governance.
Materialization allows you to write datasets and workbook elements back to your warehouse as tables, reducing compute costs and improving query performance. By materializing, your data warehouse avoids recomputing the dataset every time it is used in a Sigma element or descendant analysis.
Materializations are stored directly in your warehouse, inside a scratch workspace schema automatically managed by Sigma.
Sigma's query compiler automatically and transparently uses the latest materialization.
All data displayed in Sigma is always queried directly from the warehouse (Snowflake in this case). However, the complexity of the base query carries through to all subsequent queries, including sorting, filtering, and pagination. To solve this, Sigma datasets can be materialized back to Snowflake.
This is different from a dataset view object in Snowflake.
In Sigma's implementation, a CREATE TABLE AS
statement is used to store the result set of the SQL query generated by the dataset. You can also materialize a Sigma table that contains multiple grouping levels by selecting a specific grouping level to materialize.
Materializations can be refreshed on a schedule or triggered programmatically via the Sigma API.
By materializing a dataset, even extremely complex queries can be flattened into a simple SELECT, meaning all downstream queries run with far less strain.
First, we need to create a database and schema in Snowflake to store our materialized data.
Log in to your Snowflake trial account as ACCOUNTADMIN
.
Open a new SQL Worksheet
.
Copy/paste the following code:
-- OPERATE AS ADMIN
USE ROLE ACCOUNTADMIN;
-- SELECT WAREHOUSE
USE WAREHOUSE COMPUTE_WH;
-- CREATE DATABASE AND GRANT USAGE
CREATE DATABASE if not exists QUICKSTARTS;
USE DATABASE QUICKSTARTS;
GRANT USAGE ON DATABASE QUICKSTARTS to role sigma_service_account_role;
-- SCHEMA AND GRANT USAGE
CREATE SCHEMA if not exists MATERIALIZATION;
GRANT USAGE, CREATE TABLE, CREATE VIEW, CREATE STAGE ON SCHEMA MATERIALIZATION TO ROLE sigma_service_account_role
Click the dropdown arrow in the upper-right corner and select Run All
.
You should see: Statement executed successfully
.
Log in to Sigma.
Navigate to Administration
> Connections
.
Click Create Connection
.
For more information, see the QuickStart Snowflake Key-pair Authorization
Scroll down to the Write Access
section and enter the required information:
To use dynamic tables in Snowflake, the Snowflake user (or role) connected via Sigma must have the following privileges:
Additionally, any underlying source tables used in a dynamic table must have change tracking enabled with non-zero time travel retention. Enabling or altering change tracking requires OWNERSHIP of those objects.
Click Create
. Sigma will test the connection and, if successful, display a confirmation message.
For more information, see Set up write access
Previously, we discussed materialization in general. Now, let's look at how it is implemented in Sigma at a high level.
A materialization is created by scheduling a materialization job. The schedule you choose directly impacts data freshness. Even long-running queries that cannot display in Sigma can still be materialized.
Materializations can be paused (either manually or automatically after a user-specified period of inactivity) and can be deleted at any time.
Parameters and system functions:
Materialization is not available for data that use parameters or most system functions, with a few exceptions. For example, the system function Today()
does work.
Data is expected to return different values when parameters change. Materialized tables, however, always return the same fixed output generated at the time the materialization ran. Using them with parameterized datasets could produce unexpected results.
Row-level security:
Materialization is incompatible with row-level security. If user-attribute functions are referenced, the materialization will error.
Data referencing other data:
Data that duplicates or joins other data can usually be materialized. However, if any underlying data cannot be materialized, the dependent data cannot be either.
For this exercise, we'll use the Snowflake sample database TPCH_SF10, which contains an ORDERS table with 9 columns and ~15M rows.
Typically, materialization in Sigma is applied after joining tables, adding groupings, and creating calculated columns. As discussed earlier, flat tables by themselves won't benefit much from materialization.
For simplicity, we'll assume that some joins and calculated columns already exist, but we'll skip that setup here so we can focus directly on how to materialize data in Sigma. A more complex, grouped example will follow later.
In Sigma, click the icon to return home.
Go to Connections
and select Materialization QuickStart
from the list:
Expand the database SNOWFLAKE_SAMPLE_DATA
and select the ORDERS
table. Click Explore
:
The Orders table opens in an unsaved workbook. All 9 columns are selected, with 15M rows
available:
Save this workbook as Materialization QuickStart
.
Click the hamburger menu (3 dots) and select Advanced options
> Schedule materialization:
In the Materialization schedules
modal, create a new schedule.
Since the workbook only contains one element (Page 1 – Orders), that element is pre-selected.
Set the schedule to run once daily at 3:00 AM
in your local timezone.
Click Save Schedules
.
The materialization will run immediately.
Once complete, the status will show Success
and the UI will display the next scheduled run time:
Close the schedule window.
Navigate back to Administration
> Materializations
where we can see the list of our current jobs:
Here you can check the last run status, duration, and row count. In this example, the job processed 15M rows in 16 seconds
using a Snowflake X-Small
warehouse.
Click on the Orders
text to return to the Materialization QuickStart
workbook.
Click the View materialization
icon to see:
Sigma creates a new table in Snowflake based on your materialization configuration (in this case, 15M rows).
The table is stored in a scratch schema that Sigma manages automatically. While the column data may not appear in a user-friendly format, this schema is fully transparent to Sigma users and is not intended for direct use.
Materialization improves query performance by allowing your data warehouse to reuse precomputed results instead of recomputing the dataset each time it is used in a Sigma element or descendant analysis.
If you need to access this data from other applications, see Review warehouse view details
Workbook materialization is similar to datasets, but has some major advantages depending on the use case.
The easiest way to demonstrate this is to use a Sigma Template
as our workbook.
Return to the Sigma homepage and click Templates
> External,
then search for Plugs
. Click the Plugs Electronics Sales Performance
template:
Click Save As
in the upper-right corner and give the workbook any name.
If we click to drop the menu on the Historical Profit and Margin
bar chart, we see that the underlying data driving the gauge can be materialized.
Charts tend to have grouped data underlying them, so this is our first example of materializing grouped data. In this case, Sigma just handles the complexity for you.
It is possible to design the dashboard first, get it just right, and then decide what portions to materialize on a schedule later.
Aggregate navigation is an advanced BI design approach where we create several materialized tables of different grain behind the scenes, then seamlessly swap the data sources between these tables as the user changes their grain of analysis. The benefit is fast performance at higher levels of aggregation, combined with the depth of analysis at lower levels — all transparent to the end-user.
Using conventional BI tools, aggregate navigation can be costly to implement, since you often need to manually create and populate several aggregated tables, then add complex logic to auto-navigate between them.
By contrast, aggregate navigation in Sigma is straightforward.
Aggregation functions or operators are applied to groups of data to calculate consolidated values. Some common functions include sum, count, average, minimum, maximum, and median. These can be applied to numerical data (e.g., sales figures, temperature readings) as well as categorical data (e.g., the number of occurrences by category).
Aggregations are widely used to generate reports, analyze trends, or derive meaningful statistics from datasets. For example, in a sales database, you might calculate total sales by product category, average revenue per customer, or the highest-selling region.
Aggregations can be applied at different levels — entire datasets, specific groups or categories, or across multiple dimensions. The choice depends on analysis goals and data structure.
Tables that leverage aggregate navigation are excellent candidates for materialization.
In this section, we'll demonstrate building a table that joins four other tables and contains ~60M rows. You can follow along, though performance may be slower in trial accounts using an X-Small Snowflake warehouse.
For example: If we have five grouping levels in a table element and materialize at the third level, Sigma creates three materializations: one for the top level, one for the second level, and one for the third. The fourth, fifth, and base grouping levels will not be materialized.
This trade-off matters: performance vs. storage cost. If levels four and five contain hundreds of millions of records, Sigma still allows materialization of levels one through three — delivering strong performance with the option to drill deeper when needed (though more slowly).
We'll build on the existing workbook but extend the data.
Users viewing Sigma content often want to see sales data rolled up from the Region/Nation/Customer
level, with summary sales totals and order counts broken out for each group. They rarely explore line item detail but want the option to do so occasionally.
First, pause the existing materialization schedule so it no longer runs.
You can do this either in the workbook or in Administration
> `Materialization:
The ORDERS
schedule status will switch to Suspended
.
Click Orders
to return to the workbook and then click Edit
.
Now, make a duplicate of the Orders
table:
Move the duplicate to a new page:
You can also copy/paste the table instead.
Rename the new table ORDERS_by_Customer
.
Next, we'll join the CUSTOMER
table (from the database) to ORDERS
.
Click Join
:
Navigate the UI to locate the Materialization QuickStart
connection, expand the tree, and select TPCH_SF10
> CUSTOMER
. Click Select
.
Set the join keys: O Custkey = C Custkey
.
The result will show some customers with no orders and some with multiple.
Click the +
icon to add another join:
Repeat the process to join the Nation table to the Customer
table (not Orders):
Join the Region
table to the Nation
table:
Join the Line Item
table to Orders
:
Click Preview Output
.
Sigma displays the lineage, which shows a visual representation of how the data is mapped.
Looking at the lineage, we'll materialize at the Region
level and treat the LineItem
level as our "base" or less frequently used data that we won't materialize.
Click Done
.
We now have about 60M rows
of order detail by customer:
Add new columns, naming and setting each formula as shown below:
NAME: FORMULA: (note: they are all the same, using the groupings)
Region_Sales Sum([O Totalprice])
Region_Order_Count CountDistinct([O Orderkey])
Nation_Sales Sum([O Totalprice])
Nation_Order_Count CountDistinct([O Orderkey])
Customer_Sales Sum([O Totalprice])
Customer_Order_Count CountDistinct([O Orderkey])
Create the following three groups in Sigma:
Be sure to click Publish
when done.
Now that the table is ready, we can materialize again — this time with the option to select a grouping level.
Open the Schedule materialization
UI:
Create a schedule (once per day) based on the Customer
grouping level:
Once complete, three new tables will appear in Snowflake.
Looking at the Sigma WriteDB, the three tables are organized with the lowest level first, followed by the next two grouping levels. Column names are not "friendly" — this is intentional:
It is possible to access this data from other applications using Sigma's Dataset Warehouse Views .
Once materialized tables are accessed, they also exist in the warehouse cache (subject to Snowflake caching rules), further improving performance.
For those interested, please refer to the Sigma on Snowflake Best Practices guide
This QuickStart was designed as a primer on materialization and the questions and considerations that surround it.
We defined what materialization is, provided guidance on when and why to use it, and discussed who typically sets it up.
Finally, we stepped through using Sigma and Snowflake to materialize data — including examples with grouped data.
Additional Resource Links
Be sure to check out all the latest developments at Sigma's First Friday Feature page!
Help Center Home
Sigma Community
Sigma Blog