Many new Sigma customers initially expect to use more warehouse compute due to the direct, easy access to data they can now provide to users.

To offset Snowflake compute costs, Sigma applies multiple tiers of caching and evaluation that effectively reduce warehouse load, while delivering a faster user experience.

As a result, customers typically see their cost-per-user decrease, as they increase their number of users in Sigma.

For example, here is a demonstration of the savings, based on real customer usage:

This QuickStart introduces and discusses how Sigma is designed to optimize query performance as data is requested by the user interface. Extensive engineering time has been spent to determine a balanced solution that provides the best user experience, performance, and the least cost impact when used with a data warehouse.

To accomplish this, Sigma offers multiple tiers of caching and evaluation, designed to reduce data warehouse load and enhance the user experience.

Using these mechanisms, Sigma helps ensure that customers typically gain far more value from fast, easy access to data than the corresponding increase in cost. Achieving this balance is complex, and certainly not something that is easily done.

Over time, we have observed that for a given organization, economies of scale are realized. This means that as more Sigma users are on-boarded, the associated cost per user decreases.

For more information on Sigma's product release strategy, see Sigma product releases.

Target Audience

Tech executives, architects, developers and Sigma administrators looking for a deeper understanding of Sigma's unique approach. This approach enables Sigma to perform well against very large datasets without negatively impacting the end-user experience

Prerequisites

Sigma Free Trial

Footer

Each time a Sigma user performs an operation, Sigma evaluates where the data is retrieved from in order to populate a Workbook's tables, charts, pivots, and other elements.

To perform these calculations as quickly as possible, data operations (calculations) are attempted at various tiers.

Decisions are made based on where the requested data can be provided to the user most rapidly and at the lowest cost.

The image below illustrates where the query logic occurs, displayed in three "lanes". The items are numbered to correspond with detailed explanations in the subsequent list.

End User Lane

1: End User:
A user with a current browser. The choice of browser does not matter.

2: Sigma Browser Cache:
Sigma maintains a cache of recent results in the web browser. As result sets are returned from Snowflake, they enter the browser cache. Every new query is checked against recent results in the browser cache before anything is sent to Snowflake. When a matching query result is found, no network request or query is issued to Snowflake.

3: Sigma Alpha Query:
Aside from caching, Sigma has created a tool called Alpha Query that operates as a processing layer to calculate arithmetic operations instead of issuing a query to Snowflake.

Alpha Query leverages the browser's cache to compute new data. It can compute anything using data in cache, but if more data from the warehouse is needed, the request will need to be made to the warehouse.

Alpha Query supports the majority of the functions provided by Sigma today (even lookups!). This unique solution provides Sigma customers the best possible performance when working with data in a browser.

This substantially decreases the total number of queries issued to Snowflake during data prototyping and exploration.

Sigma Cloud Lane

4: Sigma Results Cache:
Sigma maintains a mapping of Snowflake query ID's and their Sigma query ID. If a Sigma generated SQL query has been previously run, Sigma can request the result from Snowflake using the query ID instead of reissuing a new query. This allows us to leverage the caching mechanisms of your CDW without storing data in our own servers.

5: Sigma Materialization:
Any data asset built in Sigma can be materialized as a table within Snowflake. By leveraging materializations on "Manual Triggers" or "Automations" in the Sigma UI, you can establish reusable tables that are less costly and more performant than re-running the queries.

Data Warehouse (Snowflake) Lane

6: Cloud Services Tier:
Query results in Snowflake are saved for 24 hours and are used only if the underlying data has not changed, non-deterministic functions are not used, etc.

7: Compute Tier:
Also referred to as "virtual warehouses", this term describes a set of resources allocated to perform data processing tasks such as loading, querying and running computations on data. These resources can be scaled on demand.

8: Warehouse Tier:
In this tier, the fully managed caching feature retains the result set of every query executed for 24 hours. Subsequent queries that match previously executed ones are retrieved from the cache rather than reprocessing the entire query. This significantly improves query response times by utilizing previously computed results.

9: Storage Tier:
The cloud-agnostic, independent, elastic, and scalable component that holds all the data loaded into Snowflake.

Footer

Now that we have the definitions, let's examine what happens when an end-user makes a request for data in Sigma.

The following decision tree is very high-level, and does not include lower-level details such as security, system architecture, and so on. In this QuickStart, we are solely focused on what happens when a Sigma user requests data or performs operations on already retrieved data. It is understood that this is not the only factor related to overall platform performance.

Suppose there is a Sigma workbook that contains one table. The user may (or may not) have added a few calculated columns, and they have published the workbook.

With that example in mind, please take a few minutes to review the decision tree, and consider what is likely to happen at each junction.

Footer

It is not immediately obvious that Sigma's Alpha Query is different than the caching that some warehouses do automatically.

Sigma will re-use existing results (by default), regardless of whether the underlying data has changed.

There is a control in the Sigma UI to bypass the result cache when the latest results are needed instead. There is also a control to determine the acceptable staleness for query results. of whether the underlying data has changed.

In this section, we will discuss operations that are part of this portion of our decision tree:

These calculations are often performed incrementally. In many cases, this means that Sigma only needs to perform calculations for a specific change – such as sorting a table, adding a column, or filtering a chart – rather than fully recalculating one or more elements in the Workbook.

There are a few preconditions necessary for in-browser calculations to kick in:

When Sigma cannot perform a calculation in the browser, it falls back to the next portion of the decision tree.

Incremental Computation with Large Data Sources

Sigma's general approach to in-browser calculations is known as incremental computation. Every time you interact with a Workbook – whether it's adding a column, updating a formula, changing a control, etc. – Sigma determines the minimal set of calculations necessary to produce or update the missing or changed data.

When Sigma can perform all of these calculations in the browser and return correct results, it can skip using the warehouse entirely.

Imagine an analysis involving billions of customer transactions, summarizing a few metrics across 250 retail store locations. The initial summary calculations must be performed in the data warehouse, since it would be impossible to fit billions of fine-grained customer transactions into a browser. However, Sigma can utilize the browser to perform some calculations on the summary metrics, specifically at the granularity of the 250 retail store locations.

For example, Sigma can determine the stores that generated over $10 million in revenue without examining billions of transactions. The only necessary data are the summary metrics for the 250 retail store locations – hence, this calculation can occur within the browser.

Incremental Computation - Example

Let's explore how this works using the Sigma UI. This will allow you to see how you might evaluate the operations being performed.

1: Log in to Sigma as an administrator.

2: Create a new Workbook in Sigma:

3: Add a new Table:

4: From NEW > TABLES AND DATASETS:

5: Select the Sigma Sample Database > RETAIL > PLUGS ELECTRONICS > F_POINT_OF_SALE table:

5: Select the Sigma Sample Database > RETAIL > PLUGS ELECTRONICS > F_POINT_OF_SALE table:

6: You will now see a new, unsaved Exploration. Before proceeding, observe how Sigma generated queries to the warehouse.

7: Open the Query History to evaluate what query operations Sigma is creating.

8: Note the three rows. Click on the slider to also show in-browser calculations. There are none so far since this is the first time the data from the warehouse has been requested for this workbook.

9: Make a few changes to the table and check the Query History to see the effects. For example, you can: