In Agile Cost Management in Snowflake – Part 1, Compute, we discussed managing compute costs. Managing storage costs is much simpler, but it is still very important, as poorly managed storage will result in unexpected expenses.
As with Compute, Snowflake’s revolutionary architecture requires a different approach and mindset to managing storage. In legacy DW RDBMS, storage is a limited resource. The need to accurately estimate current and future requirements is a key driver in determining cost. Adding storage is expensive and often requires downtime. Reducing physical storage is unheard of. Instead, DBAs typically focus on reclaiming and reusing storage as data becomes obsolete. This often involves complex and burdensome Information Life Cycle processes. Finally, managing and ensuring the reliability of backups, restores and recovery is a major headache.
Snowflake brings unlimited storage to the cloud, enabling an Agile “lean and fit” storage management style, effectively eliminating the DBA burdens discussed above. Instead, the focus is on appropriate, simple settings to control storage costs.
Storage charges in Snowflake are simply “pass through” of the cloud vendor’s pricing.
Snowflake charges monthly for data in databases and data in Snowflake file “stages”. Charges are based on the average storage used per day, computed on a daily basis.
Managing Cost in Stages
Cost management in stages is pretty simple: remove files from stages after successful load into the target databases, unless there is a compelling reason to keep them. This can be performed as part of the COPY INTO command or via the REMOVE command after loading.
As we’ll see, appropriate use of Snowflake’s data retention capability, Time Travel, is the key to controlling storage costs. Time Travel features enable:
- Restoring accidentally or deliberately dropped databases, schemas and tables with simple SQL
- Cloning tables, schemas and databases from key historical points with simple SQL
- Backing up data for historical and archiving purposes via cloning from key points in the past with simple SQL
- Querying data as it existed at previous points in time with simple SQL
Snowflake’s Database Storage
To manage database storage costs, it’s important to first understand the relevant parts of the storage architecture.
Snowflake uses unlimited “blob” storage for database and file stage storage, which cloud vendors tout as having “eleven nines” (99.999999999) of durability. All storage is stored redundantly in multiple availability zones (AZ).
Snowflake’s “secret sauce” is the use of “micro-partitions”. A micro-partition is a file object in blob storage.
A micro-partition has the following characteristics:
- Micro-partitions are immutable. Once written, their data is never modified.
- Micro-partitions are always compressed and encrypted. The maximum (and common) size is 16MB, typically representing 100-150MB of uncompressed data.
- In addition to statistics on the data, a micro-partition may have a “date modified” timestamp.
- “Active” micro-partitions have not been modified, i.e. no “date modified” timestamp.
- “Historical” micro-partitions have been modified, i.e. a “data modified” timestamp is present
- UPDATE, DELETE, and a MERGE that performs an UPDATE, result in the date modified timestamp being set and a new micro-partition being written. INSERT (and COPY) always create new micro-partitions.
- Truncating a table effectively sets the data modified timestamp to the time of the operation for all of the micro-partitions in the table.
- Dropping a table sets the date modified timestamp to the time of the operation and makes the table definition unavailable.
When a date modified timestamp is present, the data retention setting determines how long the data is available.
When a date modified timestamp is present, the data retention setting determines the length of time the historical data will remain available. This in turn determines the charge for the data.
Data retention may be set at the account, database, schema or table level via DATA_RETENTION_TIME_IN_DAYS parameter. A value at a higher level, e.g. database, may be overridden by setting the value for an object at a lower level. Time Travel may only be set for permanent tables and Transient tables. The default is always 1 day. Transient tables may only be set to 1 or 0 days.
These features enable Snowflake’s “Time Travel” capabilities. In addition, there is a “Fail Safe” backup of 7 days, available in all editions. Recovering data from Fail Safe requires Snowflake support intervention. Time Travel and Fail Safe are collectively referred to as Continuous Data Protection (CDP).
Historical micro-partitions are moved into Fail Safe when their data retention setting is exceeded.
“Standard Time Travel” is only available for 1 day. This applies to both Standard and Premier Editions.
“Extended Time Travel” may range from 1 to 90 days retention for the Enterprise, Business Critical and Virtual Private Snowflake editions. Managing Extended Time Travel is the key to managing storage costs.
Note that Time Travel can be disabled by setting the retention period to 0. This is generally a bad idea.
Time Travel Cost Example
This is an extreme example to demonstrate Time Travel’s effect on storage costs and why data retention management is critical.
- Create a permanent (not TRANSIENT or TEMPORARY) table with 90 day data retention
- Populate on Day 1 with 10GB
- Drop table on Day 1.
- Total cost = (90+7) * 10GB * price/GB/day
This will enable you to UNDROP the table for 97 days from the time of dropping it. However, you will be charged for 97 days of storage, even though you might never UNDROP and/or use it! (We’ll show you how to fix this later.)
Similarly, historical micro-partitions modified due to UPDATE, DELETE and MERGE are moved into Fail Safe when their data retention setting is exceeded.
Changing Data Retention Period
The data retention period may be changed for an object via
the SQL command:
ALTER <object> DATA_RETENTION_TIME_IN_DAYS=
If the retention period is increased, all data that has not been moved into Fail Safe will be affected by the new value. E.g., if the value is increased from 10 days to 20 days, historical data that would have been moved to Fail Safe after 10 days lives for an additional 10 days.
If the retention period is decreased, data that has exceeded the original retention period will be moved to Fail Safe. E.g., decreasing the value from 10 to 1 will result in 9 days of historical data will be moved into Fail Safe.
Fixing the 90 day “Mistake”
Let’s assume that this table and its population was a mistake. Fortunately, we can reduce the cost to a minimum of Fail Safe’s 7 days, depending on when the mistake is discovered.
- UNDROP the table
- Modify the data retention period to 0 days
- Drop the table
The data was previously moved into Fail Safe, so you are still subject to a minimum charge of 7 days plus any time the mistake went undiscovered.
Managing Cost and Time Travel
The key decision to be made is “how long do you need data retention”? Key factors are:
- Is point in time query required for modified historical data? Remember, 90 days is the maximum age for historical data (for Enterprise Edition and above).
- Is point in time query required for overall data consistency, e.g. multiple ETL/ELT processes must be completed before querying a set of tables?
- How far back might you need to go for fast restore of dropped objects? Remember, data is redundantly stored in 3 availability zones with eleven 9s of durability, so backup for media failures is not needed.
- How volatile is your data, how frequently are UPDATEs, DELETEs and MERGEs that modify data performed?
- Is there a need to create historical clones, for testing, debugging, etc.? Note that micro-partitions referenced by clone(s) live until all retention time in referencing clone(s) is exceeded. See Cloning Tables, Schemas, and Databases.
Higher volatility = increased storage usage by creating new micro-partitions. Carefully consider the tradeoffs; does high volatility mean increased need for point in time queries and possibly recovery from mistakes? Or the opposite?
Staging tables should have minimal data retention. They should be either TEMPORARY or TRANSIENT tables. Temporary tables only exist during a session. Transient tables may have data retention of 1 or 0. Data is never moved into Fail Safe for either type of table.
Be careful when using Cluster Keys, as this may result in frequent aging of micro-partitions and unexpected storage costs. Consider a fairly low data retention setting when using cluster key.
Databases, schemas and tables may be cloned as of a point in time using the AT clause. This is particularly useful for regression testing, debugging and other development and QA.
- ACTIVE_BYTES –number of bytes in active micro-partitions
- TIME_TRAVEL_BYLES – number of bytes in historical micro-partitions
- FAILSAFE_BYTES – number of bytes in Fail Safe
- RETAINED_FOR_CLONE_BYTES – number of bytes in micro-partitions referenced by clones
- TABLE_DROPPED – date and time if table was dropped
- TABLE_ENTERED_FAILSAFE – date and time the table entered Fail Safe. Requires Snowflake support to restore
The retention period in days for a table is found is found in the RETENTION_TIME column in the TABLES view. Note that the BYTEs column in this view does not reflect true storage cost; it represents uncompressed byte count for a scan of active micro-partitions.
Snowflake provides unparalleled capabilities to control your cloud data warehouse spend in an agile manner. It further provides unrivalled Continuous Data Protection against failures of all types. Investing in a new approach to storage management offers an opportunity to significantly improve your ROI on Snowflake.
Thanks to Amy Kaulius for her technical review, and LBJ,my editor/wife.
Copyright © 2019, Jeffrey M. Jacobs, permission to redistribute with copyright notice