How to untangle the terms behind your data warehousing decisions

Nick Amabile


September 23, 2020

Upgrading your data analytics capabilities opens the doors to what can be an unfamiliar world.

The rapid pace of technological change has introduced multiple layers to data management, and the terminology is confusing.

When it comes to choosing a data platform, you’ll likely get lost in a maze of jargon. What separates a data lake from a data mart? And does big data require a big investment to manage the details of your business?

While the terms surrounding data technologies are confusing, choosing an architecture that’s right for your business needs doesn’t have to be. In contrast with the physical constraints of past data storage methods, today’s platforms separate storage costs from computing costs. As a result, you can scale the amount of data you store independently of how often it is accessed and transformed for analysis.

But before you’re able to establish the best practices for your business, you have to understand the ways modern, cloud-based data warehouses work. With the right strategic approach, you can ensure your data is secure, streamlined, and available for the insights your stakeholders need.

A data warehouse extends the reach of your data lake

Data lake and data warehouse are two common buzzwords associated with large-scale data storage. They share a lineage in that both are repositories for your company’s information, but they differ in their capabilities.

True to its name, a data lake is an expansive pool of all the raw data of your enterprise. In the old world of physical storage, a data lake housed all of a company’s data while its most commonly used portions would be stored elsewhere for analytic purposes. By eliminating the need to store all their data on a more costly platform with analytics capabilities, companies could better manage their storage costs.

Now, cloud data warehouses bill storage and analytics (computing) as separate charges. Capable of combining data from multiple sources, these platforms charge nearly the same amount for storage as a data lake on a cloud service like Amazon S3. However, the crucial difference is that through a data warehouse, all of your data remains accessible and open to queries. By contrast, S3 is just a storage layer. If you wanted to run a report on this data, you would need separate software provided by Amazon or another cloud provider. Plus, you would have to manage all the security and encryption yourself because a data lake is only a storage solution.
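
To make the separate charges concrete, here is a minimal Python sketch of a monthly bill where storage and computing are priced independently. The rates, the Rates class, and the monthly_cost function are hypothetical and chosen only for illustration; they are not any vendor’s actual prices or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Hypothetical prices for illustration only, not any vendor's real rates.
    storage_per_tb_month: float = 25.0  # dollars per terabyte stored per month
    compute_per_hour: float = 2.0       # dollars per hour of query computing


def monthly_cost(stored_tb: float, compute_hours: float, rates: Rates = Rates()) -> float:
    """Storage and computing are billed separately, so each term scales on its own."""
    storage = stored_tb * rates.storage_per_tb_month
    compute = compute_hours * rates.compute_per_hour
    return storage + compute


# The same 10 TB of stored data under two very different query workloads.
light_usage = monthly_cost(stored_tb=10, compute_hours=20)
heavy_usage = monthly_cost(stored_tb=10, compute_hours=200)
print(light_usage, heavy_usage)
```

Under these assumed rates, the storage portion of the bill is identical in both cases. Only the computing portion grows with how often the data is queried.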

With an open platform like Snowflake, your organization doesn’t need a separate data transformation tool. It provides storage and computing capacity in a single platform. Plus, modern warehouses are queried with SQL, which means your analytics team won’t need to learn a new coding language to execute queries.

Of course, you can still use a data lake with a data warehouse if your organization has concerns about being locked into a single platform. But given that Snowflake has the flexibility to move your data onto a different platform or integrate with other platforms for additional functions like machine learning, lock-in is not a major concern.

3 layers of a data warehouse provide flexibility and efficiency

While all of your data is accessible on a data warehouse platform, it is not always in the same place. To maximize the system’s effectiveness, data warehouses store and transform your information across three layers:

  • Raw data is exactly as the term describes. Stored in various formats, this information hasn’t been standardized or transformed into a usable condition. All of your data starts in this state.
  • Staging data functions as a working space. A data warehouse stores working copies of partially transformed data here, and your analytics teams can run QA checks in this layer to ensure the system is working.
  • Business data is where analytics programs like Looker generate reports and insights. While raw data can be elevated to this state, only data that has been transformed to this layer is open for analysis. This creates a consistent source of truth for your reporting.
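
The three layers above can be sketched with a few SQL statements. The example below is a rough illustration, assuming an in-memory SQLite database as a stand-in for a real warehouse. The table names (raw_orders, stg_orders, biz_customer_revenue), the sample rows, and the QA rule are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse

# Raw layer: data lands exactly as received, here as untyped text.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", " Alice ", "19.99"), ("2", "bob", "5.00"), ("3", "ALICE", "not-a-number")],
)

# Staging layer: a working copy with trimmed names and typed amounts.
# The GLOB filter is a simple QA check that drops non-numeric amounts.
conn.execute("""
    CREATE TABLE stg_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE amount GLOB '[0-9]*.[0-9]*'
""")

# Business layer: the consistent table that reporting tools would query.
conn.execute("""
    CREATE TABLE biz_customer_revenue AS
    SELECT customer, SUM(amount) AS revenue
    FROM stg_orders
    GROUP BY customer
""")

rows = conn.execute(
    "SELECT customer, revenue FROM biz_customer_revenue ORDER BY customer"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(rows, raw_count)
```

Note that the raw table is never modified. The rejected row still sits in raw_orders, so it can be reprocessed later if the cleaning rules change.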

Keeping the full breadth of your data open for analysis is a crucial benefit of a data warehouse. Most organizations don’t necessarily recognize the value in their raw data. However, one day, a use case may arise where you’ll need access to raw, historical data. With a data warehouse, your analytics teams can access this information and pull it into the business layer for analysis.

A single data platform is also beneficial in the event of a bug or a needed change to your data. With a data lake, this is a multi-step process. You would have to access that storage system, find the bug, then pull the affected data into a computing platform to resolve it. With a data warehouse, development speed and flexibility increase because all your data is in one place.

The right data warehouse sets the foundation for your analytics program

As data technology has grown, so have the possibilities for fragmentation. Your data warehouse must be able to integrate with your cloud provider, data transformation technologies, and business intelligence platforms. The right data warehouse allows your organization to avoid the costs and complexity of migrating your data across multiple platforms.

You can access a variety of data tools in Snowflake or a BI platform like Looker. Looker allows you to access your data model, which defines the tables and definitions within your business layer. Within that data model are individual data marts, each of which covers a specific subject area like orders or customers.
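
As a rough illustration of data marts, the sketch below builds two subject-area views, one for orders and one for customers, on top of business-layer tables. It uses plain SQL views in an in-memory SQLite database rather than Looker’s own modeling layer, and every table name, view name, and sample row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse

# Hypothetical business-layer tables that are already cleaned and typed.
conn.execute("CREATE TABLE biz_orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE biz_customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO biz_orders VALUES (?, ?, ?)",
    [(1, 10, 20.0), (2, 10, 5.0), (3, 11, 7.5)],
)
conn.executemany(
    "INSERT INTO biz_customers VALUES (?, ?)",
    [(10, "Alice"), (11, "Bob")],
)

# Orders mart: one row per order, enriched with the customer name.
conn.execute("""
    CREATE VIEW mart_orders AS
    SELECT o.order_id, c.name AS customer_name, o.amount
    FROM biz_orders AS o
    JOIN biz_customers AS c ON c.customer_id = o.customer_id
""")

# Customers mart: one row per customer, with order count and total spend.
conn.execute("""
    CREATE VIEW mart_customers AS
    SELECT c.customer_id, c.name,
           COUNT(o.order_id) AS order_count,
           SUM(o.amount)     AS lifetime_value
    FROM biz_customers AS c
    LEFT JOIN biz_orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
""")

customers = conn.execute(
    "SELECT name, order_count, lifetime_value FROM mart_customers ORDER BY name"
).fetchall()
orders = conn.execute(
    "SELECT order_id, customer_name, amount FROM mart_orders ORDER BY order_id"
).fetchall()
print(customers)
print(orders)
```

Both marts read from the same business-layer tables, so a report on orders and a report on customers draw on one shared set of definitions.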

With the right data warehouse in place, all of these components come together in a single, secure data stack. The terminology may not come easily, but when deployed strategically, the benefits to your data analytics efforts certainly do.