Assessing the True Value and Fit of Real-Time Analytics in Business Data Strategies
The term “real-time data” gets thrown around quite a lot. It’s a catchy buzzword, and repeating it in meetings with non-data pros makes you sound knowledgeable. However, few people know what “real-time” analytics truly encompasses.
Real-time data, or data streaming, means processing data continuously as it becomes available, with the overarching goal of minimizing latency, sometimes to just milliseconds after the data is generated.
The most common strategies that companies use other than streaming are:
- Batch processing: Files are processed on a regular cadence, usually falling somewhere between hourly and daily.
- Micro-batching: A hybrid approach. Files are copied from a cloud provider or stage on a short cadence; however, there is still a mild lag on both the ingestion and the subsequent transformation of the data. (A minimal sketch contrasting these approaches follows this list.)
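To make the latency difference concrete, here is a minimal sketch of the two ends of the spectrum, with a plain Python generator standing in for a real message consumer and a local directory standing in for a cloud stage; the paths, cadence, and handler are all placeholders:

```python
import time
from pathlib import Path

def process(record):
    """Placeholder for your transformation logic."""
    print(record)

# Batch: wake up on a cadence and process whatever has accumulated.
def run_batch(landing_dir="landing", cadence_seconds=3600):
    while True:
        for f in sorted(Path(landing_dir).glob("*.json")):
            process(f.read_text())
            f.unlink()  # mark the file as processed
        time.sleep(cadence_seconds)  # new data waits up to a full cadence

# Streaming: handle each record the moment it arrives.
def run_streaming(source):
    for record in source:  # blocks until the next record is available
        process(record)    # latency is milliseconds, not hours
```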
If you were to casually ask business professionals using a data platform whether they want their data to be real-time, the answer would most likely be a resounding “yes.” However, if you were to dig deeper into this question of WHY the data needs to be real-time, most would probably have a more difficult time answering.
Data used in reporting is rarely truly “real-time.” If the data needs to undergo any sort of transformation, there will be at least a several-minute lag; more often, processing takes multiple hours or even up to a day.
Take the following hypothetical example.
ACME Corporation is implementing new data pipelines in Snowflake for various reporting needs. After some internal debate, they decide that real-time data streaming sounds like a great idea. Their data engineering team sets up a Kafka streaming solution to run in as near real-time as possible. They provision Snowflake warehouses to process this data continuously with no downtime, and a support team constantly monitors the pipeline and resolves errors. This data is then modeled in their business intelligence (BI) tool.
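For concreteness, ACME’s ingestion layer might look something like the sketch below, which reads a hypothetical orders topic with the confluent-kafka client and writes each message into a Snowflake table. This is a deliberately simplified illustration: a production deployment would more likely use the Snowflake Kafka connector or Snowpipe Streaming and batch its writes, and the topic, table, and credentials here are all placeholders.

```python
import os
from confluent_kafka import Consumer
import snowflake.connector

# Connection details are placeholders; a real pipeline would pull these
# from a secrets manager rather than environment variables.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="STREAMING_WH",
    database="RAW",
    schema="KAFKA",
)

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "acme-orders-loader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])  # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # wait briefly for the next message
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Per-row inserts keep the example simple; at real volumes you would
        # buffer and load in chunks (or use Snowpipe Streaming).
        conn.cursor().execute(
            "INSERT INTO RAW_ORDERS (payload) SELECT PARSE_JSON(%s)",
            (msg.value().decode("utf-8"),),
        )
finally:
    consumer.close()
    conn.close()
```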
After months of testing, they roll the new implementation out to users, starting with business stakeholders and analysts from the finance and marketing departments. Those analysts take the elegantly designed, real-time data pipeline, monitored with an on-call support schedule, and click “download to Excel,” where they do their actual data analysis.
While this example is hypothetical, it reminds me of a popular phrase in software engineering, often attributed to Donald Knuth:
“Premature optimization is the root of all evil.”
In the circumstances above, choosing a data architecture that matched the actual needs of the business could have saved time, reduced complexity, and avoided expensive compute costs.
Let’s walk through a few scenarios where real-time streaming would be the recommended architecture:
Getting data earlier impacts business decisions
Minutes matter. Getting your data even a few minutes earlier changes the critical business decisions that get made. Examples include:
- Outage detection for a network
- Real-time monitoring of user data for security threats or DDoS attacks
- IoT devices that generate constant streams of data that need to be monitored continuously
The source of your data is well suited for it
When streaming data, think:
- High volume, low latency: Streaming is well suited to high-volume data with low-latency requirements. If the data is constantly flowing and you need to process and analyze it as it arrives, streaming is the preferable option.
- Event-driven architectures: Architectures where specific events trigger actions often benefit from streaming solutions.
- Limitations of external data providers: Providers of third-party data or software vendors can’t always offer:
  - A fault-tolerant API with high throughput limits (think constant 429 rate-limit errors or no way to filter by timestamp); the polling sketch below shows what working around this looks like
  - The ability to share data directly from a cloud data provider
In this case, streaming may be the most effective and reliable way to retrieve data.
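To illustrate that first limitation, here is a minimal sketch, assuming a hypothetical vendor REST endpoint, of what repeatedly pulling from a rate-limited API looks like. Every 429 response forces the pipeline to wait and retry; a push-based streaming feed avoids this entirely.

```python
import time

import requests

# Hypothetical vendor endpoint; substitute your provider's real API.
VENDOR_URL = "https://api.example-vendor.com/v1/events"

def pull_batch(params, max_retries=5):
    """Poll a rate-limited vendor API, backing off when it returns 429."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.get(VENDOR_URL, params=params, timeout=30)
        if resp.status_code == 429:
            # Respect a numeric Retry-After header when present;
            # otherwise back off exponentially.
            delay = float(resp.headers.get("Retry-After", delay))
            time.sleep(delay)
            delay *= 2
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Vendor API kept rate-limiting us.")

# Usage (hypothetical parameters):
# events = pull_batch({"since": "2024-01-01T00:00:00Z"})
```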
Someone is listening on the other side
Let’s say you have decided that streaming fits a critical business use case for which scheduled ingestion or micro-batching is insufficient.
You have also mapped out that a streaming solution best fits the source, volume, and compute profile of your data.
Great! But once you deploy your solution, how do you know anyone is listening on the other side?
Going back to our earlier example, what good is streaming data if the primary use case is downloading it into Excel? Real-time data needs analysis that is, well…real-time. To get the most out of streaming data, we need:
- Rules in place to identify anomalies (DDoS attacks, outages, security threats) and alerting to signal these events (a minimal sketch of such a rule follows this list)
- Trained users who know what to do with this information
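As a sketch of what such a rule might look like, assuming events arrive one at a time from a stream consumer, the snippet below applies a sliding-window threshold per source IP and fires an alert when traffic spikes; the window length and threshold are placeholder values you would tune to your own traffic.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60    # placeholder: size of the sliding window
THRESHOLD = 1000       # placeholder: requests per window before alerting

windows = defaultdict(deque)  # source_ip -> timestamps of recent events

def alert(source_ip, count):
    # In production this would page on-call or open an incident automatically.
    print(f"ALERT: {source_ip} sent {count} requests in {WINDOW_SECONDS}s")

def handle_event(source_ip, ts):
    """Apply a sliding-window rule to each event as it streams in."""
    q = windows[source_ip]
    q.append(ts)
    # Drop timestamps that have fallen out of the window.
    while q and ts - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) > THRESHOLD:
        alert(source_ip, len(q))
        q.clear()  # reset so we don't re-alert on every subsequent event

# Usage: call from your stream consumer loop, e.g.
# handle_event(event["source_ip"], event["timestamp"])
```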
Another way to think about it is to ask: is a computer interpreting the results, or a human being? If a computer uses this data in real time to follow predefined rules, like checking whether a transaction is fraudulent or generating an incident from a network outage, then getting the data as quickly as possible is a huge advantage. If, on the other hand, a human being interprets the results, say, by running a massive ML model on historical data or forecasting next year’s sales, streaming doesn’t provide those same advantages.
Partnering with DAS42 for Successful Implementation
Streaming and batch processing are both valid solutions, and many companies utilize a mix of both depending on the source of the data and the use case.
DAS42 brings a wealth of experience and expertise to the table when it comes to crafting and implementing data strategies. With a long track record of successful implementations, DAS42 can help guide a data strategy that fits the needs of your business and the users of your data.
DAS42 is a premier data and analytics consultancy with a modern point of view. We specialize in solving some of the most complex business challenges for the world’s most successful companies. As a Snowflake Elite Partner, DAS42 crafts customized strategies that create a single source of truth and enable enhanced and faster decision-making. DAS42 has a presence across the U.S. with primary offices in New York City and Denver. Connect with us at das42.com and stay updated on LinkedIn. Join us today on our journey to help you realize the possibilities of transforming your business through data and analytics.