As cloud data warehouses and data lakes have been modernized and broadly adopted, they’ve opened up the opportunity to live-query huge amounts of data directly, adding another powerful tool for discovery. But relying on this technique can leave you with runaway cloud compute costs, and performance can suffer as well.
Rather than using live-query exclusively, you need a data management and analytics approach tailored to your frequency and latency requirements.
A “heat map” of your typical queries could show that the majority of your questions are exploratory; because they don’t need real-time updates, they can run in-memory. Your more coordinated, time-sensitive queries, on the other hand, may need to hit compute at the data-source level.
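As a rough illustration, here is a minimal Python sketch of how such a heat map might be derived from a query log. The log schema, field names, and the 60-second freshness threshold are all assumptions for illustration, not a real API:

```python
from collections import Counter

# Hypothetical query-log records. "max_staleness_s" is the data freshness the
# asker will tolerate; "runs_per_day" is how often the query executes.
# Both field names and the sample values are illustrative assumptions.
query_log = [
    {"id": "top_products",   "max_staleness_s": 3600,  "runs_per_day": 40},
    {"id": "live_inventory", "max_staleness_s": 5,     "runs_per_day": 2000},
    {"id": "annual_churn",   "max_staleness_s": 86400, "runs_per_day": 3},
]

def placement(entry):
    """Queries that need near-real-time data must live-query the source;
    everything else can run against an in-memory copy."""
    return "live-query" if entry["max_staleness_s"] < 60 else "in-memory"

heat_map = Counter(placement(q) for q in query_log)
print(heat_map)  # Counter({'in-memory': 2, 'live-query': 1})
```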
On the data integration side, you should be able to choose between continuously updating and merging data (incurring higher compute costs) and maintaining a periodically refreshed aggregate view (at lower cost).
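To make that trade-off concrete, here is a minimal sketch using DuckDB as an in-process stand-in for a warehouse. Option A scans the continuously merged raw table on every request; option B serves cheap reads from an aggregate table rebuilt on a schedule. The table names and refresh cadence are assumptions for illustration:

```python
import duckdb

con = duckdb.connect()  # in-memory database standing in for a cloud warehouse
con.execute("CREATE TABLE events (user_id INT, amount DOUBLE, ts TIMESTAMP)")
con.execute(
    "INSERT INTO events VALUES (1, 9.99, now()), (2, 4.50, now()), (1, 3.00, now())"
)

# Option A: hit the continuously merged raw data on every request.
# Freshest possible answer, but the full scan cost is paid per query.
live = con.execute(
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
).fetchall()

# Option B: rebuild an aggregate view on a schedule (say, hourly) and serve
# cheap reads from it, accepting bounded staleness in exchange for lower cost.
con.execute("""
    CREATE OR REPLACE TABLE spend_by_user AS
    SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id
""")
cheap = con.execute("SELECT * FROM spend_by_user").fetchall()
```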
And from an analytics perspective, you should be able to choose between live-query (with higher compute costs) and in-memory exploration, which can be both faster and cheaper. As you strive to become truly data-driven, both insight velocity and cost-per-insight will grow in importance, and you’ll have to figure out how to run the right queries in the right place.
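Putting the two choices together, a routing decision per query might look like the following sketch. All function names, thresholds, and dollar figures here are hypothetical, not drawn from any real system:

```python
def route(max_staleness_s, cache_age_s, live_cost_usd, cache_cost_usd=0.001):
    """Serve from the in-memory copy when it is still fresh enough for the
    question being asked; otherwise pay for a live query at the source.
    All thresholds and cost figures are illustrative assumptions."""
    if cache_age_s <= max_staleness_s:
        return "in-memory", cache_cost_usd
    return "live-query", live_cost_usd

# An exploratory question that tolerates hour-old data reuses the cheap copy...
print(route(max_staleness_s=3600, cache_age_s=120, live_cost_usd=0.42))
# ('in-memory', 0.001)

# ...while an operational question demanding 5-second freshness goes live.
print(route(max_staleness_s=5, cache_age_s=120, live_cost_usd=0.42))
# ('live-query', 0.42)
```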
2020 and 2021 saw a massive increase in the adoption of cloud data warehouses and data lakes.
By 2023, 50% of clients of public cloud services will experience escalating costs and project failures resulting from poor management.[12]
[12] Gartner, Predicts 2021: The Evolution of Infrastructure and Communications Services Intensifies, published December 1, 2020, ID G00734923.