BigQuery Archives - Unravel

Mastering Cost Management: From Reactive Spending to Proactive Optimization


According to Forrester, accurately forecasting cloud costs remains a significant challenge for 80% of data management professionals. This struggle often stems from a lack of granular visibility, control over usage, and ability to optimize code and infrastructure for cost and performance. Organizations utilizing modern data platforms like Snowflake, BigQuery, and Databricks often face unexpected budget overruns, missed performance SLAs, and inefficient resource allocation.

Transitioning from reactive spending to proactive optimization is crucial for effective cost management in modern data stack environments.

This shift requires a comprehensive approach that encompasses several key strategies:

1. Granular Visibility
Gain comprehensive insights into expenses by unifying fragmented data and breaking down silos, enabling precise financial planning and resource allocation for effective cost control. This unified approach allows teams to identify hidden cost drivers and inefficiencies across the entire data ecosystem.

By consolidating data from various sources, organizations can create a holistic view of their spending patterns, facilitating more accurate budget forecasting and informed decision-making. Additionally, this level of visibility empowers teams to pinpoint opportunities for optimization, such as underutilized resources or redundant processes, leading to significant cost savings over time.

2. ETL Pipeline Optimization
Design cost-effective pipelines from the outset, implementing resource utilization best practices and ongoing performance monitoring to identify and address inefficiencies. This approach involves carefully architecting ETL processes to minimize resource usage while maintaining optimal performance.

By employing advanced performance tuning techniques, such as optimizing query execution plans and leveraging built-in optimizations, organizations can significantly reduce processing time and associated costs. Continuous monitoring of pipeline performance allows for the early detection of bottlenecks or resource-intensive operations, enabling timely adjustments and ensuring sustained efficiency over time.

3. Intelligent Resource Management
Implement intelligent autoscaling to dynamically adjust resources based on workload demands, optimizing costs in real-time while maintaining performance. Efficiently manage data lake and compute resources to minimize unnecessary expenses during scaling. This approach allows organizations to automatically provision and de-provision resources as needed, ensuring optimal utilization and cost-efficiency.

By setting appropriate scaling policies and thresholds, you can avoid over-provisioning during periods of low demand and ensure sufficient capacity during peak usage times. Additionally, separating storage and compute resources enables more granular control over costs, allowing you to scale each component independently based on specific requirements.

4. FinOps Culture
Foster collaboration between data and finance teams, implementing cost allocation strategies like tagging and chargeback mechanisms to attribute expenses to specific projects or teams accurately. This approach creates a shared responsibility for cloud costs and promotes organizational transparency.

By establishing clear communication channels and regular meetings between technical and financial stakeholders, teams can align their efforts to optimize resource utilization and spending. A robust tagging system also allows for detailed cost breakdowns, enabling more informed decision-making and budget allocation based on actual usage patterns.

5. Advanced Forecasting
Develop sophisticated forecasting techniques and flexible budgeting strategies using historical data and AI-driven analytics to accurately predict future costs and create adaptive budgets that accommodate changing business needs. Organizations can identify trends and seasonal variations that impact costs by analyzing past usage patterns and performance metrics.

This data-driven approach enables more precise resource allocation and helps teams anticipate potential cost spikes, allowing for proactive adjustments to prevent budget overruns. Additionally, implementing AI-powered forecasting models can provide real-time insights and recommendations, enabling continuous optimization of environments as workloads and business requirements evolve.

Mastering these strategies can help you transform your approach to cost management from reactive to proactive, ensuring you maximize the value of your cloud investments while maintaining financial control.

To learn more about implementing these cost management strategies in your modern data environment, join our upcoming webinar series, “Controlling Cloud Costs.” This ten-part series will explore each aspect of effective cost management, providing actionable insights and best practices to gain control over your data platform costs.

Register for Controlling Databricks Cloud Cost webinars.

Register for Controlling Snowflake Cloud Cost webinars.


Mastering BigQuery Cost Management and FinOps: A Comprehensive Checklist

Effective cost management becomes crucial as organizations increasingly rely on Google BigQuery for their data warehousing and analytics needs. This checklist delves into the intricacies of cost management and FinOps for BigQuery, exploring strategies to inform, govern, and optimize usage while taking a holistic approach that considers queries, datasets, infrastructure, and more.

While this checklist is comprehensive and highly impactful when implemented fully, it can be overwhelming to tackle with limited staffing and resources. AI-driven insights and automation can solve this problem and are explored at the end of this guide.

Understanding Cost Management for BigQuery

BigQuery’s pricing model is primarily based on data storage and query processing. While this model offers flexibility, it also requires careful management to ensure costs align with business value. Effective cost management for BigQuery is about more than reducing expenses—it’s also about optimizing spend, ensuring efficient resource utilization, and aligning costs with business outcomes. This comprehensive approach falls under the umbrella of FinOps (Financial Operations).
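
Because on-demand charges are driven by the bytes a query scans, a practical first step is estimating a query's cost before running it. Below is a minimal sketch using the google-cloud-bigquery Python client's dry-run mode; the project, table, and per-TiB rate are placeholder assumptions, so substitute your own values and your region's current on-demand price.

  from google.cloud import bigquery

  ON_DEMAND_PRICE_PER_TIB = 6.25  # assumed example rate; check your region's current price
  client = bigquery.Client(project="my-project")  # hypothetical project

  sql = """
      SELECT order_id, order_total
      FROM `my-project.sales.orders`              -- hypothetical table
      WHERE order_date >= '2024-01-01'
  """

  # A dry run validates the query and reports the bytes it would scan, without running or billing it.
  job = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False))
  tib = job.total_bytes_processed / 1024**4
  print(f"Estimated scan: {tib:.4f} TiB (~${tib * ON_DEMAND_PRICE_PER_TIB:.2f} on-demand)")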

The Holistic Approach: Key Areas to Consider


1. Query Optimization

Are queries optimized? Efficient queries are fundamental to cost-effective BigQuery usage:

Query Structure: Write efficient SQL queries that minimize data scanned.
Partitioning and Clustering: Implement appropriate table partitioning and clustering strategies to reduce query costs.
Materialized Views: Use materialized views for frequently accessed or complex query results.
Query Caching: Leverage BigQuery’s query cache to avoid redundant processing.

2. Dataset Management

Are datasets managed correctly? Proper dataset management is crucial for controlling costs:

Data Lifecycle Management: Implement policies for data retention and expiration to manage storage costs.
Table Expiration: Set up automatic table expiration for temporary or test datasets (a short example follows this list).
Data Compression: Use appropriate compression methods to reduce storage costs.
Data Skew: Address data skew issues to prevent performance bottlenecks and unnecessary resource consumption.
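
As one concrete example of the lifecycle and expiration items above, the sketch below sets a default table expiration on a scratch dataset with the google-cloud-bigquery Python client, so temporary tables delete themselves and stop accruing storage costs. The project, dataset name, and seven-day window are placeholder assumptions.

  from google.cloud import bigquery

  client = bigquery.Client(project="my-project")      # hypothetical project
  dataset = client.get_dataset("my-project.scratch")  # hypothetical scratch dataset

  # New tables in this dataset will expire (and stop billing for storage) seven days after creation.
  dataset.default_table_expiration_ms = 7 * 24 * 60 * 60 * 1000
  dataset = client.update_dataset(dataset, ["default_table_expiration_ms"])
  print(f"{dataset.dataset_id}: default table expiration = {dataset.default_table_expiration_ms} ms")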

3. Infrastructure Optimization

Is infrastructure optimized? While BigQuery is a managed service, there are still infrastructure considerations:

Slot Reservations: Evaluate and optimize slot reservations for predictable workloads.
Flat-Rate Pricing: Consider flat-rate pricing for high-volume, consistent usage patterns.
Multi-Region Setup: Balance data residency requirements with cost implications of multi-region setups.

4. Access and Governance

Are the right policies and governance in place? Proper access controls and governance are essential for cost management:

IAM Roles: Implement least privilege access using Google Cloud IAM roles.
Resource Hierarchies: Utilize resource hierarchies (organizations, folders, projects) for effective cost allocation.
VPC Service Controls: Implement VPC Service Controls to manage data access and potential egress costs.

Implementing FinOps Practices

To master cost management for BigQuery, consider these FinOps practices:


1. Visibility and Reporting

Implement comprehensive labeling strategies for resources.
Create custom dashboards in Google Cloud Console or Data Studio for cost visualization.
Set up budget alerts and export detailed billing data for analysis.

2. Optimization

Regularly review and optimize queries based on BigQuery’s query explanation and job statistics.
Implement automated processes to identify and optimize high-cost queries.
Foster a culture of cost awareness among data analysts and engineers.

3. Governance

Establish clear policies for dataset creation, query execution, and resource provisioning.
Implement approval workflows for high-cost operations or large-scale data imports.
Create and enforce organizational policies to prevent costly misconfigurations.

Setting Up Guardrails

Implementing guardrails is crucial to prevent unexpected costs:

Query Limits: Set daily query limit quotas at the project or user level.
Cost Controls: Implement custom cost controls using Cloud Functions and the BigQuery API (a per-query byte cap is sketched after this list).
Data Access Controls: Use column-level and row-level security to restrict access to sensitive or high-volume data.
Budgets and Alerts: Set up project-level budgets and alerts in Google Cloud Console.
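
As a complement to quotas, an individual job can carry its own hard cap. The sketch below is a minimal Python example (hypothetical project, table, and a 1 GiB cap) that sets maximum_bytes_billed so a runaway query fails rather than billing past the limit; the exact exception raised can vary, so treat the error handling as illustrative.

  from google.cloud import bigquery
  from google.api_core import exceptions

  client = bigquery.Client(project="my-project")  # hypothetical project

  # Cap this query at 1 GiB billed; BigQuery rejects the job instead of exceeding the cap.
  guardrail = bigquery.QueryJobConfig(maximum_bytes_billed=1 * 1024**3)

  try:
      job = client.query("SELECT * FROM `my-project.sales.orders`", job_config=guardrail)
      job.result()
  except exceptions.BadRequest as err:  # typically surfaces as a 400 error when the cap is hit
      print(f"Query blocked by byte cap: {err}")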

The Need for Automated Observability and FinOps Solutions

Given the scale and complexity of modern data operations, automated solutions can significantly enhance cost management efforts. Automated observability and FinOps solutions can provide the following:

Real-time cost visibility across your entire BigQuery environment.
Automated recommendations for query optimization and cost reduction.
Anomaly detection to quickly identify unusual spending patterns.
Predictive analytics to forecast future costs and resource needs.

These solutions can offer insights that would be difficult or impossible to obtain manually, helping you make data-driven decisions about your BigQuery usage and costs.

BigQuery-Specific Cost Optimization Techniques

Avoid SELECT *: Instead of selecting every column, specify only the columns you need to reduce the data processed.
Use Approximate Aggregation Functions: For large-scale aggregations where precision isn’t critical, use approximate functions like APPROX_COUNT_DISTINCT().
Optimize JOIN Operations: Ensure the larger table is on the left side of the JOIN to potentially reduce shuffle and processing time.
Leverage BigQuery ML: Use BigQuery ML for in-database machine learning to avoid data movement costs.
Use Scripting: Utilize BigQuery scripting to perform complex operations without multiple query executions.

Conclusion

Effective BigQuery cost management and FinOps require a holistic approach that considers all aspects of your data operations. By optimizing queries, managing datasets efficiently, leveraging appropriate pricing models, and implementing robust FinOps practices, you can ensure that your BigQuery investment delivers maximum value to your organization.

Remember, the goal isn’t just to reduce costs, but to optimize spend and align it with business objectives. With the right strategies and tools in place, you can transform cost management from a challenge into a competitive advantage, enabling your organization to make the most of BigQuery’s powerful capabilities while maintaining control over expenses.

To learn more about how Unravel can help with BigQuery cost management, request a health check report, view a self-guided product tour, or request a demo.


The Complexities of Code Optimization in BigQuery: Problems, Challenges and Solutions

Google BigQuery offers a powerful, serverless data warehouse solution that can handle massive datasets with ease. However, this power comes with its own set of challenges, particularly when it comes to code optimization.

This blog post delves into the complexities of code optimization in BigQuery, the difficulties in diagnosing and resolving issues, and how automated solutions can simplify this process.

The BigQuery Code Optimization Puzzle

1. Query Performance and Cost Management

Problem: In BigQuery, query performance and cost are intimately linked. Inefficient queries can not only be slow but also extremely expensive, especially when dealing with large datasets.

Diagnosis Challenge: Identifying the root cause of a poorly performing and costly query is complex. Is it due to inefficient JOIN operations, suboptimal table structures, or simply the sheer volume of data being processed? BigQuery provides query explanations, but interpreting these for complex queries and understanding their cost implications requires significant expertise.

Resolution Difficulty: Optimizing BigQuery queries often involves a delicate balance between performance and cost. Techniques like denormalizing data might improve query speed but increase storage costs. Each optimization needs to be carefully evaluated for its impact on both performance and billing, which can be a time-consuming and error-prone process.
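
Whether diagnosing a slow, costly query or evaluating a fix, one place to start is the statistics BigQuery attaches to every completed job, which separate how much data was billed from how much slot time was consumed. A minimal sketch with the Python client follows; the project and job ID are placeholders.

  from google.cloud import bigquery

  client = bigquery.Client(project="my-project")                # hypothetical project
  job = client.get_job("bquxjob_example_12345", location="US")  # placeholder job ID

  # Billed bytes drive on-demand cost; slot-milliseconds show how much compute the query consumed.
  print("cache hit:   ", job.cache_hit)
  print("bytes billed:", job.total_bytes_billed)
  print("slot millis: ", job.slot_millis)
  print("elapsed (s): ", (job.ended - job.started).total_seconds())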

2. Partitioning and Clustering Challenges

Problem: Improper partitioning and clustering can lead to excessive data scanning, resulting in slow queries and unnecessary costs.

Diagnosis Challenge: The effects of suboptimal partitioning and clustering may not be immediately apparent and can vary depending on query patterns. Identifying whether poor performance is due to partitioning issues, clustering issues, or something else entirely requires analyzing query patterns over time and understanding the intricacies of BigQuery’s architecture.

Resolution Difficulty: Changing partitioning or clustering strategies is not a trivial operation, especially for large tables. It requires careful planning and can temporarily impact query performance during the restructuring process. Determining the optimal strategy often requires extensive A/B testing and monitoring across various query types and data sizes.

3. Nested and Repeated Fields Complexity

Problem: While BigQuery’s support for nested and repeated fields offers flexibility, it can lead to complex queries that are difficult to optimize and prone to performance issues.

Diagnosis Challenge: Understanding the performance characteristics of queries involving nested and repeated fields is like solving a multidimensional puzzle. The query explanation may not provide clear insights into how these fields are being processed, making it difficult to identify bottlenecks.

Resolution Difficulty: Optimizing queries with nested and repeated fields often requires restructuring the data model or rewriting queries in non-intuitive ways. This process can be time-consuming and may require significant changes to ETL processes and downstream analytics.

4. UDF and Stored Procedure Performance

Problem: User-Defined Functions (UDFs) and stored procedures in BigQuery can lead to unexpected performance issues if not implemented carefully.

Diagnosis Challenge: The impact of UDFs and stored procedures on query performance isn’t always clear from the query explanation. Identifying whether these are the source of performance issues requires careful analysis and benchmarking.

Resolution Difficulty: Optimizing UDFs and stored procedures often involves rewriting them from scratch or finding ways to eliminate them altogether. This can be a complex process, especially if these functions are widely used across your BigQuery projects.

Manual Optimization Struggle

Traditionally, addressing these challenges involves a cycle of:

1. Manually analyzing query explanations and job statistics
2. Conducting time-consuming A/B tests with different query structures and table designs
3. Carefully monitoring the impact of changes on both performance and cost
4. Continuously adjusting as data volumes and query patterns evolve

This process is not only time-consuming but also requires deep expertise in BigQuery’s architecture, SQL optimization techniques, and cost management strategies. Even then, optimizations that work today might become inefficient as your data and usage patterns change.
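
Much of that manual analysis starts from metadata BigQuery already records about past jobs. As a rough illustration (not Unravel's method), the sketch below pulls the most expensive recent queries from the INFORMATION_SCHEMA.JOBS_BY_PROJECT view; the project, region qualifier, and seven-day window are assumptions to adjust for your environment.

  from google.cloud import bigquery

  client = bigquery.Client(project="my-project")  # hypothetical project

  # Rank the last 7 days of queries by billed bytes; the region qualifier must match where jobs run.
  sql = """
      SELECT user_email, job_id, total_bytes_billed, total_slot_ms,
             SUBSTR(query, 1, 80) AS query_preview
      FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
      WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
        AND job_type = 'QUERY'
      ORDER BY total_bytes_billed DESC
      LIMIT 10
  """
  for row in client.query(sql).result():
      print(row.user_email, row.job_id, row.total_bytes_billed, row.total_slot_ms, row.query_preview)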

Harnessing Automation for BigQuery Optimization

Given the complexities and ongoing nature of these challenges, many organizations are turning to automated solutions to streamline their BigQuery optimization efforts. Tools like Unravel can help by:

Continuous Performance and Cost Monitoring: Automatically tracking query performance, resource utilization, and cost metrics across your entire BigQuery environment.

Intelligent Query Analysis: Using machine learning algorithms to identify patterns and anomalies in query performance and cost that might be missed by manual analysis.

Root Cause Identification: Quickly pinpointing the source of performance issues, whether they’re related to query structure, data distribution, or BigQuery-specific features like partitioning and clustering.

Optimization Recommendations: Providing actionable suggestions for query rewrites, partitioning and clustering strategies, and cost-saving measures.

Impact Prediction: Estimating the potential performance and cost impacts of suggested changes before you implement them.

Automated Policy Enforcement: Helping enforce best practices and cost controls automatically across your BigQuery projects.

By leveraging such automated solutions, data teams can focus their expertise on deriving insights from data while ensuring their BigQuery environment remains optimized and cost-effective. Instead of spending hours digging through query explanations and job statistics, teams can quickly identify and resolve issues, or even prevent them from occurring in the first place.

Conclusion

Code optimization in BigQuery is a complex, ongoing challenge that requires continuous attention and expertise. While the problems are multifaceted and the manual diagnosis and resolution process can be daunting, automated solutions offer a path to simplify and streamline these efforts. By leveraging such tools, organizations can more effectively manage their BigQuery performance and costs, improve query efficiency, and allow their data teams to focus on delivering value rather than constantly grappling with optimization challenges.

Remember, whether you’re using manual methods or automated tools, optimization in BigQuery is an ongoing process. As your data volumes grow and query patterns evolve, staying on top of performance and cost management will ensure that your BigQuery implementation continues to deliver the insights your business needs, efficiently and cost-effectively.

To learn more about how Unravel can help with BigQuery code optimization, request a health check report, view a self-guided product tour, or request a demo.


Navigating the Maze of Configuration Management in Modern Data Platforms: Problems, Challenges and Solutions

In the world of big data, configuration management is often the unsung hero of platform performance and cost-efficiency. Whether you’re working with Snowflake, Databricks, BigQuery, or any other modern data platform, effective configuration management can mean the difference between a sluggish, expensive system and a finely-tuned, cost-effective one.

This blog post explores the complexities of configuration management in data platforms, the challenges in optimizing these settings, and how automated solutions can simplify this critical task.

The Configuration Conundrum

1. Cluster and Warehouse Sizing

Problem: Improper sizing of compute resources (like Databricks clusters or Snowflake warehouses) can lead to either performance bottlenecks or unnecessary costs.

Diagnosis Challenge: Determining the right size for your compute resources is not straightforward. It depends on workload patterns, data volumes, and query complexity, all of which can vary over time. Identifying whether performance issues or high costs are due to improper sizing requires analyzing usage patterns across multiple dimensions.

Resolution Difficulty: Adjusting resource sizes often involves a trial-and-error process. Too small, and you risk poor performance; too large, and you’re wasting money. The impact of changes may not be immediately apparent and can affect different workloads in unexpected ways.

2. Caching and Performance Optimization Settings

Problem: Suboptimal caching strategies and performance settings can lead to repeated computations and slow query performance.

Diagnosis Challenge: The effectiveness of caching and other performance optimizations can be highly dependent on specific workload characteristics. Identifying whether poor performance is due to cache misses, inappropriate caching strategies, or other factors requires deep analysis of query patterns and platform-specific metrics.

Resolution Difficulty: Tuning caching and performance settings often requires a delicate balance. Aggressive caching might improve performance for some queries while causing staleness issues for others. Each adjustment needs to be carefully evaluated across various workload types.

3. Security and Access Control Configurations

Problem: Overly restrictive security settings can hinder legitimate work, while overly permissive ones can create security vulnerabilities.

Diagnosis Challenge: Identifying the root cause of access issues can be complex, especially in platforms with multi-layered security models. Is a performance problem due to a query issue, or is it because of an overly restrictive security policy?

Resolution Difficulty: Adjusting security configurations requires careful consideration of both security requirements and operational needs. Changes need to be thoroughly tested to ensure they don’t inadvertently create security holes or disrupt critical workflows.

4. Cost Control and Resource Governance

Problem: Without proper cost control measures, data platform expenses can quickly spiral out of control.

Diagnosis Challenge: Understanding the cost implications of various platform features and usage patterns is complex. Is a spike in costs due to inefficient queries, improper resource allocation, or simply increased usage?

Resolution Difficulty: Implementing effective cost control measures often involves setting up complex policies and monitoring systems. It requires balancing cost optimization with the need for performance and flexibility, which can be a challenging trade-off to manage.

The Manual Configuration Management Struggle

Traditionally, managing these configurations involves:

1. Continuously monitoring platform usage, performance metrics, and costs
2. Manually adjusting configurations based on observed patterns
3. Conducting extensive testing to ensure changes don’t negatively impact performance or security
4. Constantly staying updated with platform-specific best practices and new features
5. Repeating this process as workloads and requirements evolve

This approach is not only time-consuming but also reactive. By the time an issue is noticed and diagnosed, it may have already impacted performance or inflated costs. Moreover, the complexity of modern data platforms means that the impact of configuration changes can be difficult to predict, leading to a constant cycle of tweaking and re-adjusting.

Embracing Automation in Configuration Management

Given these challenges, many organizations are turning to automated solutions to manage and optimize their data platform configurations. Platforms like Unravel can help by:

Continuous Monitoring: Automatically tracking resource utilization, performance metrics, and costs across all aspects of the data platform.

Intelligent Analysis: Using machine learning to identify patterns and anomalies in platform usage and performance that might indicate configuration issues.

Predictive Optimization: Suggesting configuration changes based on observed usage patterns and predicting their impact before implementation.

Automated Adjustment: In some cases, automatically adjusting configurations within predefined parameters to optimize performance and cost.

Policy Enforcement: Helping to implement and enforce governance policies consistently across the platform.

Cross-Platform Optimization: For organizations using multiple data platforms, providing a unified view and consistent optimization approach across different environments.

By leveraging automated solutions, data teams can shift from a reactive to a proactive configuration management approach. Instead of constantly fighting fires, teams can focus on strategic initiatives while ensuring their data platforms remain optimized, secure, and cost-effective.

Conclusion

Configuration management in modern data platforms is a complex, ongoing challenge that requires continuous attention and expertise. While the problems are multifaceted and the manual management process can be overwhelming, automated solutions offer a path to simplify and streamline these efforts.

By embracing automation in configuration management, organizations can more effectively optimize their data platform performance, enhance security, control costs, and free up their data teams to focus on extracting value from data rather than endlessly tweaking platform settings.

Remember, whether using manual methods or automated tools, effective configuration management is an ongoing process. As your data volumes grow, workloads evolve, and platform features update, staying on top of your configurations will ensure that your data platform continues to meet your business needs efficiently and cost-effectively.

To learn more about how Unravel can help manage and optimize your data platform configurations with Databricks, Snowflake, and BigQuery: request a health check report, view a self-guided product tour, or request a demo.


Unravel Data Achieves Google Cloud Ready – BigQuery Designation, Delivering AI-driven Cost and Performance Optimization to BigQuery Users

Purpose-built AI provides real-time data observability and FinOps for BigQuery performance and data cost governance

PALO ALTO, CA — Apr. 9, 2024 – Unravel Data, the first AI-enabled data observability and FinOps platform built to optimize the cost, speed, and scale of data pipelines and applications running on cloud data platforms, today announced that Unravel for BigQuery has achieved Google Cloud Ready – BigQuery designation. In doing so, Unravel has demonstrated that its product has met a core set of functional and interoperability requirements that allow for robust integration with BigQuery.

Unravel for BigQuery provides purpose-built, AI-driven cost and performance optimization insights and prescriptive recommendations for maximizing cost efficiencies and speed to value of data and AI pipelines moving through BigQuery. With Unravel, BigQuery users have a means to optimize SQL queries and maximize cloud data investments for greater return and select the BigQuery edition that best suits their situation, based on their unique compute, storage, and data usage patterns. With the Google Cloud Ready – BigQuery designation, BigQuery users have validation that Unravel has been proven through a rigorous validation process to work optimally with BigQuery, securely and at scale.

“Companies are faced with a rapid race to leverage AI and data products across all areas of the business. This means that in addition to having unprecedented amounts of data to manage, they also have more users of data, and more people of varying skill levels creating models and products. As a result, data teams are increasingly faced with more bottlenecks due to broken data and AI pipelines and inefficient data processing and a vast impact on escalating cloud data bills,” said Kunal Agarwal, CEO and co-founder, Unravel Data. “We’re extremely excited, therefore, to be part of the Google Cloud Ready – BigQuery program as now even more companies can be assured of the right unit costs for their AI and data projects, thus speeding time to value.”

At the core of Unravel Data’s platform is its AI-powered Recommendation Engine which has been trained to understand all the intricacies and complexities of modern data platforms and the supporting infrastructure. Built to ingest and interpret the continuous millions of ongoing data streams, the Engine provides real-time insights into application and system performance, as well as recommendations to optimize costs, including right-sizing instances and applying code recommendations for performance and financial efficiencies.

Unravel provides deep observability of the job, user, and code level, providing BigQuery customers with insights and AI-driven cost optimization recommendations for slots and SQL queries. Moreover, real-time cost visibility, predictive spend forecasting, and performance insights for workloads mean that BigQuery users are able to accelerate their cloud transformation initiatives. With easy-to-use widgets that offer insights on spend, performance, and unit economics, BigQuery users can also customize their dashboards and alerts.

Learn how Unravel and Google Cloud BigQuery can help enterprises optimize their cloud data spend and increase ROI here.

About Unravel Data
Unravel Data radically transforms the way businesses understand and optimize the performance and cost of their modern data and applications – and the complex data and AI pipelines that power those applications. Unravel’s market-leading data observability and FinOps platform, with purpose-built AI for each data platform, provides the actionable recommendations needed for cost and performance efficiencies across data and AI pipelines. A recent winner of the Best Data Tool & Platform of 2023 at the annual SIIA CODiE Awards, Unravel Data is relied on by some of the world’s most recognized brands, such as Maersk, Mastercard, Charter Communications, Equifax, and Deutsche Bank, to unlock data-driven insights and deliver new innovations to market. To learn more, visit https://www.unraveldata.com.

Understanding BigQuery Cost

This article explains the two pricing models (on-demand and capacity) for the compute component of BigQuery pricing, the challenges of calculating chargeback on compute, and what Unravel can do.

  1. On-demand compute pricing: You are charged based on the billed bytes used by your queries. So if the price is $X/TiB and a query uses Y TiB of billed bytes, you will be billed $(X*Y) for that query.
  2. Capacity compute pricing: You buy slots and you are charged based on the number of slots and the time for which slots were made available. 

The following section describes compute pricing in more detail.

Capacity Pricing

To use capacity pricing, you start by creating reservations. You can create one or more reservations inside an admin project. All the costs related to the slots for the reservations will be attributed to the admin project in the bill.

Reservations

While creating a reservation, you need to specify “baseline slots” and “max slots,” in multiples of 100. You will be charged for at least the baseline slots for the duration of the reservation. When all the baseline slots have been utilized by queries running in that reservation, more slots can be made available via autoscaling. Autoscaling happens in multiples of 100 slots. The number of slots available for autoscaling is (max slots – baseline slots).

You can modify the baseline and max slots after you create the reservation. You can increase the values at any point in time, but you cannot decrease the values within 1 hour of creating a reservation or updating the reservation.

Assignments

After you have created a reservation, to enable the queries in projects to use slots from that reservation, you have to create assignments. You can assign projects, folders, or organizations to a reservation. Whenever a query is started, it will first try to use the slots from this reservation. You can create or delete assignments at any point in time.

Pricing

BigQuery provides 3 editions: Standard, Enterprise, and Enterprise Plus. These editions have different capabilities and different pricing rates. The rates are defined in terms of slot-hours. For example, the rate is $0.04 per slot-hour for the Standard edition in the US and $0.06 per slot-hour for the Enterprise edition in the same region.

In capacity pricing, you are charged for the number of slots made available and the time for which slots are made available. Suppose you have a reservation with 100 baseline slots and 500 max slots in the Standard edition. Consider the following usage:

  • In the first hour, no queries are running, so the slot requirement is 0.
  • In the second hour, there are queries running, but the slot requirement is less than 100.
  • In the third hour, more queries are running and the slot requirement is 150.

In the first 2 hours, even though the slot requirement is less than 100, you will still be charged for 100 slots (the baseline slots) for each of the first 2 hours.

In the third hour, we need 50 more slots than the baseline, so autoscaling kicks in to provide more slots. Since autoscaling only scales up or down in multiples of 100, 100 more slots are added. Hence, a total of 200 slots (100 baseline + 100 from autoscaling) are made available in this hour. 

The number of slot-hours from this 3-hour period is 100 + 100 + 200 = 400. With a rate of $0.04 per slot-hour for Standard edition, you will be charged 0.04*400 = $16 for this usage.
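
The arithmetic above can be generalized: each hour is billed for the baseline plus however many 100-slot autoscaling increments are needed to cover demand beyond it, capped at the reservation's max. A small illustrative sketch in Python, using this example's numbers (and ignoring the one-minute minimum and per-second granularity covered later):

  import math

  STANDARD_RATE = 0.04  # $ per slot-hour, the example's Standard edition rate
  BASELINE, MAX_SLOTS = 100, 500

  def billed_slots(demand: int) -> int:
      """Slots billed for one hour: baseline plus autoscaling in 100-slot steps, capped at max."""
      extra = max(0, demand - BASELINE)
      autoscaled = min(math.ceil(extra / 100) * 100, MAX_SLOTS - BASELINE)
      return BASELINE + autoscaled

  hourly_demand = [0, 80, 150]  # the three hours above (80 stands in for "less than 100")
  slot_hours = sum(billed_slots(d) for d in hourly_demand)
  print(slot_hours, "slot-hours =", f"${slot_hours * STANDARD_RATE:.2f}")  # 400 slot-hours = $16.00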

Pay-as-you-go

Recall that you can create/delete/update reservations and baseline/max slots whenever you want. Also, you will be charged for just the number of slots made available to you and for the time the slots are made available. This model is called pay-as-you-go, as you pay only for what you use.

Capacity Commitment

If you expect to use a certain number of slots over a long period of time, you can make a commitment of X slots over a 1-year or 3-year period, for a better rate. You will be charged for those slots for the entire period regardless of whether you use them or not. This model is called capacity commitment.

Consider the following example of capacity commitment. Let’s say you have:

  • 1-year commitment of 1600 slots in the Enterprise edition. 
  • Created 1 reservation with max size of 1500 slots and baseline of 1000 slots. 
  • Hence your autoscaling slots are 1500-1000 = 500.
  • Pay-as-you-go price for enterprise edition is $0.06 per slot-hour.
  • 1-year commitment price for the enterprise edition is $0.048 per slot hour.

Consider this scenario:

  • In the first hour, the requirement is less than 1000 slots.
  • In the second hour, the requirement is 1200 slots.
  • In the third hour, the requirement is 1800 slots.

In the first hour, the baseline slots of 1000 are made available for the reservation; these slots are available from the commitment slots. Since we have a commitment of 1600 slots, all the 1600 slots are actually available. The 1000 slots are available for the reservation as baseline. The remaining 600 are called idle slots and are also charged. So for the first hour, we are charged for 1600 slots as per commitment price, with a cost of $(1600 * 0.048).

In the second hour, since the requirement is 1200 slots, there is an additional requirement of 200 slots beyond the baseline of 1000 slots. Since 600 idle slots are available from the committed capacity, the additional requirement of 200 slots will come from these idle slots, while the remaining 400 slots will remain idle. Notice that autoscaling was not needed in this case. Before going for autoscaling, BigQuery will try to use idle slots (unless ignore_idle_slots config is set to True for that reservation). So how much are we charged for the second hour? The answer is 1600 slots, since that is what is committed. These 1600 slots are charged as per the commitment price, so the cost for the second hour is $(1600 * 0.048).

In the third hour, the requirement is 1800 slots: the first 1600 slots will come from commitment slots, and the other 200 will now come from autoscaling slots. The 1600 slots will be charged as per 1-year commit pricing, and the 200 slots coming from autoscale slots will be charged as per pay-as-you-go pricing at $0.06/slot-hour in this case. Therefore, the cost for the third hour is $((1600 * 0.048) + (200 * 0.06)).
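
The same three hours can be written as a short computation: the full commitment is billed at the committed rate whether or not it is used, and any demand beyond it is rounded up in 100-slot autoscaling steps and billed at the pay-as-you-go rate. A sketch with this scenario's numbers (900 stands in for "less than 1000 slots"):

  import math

  COMMIT_RATE, PAYG_RATE = 0.048, 0.06  # $ per slot-hour, the example's Enterprise edition rates
  COMMITTED_SLOTS = 1600                # 1-year commitment
  AUTOSCALE_POOL = 500                  # reservation max 1500 minus baseline 1000

  def hourly_cost(demand: int) -> float:
      # All committed slots are billed even when idle; only demand beyond them triggers autoscaling.
      extra = max(0, demand - COMMITTED_SLOTS)
      autoscaled = min(math.ceil(extra / 100) * 100, AUTOSCALE_POOL)
      return COMMITTED_SLOTS * COMMIT_RATE + autoscaled * PAYG_RATE

  for demand in (900, 1200, 1800):
      print(f"{demand} slots needed -> ${hourly_cost(demand):.2f}")
  # 900 -> $76.80, 1200 -> $76.80, 1800 -> $76.80 + $12.00 = $88.80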

Notes

Some points to note regarding capacity pricing:

  1. The slots are charged with a maximum granularity of 1 second, and the charge is for a minimum of 1 minute.
  2. Autoscaling always happens in increments/decrements of 100 slots.
  3. Queries running in a reservation automatically use idle slots from other reservations within the same admin project, unless ignore_idle_slots is set to True for the reservation.
  4. The capacity commitment is specific to a region, organization, and edition. The idle slots can’t be shared across regions or editions.
  5. Idle slot capacity is not shared between reservations in different admin projects.

The Cost Chargeback Problem

In an organization, typically there are one or more capacity commitments, one or more admin projects, and multiple reservations in these admin projects. GCP provides billing data that gives you the hourly reservation costs for a given admin project, edition, and location combination. However, there are multiple possible ways to map a team to a reservation: a team can be assigned to one reservation, multiple teams can be assigned to the same reservation, or multiple teams can be assigned to multiple reservations.

In any case, it is a tricky task to find out which team or user is contributing how much to your cost. How do you map the different teams and users to the different projects, editions, and locations? And how do you then track the cost incurred by these teams and users? Common chargeback approaches, such as chargeback by accounts, projects, and reservations, simply cannot provide clarity at the user or team level. There is also no direct data source that gives this information from GCP.

Unravel provides cost chargeback at the user and team levels by combining data from different sources such as billing, apps, and tags. The crux of our proprietary approach is providing an accurate cost estimate at the query level. We then associate the query costs with users and teams (via tags) to derive the user-level or team-level chargeback. 

Computing query-level cost estimates involves addressing a number of challenges. Some of the challenges include:

  • A query may have different stages where chargeback could be different.
  • A query may use slots with commitment pricing or pay-as-you-go pricing.
  • Capacity is billed at one minute minimum.
  • Autoscaling increments by multiples of 100 slots.
  • Chargeback policy for idle resource.

Let’s understand these challenges further with a few scenarios. In all the examples below, we assume that we use the Enterprise edition in the US region, with rates of $0.06/slot-hour for pay-as-you-go, and $0.048/slot-hour for a 1-year commitment.

Scenario 1: Slot-hours billed differently at different stages of a query 

Looking at the total slot-hours for chargeback could be misleading because the slot-hours at different stages of a query may be billed differently.

Consider the following scenario:

  • A reservation with a baseline of 0 slots and a max of 200 slots.
  • No capacity commitment.
  • Query Q1 runs from 5am to 6am with 200 slots for the whole hour.
  • Query Q2 runs from 6am to 8am in 2 stages:
    • In the first stage, from 6am to 7am, it uses 150 slots for the whole hour.
    • In the second stage, from 7am to 8am, it uses 50 slots for the whole hour.

Both queries use the exact same slot-hours total, i.e., 200 slot-hours, and use the same reservation and edition. Hence we may think the chargeback to both queries should be the same.

But if you look closely, the two queries do not incur the same amount of cost.

Q1 uses 200 slots for 1 hour. Given the reservation with a baseline of 0 slots and a max of 200 slots, 200 slots are made available in this hour, and the cost of the query is $(200*0.06) = $12.

In contrast, Q2’s usage is split into 150 slots for the first hour and 50 slots for the second hour. Since slots are autoscaled in increments of 100, to run Q2, 200 slots are made available in the first hour and 100 slots are made available in the second hour. The total slot-hours for Q2 is therefore 300, and the cost is $(300*0.06) = $18.

Summary: The cost chargeback to a query needs to account for how many slots are used in different stages of the query and not just the total slot-hours (or slot-ms) used.
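
To make the summary concrete, the comparison can be computed stage by stage: round each stage's slot demand up to the next 100-slot autoscaling step (the baseline here is 0), multiply by the stage's duration, and sum. A small sketch using the numbers above:

  import math

  PAYG_RATE = 0.06  # $ per slot-hour, the example's Enterprise edition pay-as-you-go rate

  def query_cost(stages):
      """stages: list of (hours, slots_needed); slots round up to 100-slot autoscaling steps."""
      slot_hours = sum(hours * math.ceil(slots / 100) * 100 for hours, slots in stages)
      return slot_hours, slot_hours * PAYG_RATE

  for name, stages in [("Q1", [(1, 200)]), ("Q2", [(1, 150), (1, 50)])]:
      sh, cost = query_cost(stages)
      print(f"{name}: {sh} slot-hours -> ${cost:.2f}")  # Q1: 200 -> $12.00, Q2: 300 -> $18.00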

Scenario 2: Capacity commitment pricing vs. pay-as-you-go pricing from autoscaling

At different stages, a query may use slots from capacity commitment or from autoscaling that are charged at the pay-as-you-go price. 

Consider the following scenario:

  • 1 reservation with a baseline of 100 slots and max slots as 300 slots.
  • 1-year capacity commitment of 100 slots.
  • Query Q1 runs from 5am to 6am and uses 300 slots for the whole 1 hour.
  • Query Q2 runs from 6am to 8am in 2 stages. It uses 100 slots from 6am to 7am, and uses 200 slots from 7am to 8am.

Once again, the total slot-hours for both queries are the same, i.e., 300 slot hours, and we might chargeback the same cost to both queries.

But if you look closely, the queries do not incur the same amount of cost.

For Q1, 100 slots come from committed capacity and are charged at the 1-year commit price ($0.048/slot-hour), whereas 200 are autoscale slots that are charged at the pay-as-you-go price ($0.06/slot-hour). So the cost of Q1 is  $((100*0.048) + (200*0.06)) = $16.80.

For Q2, from 6-7am, 100 slots come from committed capacity and are charged at the 1-year commit price ($0.048/slot-hour), so the cost for 6-7am is  = $(100*0.048) = $4.80.

From 7-8am, 100 slots from committed capacity are charged at 1-year commit price ($0.048/slot-hour), and the other 100 are autoscale slots charged at pay-as-you-go price ($0.06/slot-hour). So the cost from 7-8am is  = $(100*0.048) + (100*0.06) = $10.80.

Hence the cost between 6-8am (duration when Query-2 is running) is = $4.80 + $10.80 = $15.60.

Summary: The cost chargeback to a query needs to account for whether the slots come from committed capacity or from autoscaling charged at pay-as-you-go price. A query may use both at different stages.

Scenario 3: Minimum slot capacity and increment in autoscaling

A query may be billed for more than the resource it actually needs because of the minimum slot capacity and the minimum increment in autoscaling.

  • 1 reservation with a baseline of 0 slots and max slots as 300 slots.
  • No capacity commitment.
  • Query Q1 uses 50 slots for 10 seconds between 05:00:00 to 05:00:10.
  • There is no other query running between 04:59:00 to 05:02:00.

If you were to chargeback by slot-ms, you would say that the query uses 50 slots for 10 seconds, or  500,000 slot-ms.

However, this assumption is flawed because of these two conditions:

  1. Slot capacity is billed for a minimum of 1 minute before being billed per second.
  2. Autoscaling happens in increments of 100 slots.

For Q1, 100 slots (not 50) are actually made available, for 1 minute (60,000 ms) and hence you will be charged for 6,000,000 slot-ms in your bill. 

Summary: The cost chargeback needs to account for minimum slot capacity and autoscaling increments. 

Scenario 4: Chargeback policy for idle resource

In the previous scenario, we see that a query that actually uses 500,000 slot-ms is billed for 6,000,000 slot-ms. Here we make the assumption that whatever resource is made available but not used is also included in the chargeback of the queries running at the same time. What happens if there are multiple queries running concurrently, with unused resources? Continuing with the example in Scenario 3, if there is another query, Q2, that uses 50 slots for 30s, from 05:00:10 to 05:00:40, then: 

  • Q1 still uses 500,000 slot-ms like before.
  • Q2 uses 1,500,000 slot-ms.
  • The total bill remains 6,000,000 slot-ms as before, because slot capacity is billed for a minimum of 1 min and autoscaling increments by 100 slots.

There are several ways to consider the chargeback to Q1 and Q2:

  1. Charge each query by its actual slot-ms, and have a separate “idle” category. In this case, Q1 is billed for 500,000 slot-ms, Q2 is billed for 1,500,000 slot-ms, and the remaining 4,000,000 slot-ms is attributed to the “idle” category.
  2. Divide idle resources equally among the queries. In this case, Q1 is billed 2,500,000 slot-ms, and Q2 is billed 3,500,000 slot-ms.
  3. Divide idle resources proportionally among the queries based on the queries’ usage. In this case, Q1 uses 1,833,333 slot-ms, while Q2 uses 4,166,667 slot-ms.

Summary: Chargeback policy needs to consider how to handle idle resources. Without a clear policy, there could be mismatches between users’ assumptions and the implementation, even leading to inconsistencies, such as the sum of the query costs deviating from the bill. Moreover, different organizations may prefer different chargeback policies, and there’s no one-size-fits-all approach.
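
As a concrete illustration, the first two policies above are mechanical to compute; the proportional variant depends on what the idle share is made proportional to, which is itself part of the policy choice. A small Python sketch with this scenario's slot-ms figures:

  TOTAL_BILLED = 6_000_000                   # slot-ms actually billed for the minute
  usage = {"Q1": 500_000, "Q2": 1_500_000}   # slot-ms each query actually consumed
  idle = TOTAL_BILLED - sum(usage.values())  # 4,000,000 slot-ms

  # Policy 1: charge actual usage and report idle as its own category.
  policy_1 = {**usage, "idle": idle}

  # Policy 2: split the idle slot-ms equally across the concurrent queries.
  policy_2 = {q: used + idle // len(usage) for q, used in usage.items()}

  print(policy_1)  # {'Q1': 500000, 'Q2': 1500000, 'idle': 4000000}
  print(policy_2)  # {'Q1': 2500000, 'Q2': 3500000}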

Conclusion

To conclude, providing accurate and useful chargeback for an organization’s usage of BigQuery presents a number of challenges. The common approaches of chargeback by accounts, reservations, and projects are often insufficient for most organizations, as they need user-level and team-level chargeback. However, chargeback by users and teams requires us to be able to provide query-level cost estimates, and then aggregate by users and teams (via tags). Computing the query-level cost estimate is another tricky puzzle where simply considering the total slot usage of a query will not work. Instead, we need to consider various factors such as different billing for different stages of the same query, commitment pricing vs. pay-as-you-go pricing from autoscaling, minimum slot capacity and minimum autoscaling increments, and idle policy.

Fortunately, Unravel has information for all the pieces of the puzzle. Its proprietary algorithm intelligently combines these pieces of information and considers the scenarios discussed. Unravel recognizes that chargeback often doesn’t have a one-size-fits-all approach, and can work with customers to adapt its algorithm to specific requirements and use cases.

Harnessing Google Cloud BigQuery for Speed and Scale: Data Observability, FinOps, and Beyond

Data is a powerful force that can generate business value with immense potential for businesses and organizations across industries. Leveraging data and analytics has become a critical factor for successful digital transformation that can accelerate revenue growth and AI innovation. Data and AI leaders enable business insights, product and service innovation, and game-changing technology that helps them outperform their peers in terms of operational efficiency, revenue, and customer retention, among other key business metrics. Organizations that fail to harness the power of data are at risk of falling behind their competitors.

Despite all the benefits of data and AI, businesses face common challenges.

Unanticipated cloud data spend

Last year, over $16 billion was wasted in cloud spend. Data management is the largest and fastest-growing category of cloud spending, representing 39% of the typical cloud bill. Gartner noted that in 2022, 98% of the overall database management system (DBMS) market growth came from cloud-based database platforms. Cloud data costs are often the most difficult to predict due to fluctuating workloads. 82% of 157 data management professionals surveyed by Forrester cited difficulty predicting data-related cloud costs. On top of the fluctuations that are inherent with data workloads, a lack of visibility into cloud data spend makes it challenging to manage budgets effectively.

  • Fluctuating workloads: Google Cloud BigQuery data processing and storage costs are driven by the amount of data stored and analyzed. With varying workloads, it becomes challenging to accurately estimate the required data processing and storage costs. This unpredictability can result in budget overruns that affect 60% of infrastructure and operations (I&O) leaders.
  • Unexpected expenses: Streaming data, large amounts of unstructured and semi-structured data, and shared slot pool consumption can quickly drive up cloud data costs. These factors contribute to unforeseen spikes in usage that may catch organizations off guard, leading to unexpected expenses on their cloud bills.
  • Lack of visibility: Without granular visibility into cloud data analytics billing information, businesses have no way to accurately allocate costs down to the job or user level. This makes it difficult for them to track usage patterns and identify areas where budgets will be over- or under-spent, or where performance and cost optimization are needed.

By implementing a FinOps approach, businesses can gain better control over their cloud data spend, optimize their budgets effectively, and avoid unpleasant surprises when it comes time to pay the bill.
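
A small, concrete step toward that control is labeling every query a pipeline submits, so costs can later be grouped by team or workload in the billing export and job metadata. A minimal sketch with the google-cloud-bigquery Python client; the project, table, and label values are placeholder assumptions.

  from google.cloud import bigquery

  client = bigquery.Client(project="my-project")  # hypothetical project

  # Labels attached to the job flow through to job metadata and billing data for later roll-ups.
  job_config = bigquery.QueryJobConfig(
      labels={"team": "growth-analytics", "pipeline": "daily-revenue"}  # placeholder labels
  )
  job = client.query(
      "SELECT COUNT(*) AS orders FROM `my-project.sales.orders`",  # hypothetical table
      job_config=job_config,
  )
  job.result()
  print("bytes billed:", job.total_bytes_billed, "| labels sent:", job_config.labels)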

Budget and staff constraints limit new data workloads

In 2023, CIOs are expecting an average increase of only 5.1% in their IT budgets, which is lower than the projected global inflation rate of 6.5%. Economic pressures, scarcity and high cost of talent, and ongoing supply challenges are creating urgency to achieve more value in less time.

Limited budget and staffing resources can hinder the implementation of new data workloads. For example, “lack of resources/knowledge to scale” is the leading reason preventing IoT data deployments. Budget and staffing constraints pose real risks to launching profitable data and AI projects.

Exponential data volume growth for AI

The rapid growth of disruptive technologies, such as generative AI, has led to an exponential increase in cloud computing data volumes. However, managing and analyzing massive amounts of data poses significant challenges for organizations.

Data is foundational for AI and much of it is unstructured, yet IDC found most unstructured data is not leveraged by organizations. A lack of production-ready data pipelines for diverse data sources was the second most cited reason (31%) for AI project failure.

Data pipeline failures slow innovation

Data pipelines are becoming increasingly complex, increasing the Mean Time To Repair (MTTR) for breaks and delays. Time is a critical factor that pulls skilled and valuable talent into unproductive firefighting. The more time they spend dealing with pipeline issues or failures, the greater the impact on productivity and new innovation.

Manually testing and running release process checklists are heavy burdens for new and growing data engineering teams. With all of the manual toil, it is no surprise that over 70% of data projects in manufacturing stall at Proof of Concept (PoC) stage and do not see sustainable value realization.

Downtime resulting from pipeline disruptions can have a significant negative impact on the service level agreements (SLAs). It not only affects the efficiency of data processing, but also impacts downstream tasks like analysis and reporting. These slowdowns directly affect the ability of team members and business leaders to make timely decisions based on data insights.

Conclusion

Unravel 4.8.1 for BigQuery provides improved visibility to accelerate performance, boost query efficiency, allocate costs, and accurately predict spend. This launch aligns with the recent BigQuery pricing model change. With Unravel for BigQuery, customers can easily choose the best pricing plan to match their usage. Unravel helps you optimize your workloads and get more value from your cloud data investments.

Unravel Data Launches Cloud Data Cost Observability and Optimization for Google Cloud BigQuery https://www.unraveldata.com/resources/unravel-data-launches-cloud-data-cost-observability-and-optimization-for-google-cloud-bigquery/ https://www.unraveldata.com/resources/unravel-data-launches-cloud-data-cost-observability-and-optimization-for-google-cloud-bigquery/#respond Thu, 10 Aug 2023 12:04:34 +0000 https://www.unraveldata.com/?p=13386


New Functionality Delivers FinOps, AI-driven Cloud Cost Management and Performance Optimization for BigQuery Users

PALO ALTO, CA — August 10, 2023 — Unravel Data, the first AI-enabled data observability and FinOps platform built to address the speed and scale of modern data platforms, today announced the release of Unravel 4.8.1, enabling Google Cloud BigQuery customers to see and better manage their cloud data costs by understanding specific cost drivers, gaining allocation insights, and optimizing the performance and cost of SQL queries. This launch comes on the heels of the recent BigQuery pricing model change that replaced flat-rate and flex slot pricing with three new pricing tiers. It will help BigQuery customers implement FinOps in real time, select the right pricing plan based on their usage, and maximize workloads for greater return on cloud data investments.

As today’s enterprises implement artificial intelligence (AI) and machine learning (ML) models to continually garner more business value from their data, they are experiencing exploding cloud data costs, with a lack of visibility into cost drivers and a lack of control for managing and optimizing their spend. As cloud costs continue to climb, managing cloud spend remains a top challenge for global business leaders. Data management services are the fastest-growing category of cloud service spending, representing 39% of the total cloud bill. Unravel 4.8.1 enables visibility into BigQuery compute and storage spend and provides cost optimization intelligence using its built-in AI to improve workload cost efficiency. 

Unravel’s AI-driven cloud cost optimization for BigQuery delivers insights based on Unravel’s deep observability of the job, user, and code level to supply AI-driven cost optimization recommendations for slots and SQL queries, including slot provisioning, query duration, autoscaling efficiencies, and more. With Unravel, BigQuery users can speed cloud transformation initiatives by having real-time cost visibility, predictive spend forecasting, and performance insights for their workloads. BigQuery customers can also use Unravel to customize dashboards and alerts with easy-to-use widgets that offer insights on spend, performance, and unit economics.

“As AI continues to drive exponential data usage, companies are facing more problems with broken pipelines and inefficient data processing which slows time to business value and adds to the exploding cloud data bills. Today, most organizations do not have the visibility into cloud data spend or ways to optimize data pipelines and workloads to lower spend and mitigate problems,” said Kunal Agarwal, CEO and co-founder, Unravel Data. “With Unravel’s built-in AI, BigQuery users have data observability and FinOps in one solution to increase data pipeline reliability and cost efficiency so that businesses can bring even more workloads to the cloud for the same spend.” 

“Enterprises are increasingly concerned about lack of visibility into and control of their cloud-related costs, especially for cloud-based analytics projects,” says Kevin Petrie, VP of Research at The Eckerson Group. “By implementing FinOps programs, they can predict, measure, monitor, optimize and account for cloud-related costs related to data and analytics projects.”

At the core of Unravel Data’s platform is its AI-powered Insights Engine, purpose-built for data platforms, which understands all the intricacies and complexities of each modern data platform and the supporting infrastructure to optimize efficiency and performance. The Insights Engine continuously ingests and interprets millions of metadata streams to provide real-time insights into application and system performance, along with recommendations to optimize cost and performance for operational and financial efficiency.

Unravel 4.8.1 includes additional features, such as:

  • Recommendations for baseline and max setting for reservations
  • Scheduling insights for recurring jobs
  • SQL insights and anti-patterns
  • Recommendations for custom quotas for projects and users
  • Top-K projects, users, and jobs
  • Showback by compute and storage types, services, pricing plans, etc.
  • Chargeback by projects and users
  • Out-of-the-box and custom alerts and dashboards
  • Project/Job views of insights and details
  • Side-by-side job comparisons
  • Data KPIs, metrics, and insights such as size and number of tables and partitions, access by jobs, hot/warm/cold tables

To learn more about how we are helping BigQuery customers optimize their data cost and management, or to partner with Unravel Data, please visit https://www.unraveldata.com/google-cloud-bigquery/.

    About Unravel Data

    Unravel Data radically transforms the way businesses understand and optimize the performance and cost of their modern data applications – and the complex data pipelines that power those applications. Providing a unified view across the entire data stack, Unravel’s market-leading data observability platform leverages AI, machine learning, and advanced analytics to provide modern data teams with the actionable recommendations they need to turn data into insights. A recent winner of the Best Data Tool & Platform of 2023 as part of the annual SIIA CODiE Awards, some of the world’s most recognized brands like Adobe, Maersk, Mastercard, Equifax, and Deutsche Bank rely on Unravel Data to unlock data-driven insights and deliver new innovations to market. To learn more, visit https://www.unraveldata.com.

    Media Contact

    Blair Moreland

    ZAG Communications for Unravel Data

    unraveldata@zagcommunications.com 

    Announcing Unravel 4.8.1: Maximize business value with Google Cloud BigQuery Editions pricing https://www.unraveldata.com/resources/announcing-unravel-481-maximize-business-value-with-google-cloud-bigquery-editions-pricing/ https://www.unraveldata.com/resources/announcing-unravel-481-maximize-business-value-with-google-cloud-bigquery-editions-pricing/#respond Thu, 10 Aug 2023 12:03:50 +0000 https://www.unraveldata.com/?p=13397


Google recently introduced significant changes to its BigQuery pricing models, affecting both compute and storage. It announced the end of sale for flat-rate and flex slots for all BigQuery customers not currently in a contract, and raised the price of on-demand analysis by 25% across all regions, effective July 5, 2023.

    Main Components of BigQuery Pricing

Understanding the pricing structure of BigQuery is crucial to managing expenses effectively. There are two main components to BigQuery pricing:

    • Compute (analysis) pricing is the cost to process queries, including SQL queries, user-defined functions, scripts, and certain data manipulation language (DML) and data definition language (DDL) statements
    • Storage pricing is the cost to store data that you load into BigQuery. Storage options are logical (the default) or physical. If data storage is converted from logical to physical, customers cannot go back to logical storage.

    Selecting the appropriate edition and accurately forecasting data processing needs is essential to cloud data budget planning and maximizing the value derived from Google Cloud BigQuery.
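
One practical way to forecast on-demand analysis spend is to dry-run queries before executing them. The sketch below uses the google-cloud-bigquery Python client; the per-TiB rate is an assumed, illustrative figure (verify current pricing for your region and pricing model), and the example query is arbitrary.

```python
from google.cloud import bigquery

# Assumed on-demand rate (USD per TiB scanned) -- illustrative; verify the
# current rate for your region before relying on these numbers.
ON_DEMAND_USD_PER_TIB = 6.25

def estimate_query_cost(sql: str) -> float:
    """Dry-run a query to see how many bytes it would process, then price it."""
    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # dry run: nothing is executed or billed
    return (job.total_bytes_processed / 2**40) * ON_DEMAND_USD_PER_TIB

# Hypothetical example against a public sample table.
sql = """
    SELECT word, SUM(word_count) AS occurrences
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
"""
print(f"Estimated on-demand cost: ${estimate_query_cost(sql):.4f}")
```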

    Introducing Unravel 4.8.1 for BigQuery

Unravel 4.8.1 for BigQuery includes AI-driven FinOps and performance optimization features and enhancements, empowering Google Cloud BigQuery customers to see and better manage their cloud data costs. Unravel helps users understand specific cost drivers, gain allocation insights, and optimize the performance and cost of SQL queries. The new Unravel features align with the FinOps phases:

    Inform

    • Compute and storage costs
    • Unit costs and trends for projects, users, and jobs

    Optimize

    • Reservation insights
    • SQL insights
    • Data and storage insights
    • Scheduling insights

    Operate

    • OpenSearch-based alerts on job duration and slot-ms
    • Alert customization: ability to create custom alerts

    Improving visibility, optimizing data performance, and automating spending guardrails can help organizations overcome resource limitations to get more out of their existing data environments.
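
As a purely illustrative stand-in for the Operate-phase alerts described above (Unravel provides these out of the box), the following Python sketch flags recent jobs that exceed duration or slot-ms thresholds by querying BigQuery's INFORMATION_SCHEMA. The thresholds and the `region-us` qualifier are assumptions to adapt to your own environment.

```python
from google.cloud import bigquery

# Hypothetical thresholds -- tune to your own SLAs.
MAX_DURATION_S = 600        # 10 minutes
MAX_SLOT_MS = 50_000_000    # roughly 14 slot-hours

ALERT_SQL = """
SELECT job_id, user_email,
       TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_s,
       total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND state = 'DONE'
  AND (TIMESTAMP_DIFF(end_time, start_time, SECOND) > @max_duration_s
       OR total_slot_ms > @max_slot_ms)
ORDER BY total_slot_ms DESC
"""

def jobs_breaching_thresholds():
    client = bigquery.Client()
    config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter("max_duration_s", "INT64", MAX_DURATION_S),
        bigquery.ScalarQueryParameter("max_slot_ms", "INT64", MAX_SLOT_MS),
    ])
    for row in client.query(ALERT_SQL, job_config=config).result():
        print(f"ALERT job={row.job_id} user={row.user_email} "
              f"duration={row.duration_s}s slot_ms={row.total_slot_ms}")
```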

    Visibility into BigQuery compute and storage spend

    Getting insights into your cloud data spending starts with understanding your cloud bill. With Unravel, BigQuery users can see their overall spend as well as spending trends for their selected time window, such as the past 30 days.

    Unravel for BigQuery cost dashboard

    The cost dashboard shows details and trends, including compute, storage, and services by pricing tier, project, job, and user.

    Unravel provides cost analysis, including the average cost of both compute and storage per project, job, and user over time. Compute spending can be further split between on-demand and reserved capacity pricing.

    Armed with this detail, BigQuery customers can better understand both infrastructure and pricing tier usage as well as efficiencies by query, user, department, and project. This granular visibility enables accurate, precise cost allocation, trend visualization, and forecasting.
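
For a sense of the raw data behind this kind of cost allocation, on-demand query spend can be roughly attributed by project and user from BigQuery job metadata. The sketch below is a simplification, not Unravel's allocation model: it assumes an illustrative per-TiB rate and ignores reservation (Editions) and storage spend.

```python
from google.cloud import bigquery

# Assumed on-demand rate; reservation (Editions) spend and storage costs are
# not captured by this query and would need additional sources.
ON_DEMAND_USD_PER_TIB = 6.25

SHOWBACK_SQL = """
SELECT project_id, user_email,
       SUM(total_bytes_billed) / POW(2, 40) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND total_bytes_billed IS NOT NULL
GROUP BY project_id, user_email
ORDER BY tib_billed DESC
"""

def showback_last_30_days():
    """Rough on-demand cost attribution by project and user over the last 30 days."""
    client = bigquery.Client()
    for row in client.query(SHOWBACK_SQL).result():
        print(f"{row.project_id:<30} {row.user_email:<35} "
              f"~${row.tib_billed * ON_DEMAND_USD_PER_TIB:,.2f}")
```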

    Unravel for BigQuery project costs dashboard

    This dashboard provides BigQuery project chargeback details and trends, including a breakdown by compute and storage tier.

    Unravel’s AI-driven cloud cost optimization for BigQuery delivers insights based on Unravel’s deep observability of the job, user, and code level to supply AI-driven cost optimization recommendations for slots and SQL queries, including slot provisioning, query duration, autoscaling efficiencies, and more.

    With Unravel, BigQuery users can speed cloud transformation initiatives by having real-time cost visibility, predictive spend forecasting, and performance insights for their workloads.

    AI-driven cloud cost optimization for BigQuery

    At the core of Unravel Data’s data observability and FinOps platform is the AI-powered Insights Engine. It is purpose-built for data platforms—including BigQuery—to understand all the unique aspects and capabilities of each modern data stack and the underlying infrastructure to optimize efficiency and performance.

    Unravel’s AI-powered Insights Engine continuously ingests and interprets millions of metadata inputs to provide real-time insights into application and system performance, along with recommendations to improve performance and efficiency for faster results and greater positive business impact for your existing cloud data spend.

    Unravel provides insights and recommendations to optimize BigQuery reservations.

    Using Unravel’s cost and performance optimization intelligence based on its deep observability at the job, user, and code level, users get recommendations such as:

    • Reservation sizing that achieves optimal cost efficiency and performance (see the sizing sketch after this list)
    • SQL insights and anti-patterns to avoid
    • Scheduling insights for recurring jobs
    • Quota insights with respect to workload patterns
    • and more
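
The sizing sketch referenced above is a deliberately simple, hypothetical heuristic: set the baseline near typical observed slot usage and the autoscale maximum near peak usage, both rounded to a purchase increment. It is not Unravel's recommendation algorithm, and the sample numbers are made up.

```python
def recommend_reservation_slots(slot_usage_samples, baseline_pct=0.50,
                                max_pct=0.95, increment=100):
    """Toy heuristic: baseline near the median observed slot usage, autoscale
    max near the 95th percentile, both rounded up to the purchase increment.
    Illustrative only -- not Unravel's recommendation logic."""
    s = sorted(slot_usage_samples)
    pick = lambda p: s[min(len(s) - 1, int(p * len(s)))]
    round_up = lambda v: ((int(v) + increment - 1) // increment) * increment
    return round_up(pick(baseline_pct)), round_up(pick(max_pct))

# Hypothetical per-minute average slot usage over one day.
samples = [120, 180, 90, 400, 950, 300, 220, 610, 480, 150, 870, 260]
baseline, autoscale_max = recommend_reservation_slots(samples)
print(f"Suggested baseline: {baseline} slots, autoscale max: {autoscale_max} slots")
```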

    With Unravel, BigQuery customers can speed cloud transformation initiatives by having predictive cost and performance insights of existing workloads prior to moving them to the cloud.

    Visualization dashboards and unit costs

    Visualizing unit costs not only simplifies cost management but also enhances decision-making processes within your organization. With clear insights into spending patterns and resource utilization, you can make informed choices regarding optimization strategies or budget allocation.

    With Unravel, BigQuery customers can customize dashboards and alerts with easy-to-use widgets to enable at-a-glance and drill-down dashboards on:

    • Spend
    • Performance
    • Unit economics

    Unravel insights into BigQuery user costs

    Unravel displays user count and cost trends by compute pricing tier.

    From a unit economics perspective, BigQuery customers can build dashboards to show unit costs in terms of average cost per user, per project, and per job.

    Take advantage of visualization dashboards in Unravel for BigQuery to effortlessly gain valuable insights into unit costs.
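
Conceptually, unit economics of this kind boils down to aggregating job-level cost records. The sketch below is illustrative only; the record format, field names, and cost figures are hypothetical rather than Unravel's data model.

```python
from collections import defaultdict

# Hypothetical job-level cost records (format and figures are made up for
# illustration, e.g. as exported from a showback report).
jobs = [
    {"job_id": "j1", "project": "analytics", "user": "ana@example.com", "cost_usd": 4.10},
    {"job_id": "j2", "project": "analytics", "user": "ben@example.com", "cost_usd": 1.25},
    {"job_id": "j3", "project": "marketing", "user": "ana@example.com", "cost_usd": 9.80},
    {"job_id": "j4", "project": "marketing", "user": "cal@example.com", "cost_usd": 0.40},
]

def average_cost_per(records, key):
    """Average cost per job, grouped by `key` (e.g. 'user' or 'project')."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in records:
        totals[r[key]] += r["cost_usd"]
        counts[r[key]] += 1
    return {k: round(totals[k] / counts[k], 2) for k in totals}

print("Average cost per job, by user:   ", average_cost_per(jobs, "user"))
print("Average cost per job, by project:", average_cost_per(jobs, "project"))
print("Average cost per job overall:    ",
      round(sum(j["cost_usd"] for j in jobs) / len(jobs), 2))
```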

    Additional features included in this release

    Unravel 4.8.1 includes additional features, such as showback/chargeback reports, SQL insights and anti-patterns. You can compare two jobs side-by-side, enabling you to point out any metrics that are different between two runs, even if the queries are different.
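
Conceptually, a side-by-side comparison amounts to diffing the metric sets of two runs and surfacing the largest gaps. The sketch below is an illustrative stand-in, not Unravel's comparison logic; the metric names, values, and threshold are hypothetical.

```python
# Hypothetical metric snapshots for two runs (names and values are made up).
run_a = {"duration_s": 412, "total_slot_ms": 38_500_000,
         "bytes_billed": 2.1e12, "shuffle_spill_gb": 0}
run_b = {"duration_s": 955, "total_slot_ms": 97_200_000,
         "bytes_billed": 2.2e12, "shuffle_spill_gb": 41}

def compare_runs(a, b, rel_threshold=0.25):
    """Return metrics whose relative difference between runs exceeds the threshold."""
    diffs = {}
    for metric in sorted(set(a) | set(b)):
        va, vb = a.get(metric, 0), b.get(metric, 0)
        base = max(abs(va), abs(vb), 1e-9)
        if abs(va - vb) / base > rel_threshold:
            diffs[metric] = (va, vb)
    return diffs

for metric, (va, vb) in compare_runs(run_a, run_b).items():
    print(f"{metric}: run A = {va}, run B = {vb}")
```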

    With this release, you also get:

    • Top-K projects, users, and jobs
    • Showback by compute and storage types, services, pricing plans, etc.
    • Chargeback by projects and users
    • Out-of-the-box and custom alerts and dashboards
    • Project/Job views of insights and details
    • Side-by-side job comparisons
    • Data KPIs, metrics, and insights such as size and number of tables and partitions, access by jobs, hot/warm/cold tables

    Use case scenarios

    Unravel for BigQuery provides a single source of truth to improve collaboration across functional teams and accelerates workflows for common use cases. Below are just a few examples of how Unravel helps BigQuery users in specific situations:

    • FinOps Practitioner
      Scenario: Understand what we pay for BigQuery down to the user/app level in real time and accurately forecast future spend with confidence.
      Unravel benefits: Granular visibility at the project, job, and user level enables FinOps practitioners to perform cost allocation, estimate annual cloud data application costs, identify cost drivers, and run break-even and ROI analysis.

    • FinOps Practitioner / Engineering / Operations
      Scenario: Identify the most impactful recommendations to optimize overall cost and performance.
      Unravel benefits: AI-powered performance and cost optimization recommendations enable FinOps and data teams to rapidly upskill team members, implement cost-efficiency SLAs, and optimize BigQuery pricing tier usage to maximize the company’s cloud data ROI.

    • Engineering Lead / Product Owner
      Scenario: Identify the most impactful recommendations to optimize the cost and performance of a project.
      Unravel benefits: AI-driven insights and recommendations enable product and data teams to improve slot utilization, boost SQL query performance, and leverage table partitioning and column clustering to achieve cost-efficiency SLAs and launch more data jobs within the same project budget.

    • Engineering / Operations
      Scenario: Live monitoring with alerts.
      Unravel benefits: Live monitoring with alerts speeds MTTR and prevents outages before they happen.

    • Data Engineer
      Scenario: Debugging a job and comparing jobs.
      Unravel benefits: Automatic troubleshooting guides data teams directly to the source of job failures, down to the line of code or SQL query, along with AI recommendations to fix the issue and prevent future ones.

    • Data Engineer
      Scenario: Identify expensive, inefficient, or failed jobs.
      Unravel benefits: Proactively improve cost efficiency, performance, and reliability before deploying jobs into production. Compare two jobs side-by-side to find any metrics that differ between the two runs, even if the queries are different.

    Get Started with Unravel for BigQuery

    Learn more about Unravel for BigQuery by reviewing the docs and creating your own free account.

    Unravel for Google BigQuery Datasheet https://www.unraveldata.com/resources/unravel-for-google-bigquery-datasheet/ https://www.unraveldata.com/resources/unravel-for-google-bigquery-datasheet/#respond Wed, 06 Jul 2022 20:29:39 +0000 https://www.unraveldata.com/?p=9746


    AI-DRIVEN DATA OBSERVABILITY + FINOPS FOR GOOGLE BIGQUERY

    Performance. Reliability. Cost-effectiveness.

    Unravel’s automated, AI-powered data observability + FinOps platform for Google Cloud BigQuery and other modern data stacks provides 360° visibility to allocate costs with granular precision, accurately predict spend, run 50% more workloads at the same budget, launch new apps 3X faster, and reliably hit greater than 99% of SLAs.

    With Unravel Data Observability + FinOps for BigQuery, you can:

    • Launch new apps 3X faster: End-to-end observability of data-native applications and pipelines. Automatic improvement of performance, cost efficiency, and reliability.
    • Run 50% more workloads for the same budget: Break down spend and forecast accurately. Optimize apps and platforms by eliminating inefficiencies. Set guardrails and automate governance. Unravel’s AI helps you implement observability and FinOps to ensure you achieve efficiency goals.
    • Reduce firefighting time by 99% using AI-enabled troubleshooting: Detect anomalies, drift, skew, and missing and incomplete data end-to-end. Integrate with multiple data quality solutions. All in one place.
    • Forecast budget with ±10% accuracy: Accurately anticipate cloud data spending for more predictable ROI. Unravel helps you forecast spending with granular cost allocation. Purpose-built AI at the job, user, and workgroup levels enables real-time visibility into ongoing usage.

     

    To see Unravel Data for BigQuery in action, contact our data experts | 650 741-3442
