CI/CD Archives - Unravel

Unravel Data Partners with Databricks for Lakehouse Observability and FinOps

Purpose-built AI provides real-time cost and performance insights and efficiency recommendations for Databricks users

Palo Alto, CA – December 5, 2023 – Unravel Data, the first AI-enabled data observability and FinOps platform built to address the speed and scale of modern data platforms, today announced that it has joined the Databricks Partner Program to deliver AI-powered data observability into Databricks for granular visibility, performance optimizations, and cost governance of data pipelines and applications. With this new partnership, Unravel and Databricks will collaborate on Go-To-Market (GTM) efforts to enable Databricks customers to leverage Unravel’s purpose-built AI for the Lakehouse, delivering real-time, continuous insights and recommendations that speed time to value of data and AI products and ensure optimal ROI.

With organizations increasingly under pressure to deliver data and AI innovation at lightning speed, data teams are on the front line of delivering production-ready data pipelines at an exponential rate while optimizing performance and efficiency to deliver faster time to value. Unravel’s purpose-built AI for Databricks integrates with Lakehouse Monitoring and Lakehouse Observability to deliver the performance and efficiency needed to achieve speed and scale for data analytics and AI products. Unravel’s integration with Unity Catalog enables Databricks users to speed up lakehouse transformation by providing real-time, AI-powered cost insights, code-level optimizations, accurate spending predictions, and performance recommendations to accelerate data pipelines and applications for greater returns on cloud data platform investments. AutoActions and alerts help automate governance with proactive guardrails.

“Most organizations today are receiving unprecedented amounts of data from a staggering number of sources, and they’re struggling to manage it all, which can quickly lead to unpredictable cloud data spend. This combination of rapid lakehouse adoption and the hyperfocus companies have on leveraging AI/ML models for additional revenue and competitive advantage, brings the importance of data observability to the forefront,” said Kunal Agarwal, CEO and co-founder, Unravel Data. “Lakehouse customers who use Unravel can now achieve the agility required for AI/ML innovation while having the predictability and cost governance guardrails needed to ensure a strong ROI.”

Unravel’s purpose-built AI for Databricks delivers insights based on Unravel’s deep observability at the job, user, and code level to supply AI-driven cost efficiency recommendations, including compute provisioning, query performance, autoscaling efficiencies, and more. 

Unravel for Databricks enables organizations to:

  • Speed cloud transformation initiatives by having real-time cost visibility, predictive spend forecasting, and performance insights for their workloads 
  • Enhance time to market of new AI initiatives by mitigating potential pipeline bottlenecks and associated costs before they occur
  • Better manage and optimize the ROI of data projects with customized dashboards and alerts that offer insights on spend, performance, and unit economics

Unravel’s integration with popular DevOps tools like GitHub and Azure DevOps provides actionability in CI/CD workflows by enabling early issue detection during the code-merge phase and providing developers real-time insights into potential financial impacts of their code changes. This results in fewer production issues and improved cost efficiency.

Learn how Unravel and Databricks can help enterprises optimize their cloud data spend and increase ROI here.   

About Unravel Data

Unravel Data radically transforms the way businesses understand and optimize the performance and cost of their modern data applications – and the complex data pipelines that power those applications. Unravel’s market-leading data observability and FinOps platform, with purpose-built AI for each data platform, provides the actionable recommendations needed for cost- and performance-efficient data and AI pipelines. A recent winner of Best Data Tool & Platform of 2023 in the annual SIIA CODiE Awards, Unravel Data is relied on by some of the world’s most recognized brands, including Adobe, Maersk, Mastercard, Equifax, and Deutsche Bank, to unlock data-driven insights and deliver new innovations to market. To learn more, visit https://www.unraveldata.com.

Unravel CI/CD Integration for Databricks

“Someone’s sitting in the shade today because someone planted a tree a long time ago.” —Warren Buffett

 

CI/CD, a software development strategy, combines the methodologies of Continuous Integration and Continuous Delivery/Continuous Deployment to safely and reliably deliver new versions of code in short, iterative cycles. The practice bridges the gap between development and operations teams by automating the series of steps involved in building, testing, and deploying code, which would otherwise be a complex, error-prone process. Traditionally used to speed up the software development life cycle, CI/CD is now gaining popularity among data scientists and data engineers because it enables cross-team collaboration and the rapid, secure integration and deployment of libraries, scripts, notebooks, and other ML workflow assets.
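To make the “automating the series of steps” idea concrete, here is a minimal, hedged sketch of the continuous integration half: a unit test that a CI server runs on every commit so a broken transformation never reaches a shared branch. The transformation and records below are invented for illustration; a real pipeline would test its own notebook or library code.

```python
# Hypothetical transformation under test; a real pipeline would import its
# own notebook or library function here.
def normalize_status(records):
    """Lower-case and trim the 'status' field of each record."""
    return [dict(r, status=r["status"].strip().lower()) for r in records]

# A CI job (GitHub Actions, Azure DevOps, etc.) would run this with pytest
# on every commit and block the merge if it fails.
def test_normalize_status():
    raw = [{"id": 1, "status": "  SHIPPED "}, {"id": 2, "status": "Pending"}]
    assert normalize_status(raw) == [
        {"id": 1, "status": "shipped"},
        {"id": 2, "status": "pending"},
    ]
```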

One recent report found that 80% of organizations have adopted agile practices, yet for nearly two-thirds of developers it takes at least one week to get committed code running successfully in production. Implementing CI/CD can streamline data pipeline development and deployment, accelerating release times and frequency while improving code quality.

The evolving need for CI/CD for data teams

AI’s rapid adoption is driving the demand for fresh and reliable data for training, validation, verification, and drift analysis. Implementing CI/CD enhances your Databricks development process, streamlines pipeline deployment, and accelerates time-to-market. CI/CD revolutionizes how you build, test, and deploy code within your Databricks environment, helping you automate tasks, ensure a smooth transition from development to production, and enable lakehouse data engineering and data science teams to work more efficiently. And when it comes to cloud data platforms like Databricks, performance equals cost. The more optimized your pipelines are, the more optimized your Databricks spend will be.

Why incorporate Unravel into your existing DevOps workflow?

Unravel Data is the AI-powered data observability and FinOps platform for Databricks. By using Unravel’s CI/CD integration for Databricks, developers can catch performance problems early in the development and deployment life cycle and proactively take action to mitigate issues. This has been shown to significantly reduce the time data teams need to act on critical, timely insights. Unravel’s AI-powered efficiency recommendations, now embedded right into DevOps environments, help foster a cost-conscious culture that encourages developers to follow performance- and cost-driven coding best practices. They also raise awareness of resource usage, configuration changes, and data layout issues that could impact service level agreements (SLAs) when the code is deployed in production. Accepting or ignoring the insights Unravel suggests promotes accountability for developers’ actions and creates transparency for DevOps and FinOps practitioners to attribute cost-saving wins and losses.

With the advent of Generative Pre-trained Transformer (GPT) AI models, data teams have started using coding co-pilots to generate accurate and efficient code. Unravel makes this experience a notch better with real-time visibility into code inefficiencies that can translate into production performance problems such as bottlenecks, performance anomalies, missed SLAs, and cost overruns. While other code-assist tools like GitHub Copilot are limited to rewrite suggestions based on static code analysis, Unravel’s AI-driven Insights Engine built for Databricks considers the performance and cost impact of code and configuration changes and recommends the optimal ones. This helps you streamline your development process, identify bottlenecks, and ensure optimal performance throughout the life cycle of your data pipelines.

Unravel’s AI-powered analysis automatically provides deep, actionable insights.

Next, let’s look at the key benefits the Unravel integration brings to your DevOps workflows.

Achieve operational excellence 

Unravel’s CI/CD integration for Databricks enhances data team and developer efficiency by seamlessly providing real-time, AI-powered insights to help optimize performance and troubleshoot issues in your data pipelines.  

Unravel integrates with your favorite CI/CD tools such as Azure DevOps and GitHub. When developers make changes to code and submit via a pull request, Unravel automatically conducts AI-powered checks to ensure the code is performant and efficient. This helps developers:

  • Maximize resource utilization by gaining valuable insights into pipeline efficiency
  • Achieve performance and cost goals by analyzing critical metrics during development
  • Leverage specific, actionable recommendations to improve code for cost and performance optimization
  • Identify and resolve bottlenecks promptly, reducing development time
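To sketch what such a merge-time check might look like in practice (a hedged illustration: the metrics file, budget values, and the way they are produced are assumptions for this example, not Unravel’s actual interface), a pipeline step could compare metrics from a dev run against agreed budgets and fail the pull request check when they are exceeded:

```python
# Illustrative CI gate: compare a dev run's cost/performance metrics against
# budgets and exit non-zero so the PR check fails on a regression.
# The metric names and budget values are hypothetical.
import json
import sys

BUDGETS = {"estimated_cost_usd": 25.0, "runtime_minutes": 30.0}

def check(metrics_path: str) -> int:
    with open(metrics_path) as f:
        metrics = json.load(f)
    failures = [
        f"{name}: {metrics.get(name)} exceeds budget {limit}"
        for name, limit in BUDGETS.items()
        if metrics.get(name, 0) > limit
    ]
    for failure in failures:
        print(f"FAIL {failure}")
    return 1 if failures else 0  # non-zero exit marks the check as failed

if __name__ == "__main__":
    sys.exit(check(sys.argv[1]))
```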

Leverage developer pull request (PR) reviews

Developers play a crucial role in achieving cost efficiency through PR reviews. Encourage them to adopt best practices and follow established guidelines when submitting code for review. This ensures that all tests are run and results are thoroughly evaluated before merging into the main project branch.

By actively involving developers in the review process, you tap into their knowledge and experience to identify potential areas for cost savings within your pipelines. Their insights can help streamline workflows, improve resource allocation, and eliminate inefficiencies. Involving developers in PR reviews fosters collaboration among team members and encourages feedback, creating a culture of continuous improvement. 

Here are several ways developer PR reviews can enhance the reliability of data pipelines:

  • Ensure code quality: Developer PR reviews serve as an effective mechanism to maintain high code-quality standards. Through these reviews, developers can catch coding errors, identify potential bugs, and suggest improvements before the code is merged into the production repository.
  • Detect issues early: By involving developers in PR reviews, you ensure that potential issues are identified early in the development process. This allows for prompt resolution and prevents problems from propagating further down the pipeline.
  • Mitigate risks: Faulty or inefficient code changes can have significant impacts on your pipelines and overall system stability. With developer PR reviews, you involve experts who understand the intricacies of the pipeline and can help mitigate risks by providing valuable insights and suggestions.
  • Foster a collaborative environment: Developer PR reviews create a collaborative environment where team members actively engage with one another’s work. Feedback provided during these reviews promotes knowledge sharing, improves individual skills, and enhances overall team performance.

Real-world examples of CI/CD integration for Databricks

Companies in finance, healthcare, e-commerce, and more have successfully implemented CI/CD practices with Databricks. Enterprise organizations across industries leverage Unravel to ensure that code is performant and efficient before it goes into production.

  • Financial services: A Fortune Global 500 bank provides Unravel to their developers as a way to evaluate their pipelines before they do a code release.
  • Healthcare: One of the largest health insurance providers in the United States uses Unravel to ensure that its business-critical data applications are optimized for performance, reliability, and cost in its development environment—before they go live in production.
  • Logistics: One of the world’s largest logistics companies leverages Unravel to upskill their data teams at scale. They put Unravel in their CI/CD process to ensure that all code and queries are reviewed to ensure they meet the desired quality and efficiency bar before they go into production.

Unravel CI/CD integration for Databricks use cases

Incorporating Unravel’s real-time, AI insights into PR reviews helps developers ensure the reliability, performance, and cost efficiency of data pipelines before they go into production. This practice ensures that any code changes are thoroughly reviewed before being merged into the main project branch. By catching potential issues early on, you can prevent pipeline breaks, bottlenecks, and wasted compute tasks from running in production. 

Ensure pipeline reliability

Unravel’s purpose-built AI helps augment your PR reviews to ensure code quality and reliability in your release pipelines. Integrating Unravel into your Databricks CI/CD process helps developers identify potential issues early on and mitigate risks associated with faulty or inefficient code changes. Catching breaking changes in development and test environments improves developer productivity and helps ensure that you meet your SLAs.

1-minute tour: Unravel’s AI-powered Speed, Cost, Reliability Optimizer

Achieve cost efficiency

Unravel provides immediate feedback and recommendations to improve cost efficiency. This enables you to catch inefficient code, and developers can make any necessary adjustments for optimal resource utilization before it impacts production environments. Using Unravel as part of PR reviews helps your organization optimize resource allocation and reduce cloud waste.

1-minute tour: Unravel’s AI-powered Databricks Cost Optimization

Boost pipeline performance

Collaborative code reviews provide an opportunity to identify bottlenecks, optimize code, and enhance data processing efficiency. By including Unravel’s AI recommendations in the review process, developers benefit from AI-powered insights to ensure code changes achieve performance objectives. 

1-minute tour: Unravel’s AI-powered Pipeline Bottleneck Analysis

Get started with Unravel CI/CD integration for Databricks

Supercharge your CI/CD process for Databricks using Unravel’s AI. By leveraging this powerful combination, you can significantly improve developer productivity, ensure pipeline reliability, achieve cost efficiency, and boost overall pipeline performance. Whether you choose to automate PR reviews with Azure DevOps or GitHub, Unravel’s CI/CD integration for Databricks has got you covered.

Now it’s time to take action and unleash the full potential of your Databricks environment. Integrate Unravel’s CI/CD solution into your workflow and experience the benefits firsthand. Don’t miss out on the opportunity to streamline your development process, save costs, and deliver high-quality code faster than ever before.

Next steps to learn more

Read Unravel’s CI/CD integration documentation

Watch this video

Book a live demo

Rev Up Your Lakehouse: Lap the Field with a Databricks Operating Model

In this fast-paced era of artificial intelligence (AI), the need for data is multiplying. The demand for faster data life cycles has skyrocketed, thanks to AI’s insatiable appetite for knowledge. According to a recent McKinsey survey, 75% of respondents expect generative AI (GenAI) to “cause significant or disruptive change in the nature of their industry’s competition in the next three years.”

Next-gen AI craves unstructured, streaming, industry-specific data. Although the pace of innovation is relentless, “when it comes to generative AI, data really is your moat.”

But here’s the twist: efficiency is now the new cool kid in town. Data product profitability hinges on optimizing every step of the data life cycle, from ingestion and transformation to processing, curating, and refining. It’s no longer just about gathering mountains of information; it’s about collecting the right data efficiently.

As new, industry-specific GenAI use cases emerge, there is an urgent need for large data sets for training, validation, verification, and drift analysis. GenAI requires flexible, scalable, and efficient data architecture, infrastructure, code, and operating models to achieve success.

Leverage a Scalable Operating Model to Accelerate Your Data Life Cycle Velocity

To optimize your data life cycle, it’s crucial to leverage a scalable operating model that can accelerate the velocity of your data processes. By following a systematic approach and implementing efficient strategies, you can effectively manage your data from start to finish.

Databricks recently introduced a scalable operating model for data and AI to help customers achieve a positive Return on Data Assets (RODA).

Databricks’ iterative end-to-end operating pipeline

Define Use Cases and Business Requirements

Before diving into the data life cycle, it’s essential to clearly define your use cases and business requirements. This involves understanding what specific problems or goals you plan to address with your data. By identifying these use cases and related business requirements, you can determine the necessary steps and actions needed throughout the entire process.

Build, Test, and Iterate the Solution

Once you have defined your use cases and business requirements, it’s time to build, test, and iterate the solution. This involves developing the necessary infrastructure, tools, and processes required for managing your data effectively. It’s important to continuously test and iterate on your solution to ensure that it meets your desired outcomes.

During this phase, consider using agile methodologies that allow for quick iterations and feedback loops. This will enable you to make adjustments as needed based on real-world usage and feedback from stakeholders.

Scale Efficiently

As your data needs grow over time, it’s crucial to scale efficiently. This means ensuring that your architecture can handle increased volumes of data without sacrificing performance or reliability.

Consider leveraging cloud-based technologies that offer scalability on-demand. Cloud platforms provide flexible resources that can be easily scaled up or down based on your needs. Employing automation techniques such as machine learning algorithms or artificial intelligence can help streamline processes and improve efficiency.

By scaling efficiently, you can accommodate growing datasets while maintaining high-quality standards throughout the entire data life cycle.

Elements of the Business Use Cases and Requirements Phase

In the data life cycle, the business requirements phase plays a crucial role in setting the foundation for successful data management. This phase involves several key elements that contribute to defining a solution and ensuring measurable outcomes. Let’s take a closer look at these elements:

  • Leverage design thinking to define a solution for each problem statement: Design thinking is an approach that focuses on understanding user needs, challenging assumptions, and exploring innovative solutions. In this phase, it is essential to apply design thinking principles to identify and define a single problem statement that aligns with business objectives.
  • Validate the business case and define measurable outcomes: Before proceeding further, it is crucial to validate the business case for the proposed solution. This involves assessing its feasibility, potential benefits, and alignment with strategic goals. Defining clear and measurable outcomes helps in evaluating project success.
  • Map out the MVP end user experiences: To ensure user satisfaction and engagement, mapping out Minimum Viable Product (MVP) end-user experiences is essential. This involves identifying key touchpoints and interactions throughout the data life cycle stages. By considering user perspectives early on, organizations can create intuitive and effective solutions.
  • Understand the data requirements: A thorough understanding of data requirements is vital for successful implementation. It includes identifying what types of data are needed, their sources, formats, quality standards, security considerations, and any specific regulations or compliance requirements.
  • Gather required capabilities with platform architects: Collaborating with platform architects helps gather insights into available capabilities within existing infrastructure or technology platforms. This step ensures compatibility between business requirements and technical capabilities while minimizing redundancies or unnecessary investments.
  • Establish data management roles, responsibilities, and procedures: Defining clear roles and responsibilities within the organization’s data management team is critical for effective execution. Establishing procedures for data observability, stewardship practices, access controls, and privacy policies ensures consistency in managing data throughout its life cycle.

By following these elements in the business requirements phase, organizations can lay a solid foundation for successful data management and optimize the overall data life cycle. It sets the stage for subsequent phases, including data acquisition, storage, processing, analysis, and utilization.

Build, Test, and Iterate the Solution

To successfully implement a data life cycle, it is crucial to focus on building, testing, and iterating the solution. This phase involves several key steps that ensure the development and deployment of a robust and efficient system.

  • Plan development and deployment: The first step in this phase is to carefully plan the development and deployment process. This includes identifying the goals and objectives of the project, defining timelines and milestones, and allocating resources effectively. By having a clear plan in place, the data team can streamline their efforts towards achieving desired outcomes.
  • Gather end-user feedback at every stage: Throughout the development process, it is essential to gather feedback from end users at every stage. This allows for iterative improvements based on real-world usage scenarios. By actively involving end users in providing feedback, the data team can identify areas for enhancement or potential issues that need to be addressed.
  • Define CI/CD pipelines for fast testing and iteration: Implementing Continuous Integration (CI) and Continuous Deployment (CD) pipelines enables fast testing and iteration of the solution. These pipelines automate various stages of software development such as code integration, testing, deployment, and monitoring. By automating these processes, any changes or updates can be quickly tested and deployed without manual intervention.
  • Data preparation, cleaning, and processing: Before training machine learning models or conducting experiments with datasets, it is crucial to prepare, clean, and process the data appropriately. This involves tasks such as removing outliers or missing values from datasets to ensure accurate results during model training.
  • Feature engineering: Feature engineering plays a vital role in enhancing model performance by selecting relevant features from raw data or creating new features based on domain knowledge. It involves transforming raw data into meaningful representations that capture essential patterns or characteristics.
  • Training and ML experiments: In this stage of the data life cycle, machine learning models are trained using appropriate algorithms on prepared datasets. Multiple experiments may be conducted, testing different algorithms or hyperparameters to find the best-performing model.
  • Model deployment: Once a satisfactory model is obtained, it needs to be deployed in a production environment. This involves integrating the model into existing systems or creating new APIs for real-time predictions.
  • Model monitoring and scoring: After deployment, continuous monitoring of the model’s performance is essential. Tracking key metrics and scoring the model’s outputs against ground truth data helps identify any degradation in performance or potential issues that require attention.

By following these steps and iterating on the solution based on user feedback, data teams can ensure an efficient and effective data life cycle from development to deployment and beyond.
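As a compact, hedged sketch of how these stages fit together, the Python example below walks through preparation, a simple engineered feature, training, and a monitoring-style check. The data, model, and threshold are synthetic stand-ins; a Databricks team would substitute its own datasets, frameworks, and alerting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Data preparation: a small synthetic dataset standing in for cleaned source data.
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Feature engineering: add a simple interaction term as an extra column.
X = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])

# Training and ML experiments: hold out data to score the candidate model.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model monitoring and scoring: track a key metric and flag degradation.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"holdout AUC: {auc:.3f}")
if auc < 0.75:  # illustrative threshold a team might alert on
    print("WARNING: model quality below target; investigate before deploying")
```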

Efficiently Scale and Drive Adoption with Your Operating Model

To efficiently scale your data life cycle and drive adoption, you need to focus on several key areas. Let’s dive into each of them:

  • Deploy into production: Once you have built and tested your solution, it’s time to deploy it into production. This step involves moving your solution from a development environment to a live environment where end users can access and utilize it.
  • Evaluate production results: After deploying your solution, it is crucial to evaluate its performance in the production environment. Monitor key metrics and gather feedback from users to identify any issues or areas for improvement.
  • Socialize data observability and FinOps best practices: To ensure the success of your operating model, it is essential to socialize data observability and FinOps best practices among your team. This involves promoting transparency, accountability, and efficiency in managing data operations.
  • Acknowledge engineers who “shift left” performance and efficiency: Recognize and reward engineers who prioritize performance and efficiency early in the development process. Encourage a culture of proactive optimization by acknowledging those who contribute to improving the overall effectiveness of the data life cycle.
  • Manage access, incidents, support, and feature requests: Efficiently scaling your operating model requires effective management of access permissions, incident handling processes, support systems, and feature requests. Streamline these processes to ensure smooth operations while accommodating user needs.
  • Track progress towards business outcomes by measuring and sharing KPIs: Measuring key performance indicators (KPIs) is vital for tracking progress towards business outcomes. Regularly measure relevant metrics related to adoption rates, user satisfaction levels, cost savings achieved through efficiency improvements, etc., then share this information across teams for increased visibility.

By implementing these strategies within your operating model, you can efficiently scale your data life cycle while driving adoption among users. Remember that continuous evaluation and improvement are critical for optimizing performance throughout the life cycle.


Drive for Performance with Purpose-Built AI

Unravel helps with many elements of the Databricks operating model:

  • Quickly identify failed and inefficient Databricks jobs: One of the key challenges is identifying failed and inefficient Databricks jobs. However, with AI purpose-built for Databricks, this task becomes much easier. By leveraging advanced analytics and monitoring capabilities, you can quickly pinpoint any issues in your job executions.
  • Creating ML models vs deploying them into production: Creating machine learning models is undoubtedly challenging, but deploying them into production is even harder. It requires careful consideration of factors like scalability, performance, and reliability. With purpose-built AI tools, you can streamline the deployment process by automating various tasks such as model versioning, containerization, and orchestration.
  • Leverage Unravel’s Analysis tab for insights: To gain deeper insights into your application’s performance during job execution, leverage the analysis tab provided by purpose-built AI solutions. This feature allows you to examine critical details like memory usage errors or other bottlenecks that may be impacting job efficiency.

    Unravel’s AI-powered analysis automatically provides deep, actionable insights.

 

  • Share links for collaboration: Collaboration plays a crucial role in data management and infrastructure optimization. Unravel enables you to share links with data scientists, developers, and data engineers to provide detailed information about specific test runs or failed Databricks jobs. This promotes collaboration and facilitates a better understanding of why certain jobs may have failed.
  • Cloud data cost management made easy: Cloud cost management, also known as FinOps, is another essential aspect of data life cycle management. Purpose-built AI solutions simplify this process by providing comprehensive insights into cost drivers within your Databricks environment. You can identify the biggest cost drivers such as users, clusters, jobs, and code segments that contribute significantly to cloud costs.
  • AI recommendations for optimization: To optimize your data infrastructure further, purpose-built AI platforms offer valuable recommendations across various aspects, including infrastructure configuration, parallelism settings, handling data skewness issues, optimizing Python/SQL/Scala/Java code snippets, and more. These AI-driven recommendations help you make informed decisions to enhance performance and efficiency.
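To make the idea of a code-level recommendation concrete, here is one classic rewrite that such tools often surface for Spark code: replacing a row-at-a-time Python UDF with built-in column functions so the work stays inside Spark’s optimizer. This is a hedged, generic illustration with made-up data, not output from any specific product.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-vs-builtin").getOrCreate()
df = spark.createDataFrame([("  SHIPPED ",), ("Pending",)], ["status"])

# Before: a Python UDF forces every row through the Python interpreter.
clean_udf = F.udf(lambda s: s.strip().lower(), StringType())
slow = df.withColumn("status_clean", clean_udf("status"))

# After: equivalent built-in expressions run inside the JVM and the optimizer.
fast = df.withColumn("status_clean", F.lower(F.trim("status")))

fast.show()
```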

Learn More & Next Steps

Unravel hosted a virtual roundtable, Accelerate the Data Analytics Life Cycle, with a panel of Unravel and Databricks experts. Unravel VP Clinton Ford moderated the discussion with Sanjeev Mohan, principal at SanjMo and former VP at Gartner, Subramanian Iyer, Unravel training and enablement leader and Databricks SME, and Don Hilborn, Unravel Field CTO and former Databricks lead strategic solutions architect.

FAQs

How can I implement a scalable operating model for my data life cycle?

To implement a scalable operating model for your data life cycle, start by clearly defining roles and responsibilities within your organization. Establish efficient processes and workflows that enable seamless collaboration between different teams involved in managing the data life cycle. Leverage automation tools and technologies to streamline repetitive tasks and ensure consistency in data management practices.

What are some key considerations during the Business Requirements Phase?

During the Business Requirements Phase, it is crucial to engage stakeholders from various departments to gather comprehensive requirements. Clearly define project objectives, deliverables, timelines, and success criteria. Conduct thorough analysis of existing systems and processes to identify gaps or areas for improvement.

How can I drive adoption of my data life cycle operational model?

To drive adoption of your data management solution, focus on effective change management strategies. Communicate the benefits of the solution to all stakeholders involved and provide training programs or resources to help them understand its value. Encourage feedback from users throughout the implementation process and incorporate their suggestions to enhance usability and address any concerns.

What role does AI play in optimizing the data life cycle?

AI can play a significant role in optimizing the data life cycle by automating repetitive tasks, improving data quality through advanced analytics and machine learning algorithms, and providing valuable insights for decision-making. AI-powered tools can help identify patterns, trends, and anomalies in large datasets, enabling organizations to make data-driven decisions with greater accuracy and efficiency.

How do I ensure performance while implementing purpose-built AI?

To ensure performance while implementing purpose-built AI, it is essential to have a well-defined strategy. Start by clearly defining the problem you want to solve with AI and set measurable goals for success. Invest in high-quality training data to train your AI models effectively. Continuously monitor and evaluate the performance of your AI system, making necessary adjustments as needed.

Logistics giant optimizes cloud data costs up front at speed & scale

Transitioning big data workloads to the cloud

One of the world’s largest logistics companies leverages automation and AI to empower every individual data engineer with self-service capability to optimize their jobs for performance and cost. The company was able to cut its cloud data costs by 70% in six months—and keep them down with automated 360° cost visibility, prescriptive guidance, and guardrails for its 3,000 data engineers across the globe. The company pegs the ROI of Unravel at 20X: “for every $1 we invested, we save 20.”

Key Results

  • 20X ROI from Unravel
  • Cut costs by 70% in 6 months
  • 75% time savings via automation
  • Proactive guardrails to keep costs within budgets
  • Automated AI health checks in CI/CD prevent inefficiencies in production

Holding individuals accountable for cloud usage/cost

Like many organizations moving their data workloads to the cloud, the company soon found that its cloud data costs were very rapidly rising to unacceptable levels. Data analytics are core to the business, but the cost of its cloud data workloads was simply getting too unpredictable and expensive. Cloud data expenses had to be brought under control.

The company chose Unravel to enable a shift-left approach where data engineers become more aware and individually accountable for their cloud usage/spending, and are given the means to make better, more cost-effective decisions when incurring expenses.

Data is core to the business

The company is increasingly doing more things with more data for more reasons. Says its Head of Data Platform Optimization, “Data is pervasive in logistics. Data is literally at the center of pretty much everything [we do]. Picking up goods to transport them, following the journeys of those goods, making all the details of those journeys available to customers. Our E Class ships can take 18,000 shipping containers on one journey from, say, China to Europe. One journey on one of those ships moves more goods than was moved in the entire 19th century between continents. One journey. And we’ve got six of them going back and forth all the time.” 

container ship

But the company also uses data to drive innovation in integrated logistics, supply chain resiliency, and corporate social responsibility. “[We’re] a company that doesn’t just use data to figure out how to make money, we use data to better the company, make us more profitable, and at the same time put back into the planet.

“The data has risen exponentially, and we’re just starting to come to grips with what we can do with it. For example, in tandem with a couple of nature organizations, we worked out that if a ship hits a whale at 12 knots and above, that whale will largely die. Below 12 knots, it will live. We used the data about where the whales were to slow the ships down.”

Getting visibility into cloud data costs

The single biggest obstacle to controlling cloud costs for any data-forward organization is having only hazy visibility into cloud usage. The company saw its escalating cloud data platform costs as an efficiency issue—how efficiently the company’s 3,000 “relatively young and inexperienced” data engineers were running their jobs. 

Says the company’s Head of Data Platform Optimization, “We’ve been moving into the cloud over the past 3-4 years. Everybody knows that [the] cloud isn’t free. There’s not a lot of altruism there from the cloud providers. So that’s the biggest issue we faced. We spent 12 months deploying a leading cloud data platform, and at the end of 12 months, the platform was working fine but the costs were escalating. 

“The problem with that is, if you don’t have visibility on those costs, you can’t cut those costs. And everybody—no matter what your financial situation—wants to cut costs and keep them down. We had to attain [cost] visibility. Unravel gives us the visibility, the insight, to solve that problem.”

“The [cloud data] platform was working fine but the costs were escalating. If you don’t have visibility on those costs, you can’t cut those costs.”

Get costs right in Dev, before going into production

The logistics company emphasizes that you have to get it right for cost and performance up front, in development. “Don’t ever end up with a cost problem. That’s part of the [shifting] mindset. Get in there early to deal with cost. Go live with fully costed jobs. Don’t go live and then work out what the job cost is and figure out how to cut it. [Determine] what it’s going to cost in Dev/Test, what it’s going to cost in Prod, then check it as soon as it goes live. If the delta’s right, game on.”

As the company’s data platform optimization leader points out, “Anybody can spin up a cloud environment.” Quite often their code and resource configurations are not optimized. Individual engineers may be requesting larger or more resources (size, number, type) than they actually need to run their jobs successfully, or they have code issues that lead to inefficient performance—and jobs costing more than they need to.

“The way to deal with this [escalating cost] problem is to push it left. Don’t have somebody charging in from Finance waving a giant bill saying, ‘You’re costing a fortune.’ Let’s keep Finance out of the picture. And crucial to this is: Do it up front. Do it in your Dev environment. Don’t go into production, get a giant bill, and only then try to figure out how to cut that.”

Unravel AI automatically identifies inefficient code, oversized resources, data partitioning problems, and other issues that lead to higher-than-necessary cloud data costs.
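For illustration only (the dataset, sizes, and partition counts below are hypothetical), here are two common PySpark patterns that this kind of analysis typically flags, shown next to cheaper equivalents:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("inefficiency-patterns").getOrCreate()
events = spark.range(0, 10_000_000).withColumn("key", F.col("id") % 1000)

# Flagged pattern 1: pulling a large dataset onto the driver just to count it.
# rows = len(events.toPandas())   # ships every row to a single machine
rows = events.count()             # distributed count, no driver blow-up

# Flagged pattern 2: shuffling into far too few partitions for the data
# volume, leaving most of the cluster idle (the reverse, thousands of tiny
# partitions, is just as wasteful).
# counts = events.repartition(2).groupBy("key").count()
counts = events.repartition(200, "key").groupBy("key").count()

print(rows, counts.count())
```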

“One of the big problems with optimizing jobs is the sheer scale of what we’re talking about. We have anywhere between 5,000-7,500 data pipelines. You’re not just looking for a needle in a haystack . . . first of all, you have to find the haystack. Then you have to learn how to dig into it. That’s an awful lot of code for human beings to look at, something that machines are perfectly suited to. And Unravel is the best implementation we’ve seen of its kind.”

The Unravel platform harnesses full-stack visibility, contextual awareness, AI-powered actionable intelligence, and automation to go “beyond observability”—to not only show you what’s going on and why, but guide you with crisp, prescriptive recommendations on exactly how to make things better and then keep them that way proactively. (See the Unravel platform overview page for more detail.)

“We put Unravel right in the front of our development environment. So nothing goes into production unless we know it’s going to work at the right cost/price. We make sure problems never reach production. We cut them off at the pass, so to speak. Because otherwise, you’ve just invented the world’s best mechanism for closing the stable door after the cost horse has bolted.”

Empower self-service via immediate feedback loops

The company used to outsource a huge amount of its data workloads but is now moving to become an open source–first, built-in-house company. A key part of the company’s strategy is to enable strong engineering practices, design tenets (of which cost is one), and culture. For data platform optimization, that means empowering every data engineer with the insights, guidance, and guardrails to optimize their code so that workloads run highly efficiently and cost is not an afterthought.

“We’ve got approximately 3,000 people churning out Spark code. In a ‘normal environment,’ you can ask the people sitting next to you how they’d do something. We’ve had thousands of engineers working from home for the past two years. So how do you harvest that group knowledge and how do people learn?

“We put Unravel in to look at and analyze every single line of code written, and come up with those micro-suggestions – and indeed macro-suggestions – that you’d miss. We’ve been through everything like code walk-throughs, code dives, all those things that are standard practice. But if you have a couple of thousand engineers writing, say, 10 lines of code a day, you’ll never be able to walk through all that code.”

That’s where Unravel’s high degree of automation and AI really help. Unravel auto-discovers and captures metadata from every platform, system, and application across the company’s data stack, correlates it all into a meaningful workload-aware context, and automatically analyzes everything to pinpoint inefficiencies and offer up AI-powered recommendations to guide engineers on how to optimize their jobs. 

“We put Unravel right in the front of our development environment to look at and analyze every single line of code written and come up with suggestions [to improve efficiency].”

“Data engineers hate fixing live problems. Because it’s boring! And they want to be doing the exciting stuff, keep developing, innovating. So if we can stop those problems at Dev time, make sure they deploy optimal code, it’s a win-win. They never have to fix that production code, and honestly we don’t have to ask them to fix it.”

The company leverages Unravel’s automated AI analysis to up-level its thousands of developers and engineers worldwide. Optimizing today’s complex data applications/pipelines—for performance, reliability, and cost—requires a deeper level of data engineering.

“Because Unravel takes data from lots of other organizations, we’re harvesting the benefits of hundreds of thousands of coders and data engineers globally. We’re gaining the insights we couldn’t possibly get by being even the best at self-analysis.

“The key for me is to be able to go back to an individual data engineer and say, ‘Did you realize that if you did your code this way, you’d be 10 times more efficient?’ And it’s about giving them feedback that allows them to learn themselves. What I love about Unravel is that you get the feedback, but it’s not like they’re getting pulled into an office and having ‘a talk’ about those lines of code. You go into your private workspace, [Unravel] gives you the suggestions, you deal with the suggestions, you learn, you move on and don’t make the mistakes again. And they might not even be mistakes; they might just be things you didn’t know about. What we’re finding with Unravel is that it’s sometimes the nuances that pop up that give you the benefits. It’s pivotal to how we’re going to get the benefits, long term, out of what we’re doing.”

Efficiency improvements cut cloud data costs by 70%

The company saw almost immediate business value from Unravel’s automated AI-powered analysis and recommendations. “We were up and running within 48 hours. Superb professional services from Unravel, and a really willing team of people from our side. It’s a good mix.”

The company needed to get cloud data costs under control—fast. More and more mission-critical data workloads were being developed on a near-constant cadence, and these massive jobs were becoming increasingly expensive. Unravel enabled the company to get ahead of its cloud data costs at speed and scale, saving millions.

20X ROI from Unravel

“We started in the summer, and by the time Christmas came around, we had cut in excess of 70% of our costs. I’d put the ROI of Unravel at about 20X: every $1 we invested, we save $20.”

The company has been able to put into individual developers’ and engineers’ hands a tool to make smarter, data-driven decisions about how they incur cloud data expenses.

“What I say to new data engineers is that we will empower them to create the best systems in the world, but only you can empower yourself to make them the most efficient systems in the world. Getting data engineers to actually use Unravel was not a difficult task. We’re very lucky: people on our team are highly motivated to do the right thing – by the company, by themselves. If doing the right thing becomes the default option, people will follow that path.

“Unravel makes it easy to do the right thing.”

The Evolution from DevOps to DataOps

By Jason Bloomberg, President, Intellyx
Part 2 of the Demystifying Data Observability Series for Unravel Data

In part one of this series, fellow Intellyx analyst Jason English explained the differences between DevOps and DataOps, drilling down into the importance of DataOps observability.

The question he left open for this article: how did we get here? How did DevOps evolve to what it is today, and what parallels or differences can we find in the growth of DataOps?

DevOps Precursors

The traditional, pre-cloud approach to building custom software in large organizations separated the application development (‘dev’) teams from the IT operations (‘ops’) personnel responsible for running software in the corporate production environment.

In between these two teams, organizations would implement a plethora of processes and gates to ensure the quality of the code and that it would work properly in production before handing it to the ops folks to deploy and manage.

Such ‘throw it over the wall’ processes were slow and laborious, leading to deployment cycles many months long. The importance of having software that worked properly, so the reasoning went, was sufficient reason for such onerous delays.

Then came the Web. And the cloud. And enterprise digital transformation initiatives. All of these macro-trends forced enterprises to rethink their plodding software lifecycles.

Not only were they too slow to deliver increasingly important software capabilities, but business requirements would evolve far too quickly for the deployed software to keep up.

Such pressures led to the rise of agile software methodologies, cloud native computing, and DevOps.

Finding the Essence of DevOps

The original vision of DevOps was to pull together the dev and ops teams to foster greater collaboration, in hopes that software deployment cadences would accelerate while maintaining or improving the quality of the resulting software.

Over time, this ‘kumbaya’ vision of seamless collaboration itself evolved. Today, we can distill the essence of modern DevOps into these five elements:

  • A cultural and organizational shift away from the ‘throw it over the wall’ mentality to greater collaboration across the software lifecycle
  • A well-integrated, comprehensive automation suite that supports CI/CD activities, along with specialists who manage and configure such technologies, i.e., DevOps engineers
  • A proactive, shift-left mentality that seeks to represent production behavior declaratively early in the lifecycle for better quality control and rapid deployment
  • Full-lifecycle observability that shifts problem resolution to the left via better prediction of problematic behavior and preemptive mitigation of issues in production
  • Lean practices to deliver value and improve efficiency throughout the software development lifecycle

Furthermore, DevOps doesn’t live in a vacuum. Rather, it is consistent with and supports other modern software best practices, including infrastructure-as-code, GitOps, and the ‘cattle not pets’ approach to supporting the production environment via metadata representations that drive deployment.

The Evolution of DataOps

Before information technology (IT), organizations had management of information systems (MIS). And before MIS, at the dawn of corporate computing, enterprises implemented data processing (DP).

The mainframes at the heart of enterprise technology as far back as the 1960s were all about processing data – crunching numbers in batch jobs that yielded arcane business results, typically dot-matrix printed on green and white striped paper.

Today, IT covers a vast landscape of technology infrastructure, applications, and hybrid on-premises and cloud environments – but data processing remains at the heart of what IT is all about.

Early in the evolution of DP, it became clear that the technologies necessary for processing transactions were different from the technology the organization required to provide business intelligence to line-of-business (LoB) professionals.

Enterprises required parallel investments in online transaction processing (OLTP) and online analytical processing (OLAP), respectively. OLAP proved the tougher nut to crack, because enterprises generated voluminous quantities of transactional data, while LoB executives required complex insights that would vary over time – thus taxing the ability of the data infrastructure to respond to the business need for information.

To address this need, data warehouses exploded onto the scene, separating the work of OLAP into two phases: transforming and loading data into the warehouses and supporting business intelligence via queries of the data in them.

Operating these early OLAP systems was relatively straightforward, centering on administering the data warehouses. In contrast, today’s data estate – the sum total of all the data infrastructure in a modern enterprise – is far more varied than in the early data warehousing days.

Motivations for DataOps

Operating this data estate has also become increasingly complex, as the practice of DataOps rises in today’s organizations.

Complexity, however, is only one motivation for DataOps. There are more reasons why today’s data estate requires it:

  • Increased mission-criticality of data, as digital transformations rework organizations into digital enterprises
  • Increased importance of real-time data, a capability that data warehouses never delivered
  • Greater diversity of data-centric use cases beyond basic business intelligence
  • Increased need for dynamic applications of data, as different LoBs need an ever-growing variety of data-centric solutions
  • Growing need for operational cost predictability, optimization, and governance

Driving these motivations is the rise of AI, which shifts software behavior from being code-based to data-based. In other words, AI is more than just another data-centric use case. It repositions data as the central driver of software functionality for the enterprise.

The Intellyx Take

For all these reasons, DataOps can no longer follow the simplistic data warehouse administration pattern of the past. Today’s data estate is dynamic, diverse, and increasingly important, requiring organizations to take a full-lifecycle approach to collecting, transforming, storing, querying, managing, and consuming data.

As a result, DataOps requires the application of core DevOps practices along the data lifecycle. DataOps requires the cross-lifecycle collaboration, full-lifecycle automation and observability, and the shift-left mentality that DevOps brings to the table – only now applied to the enterprise data estate.

Thinking of DataOps as ‘DevOps for data’ may be too simplistic an explanation of the role DataOps should play. Instead, it might be more accurate to say that as data increasingly becomes the driver of software behavior, DataOps becomes the new DevOps.

Next up in part 3 of this series: DataFinOps: More on the menu than data cost governance

Copyright © Intellyx LLC. Unravel Data is an Intellyx customer. Intellyx retains final editorial control of this article.

Unravel’s Hybrid Test Strategy to Conquer the Scaling Data Giant

Unravel provides full-stack coverage and a unified, end-to-end view of everything going on in your environment, plus recommendations from our rules-based model and our AI engine. Unravel works on-premises, in the cloud, and for cloud migration. 

Unravel provides direct support for platforms such as Cloudera Hadoop (CDH), Hortonworks Data Platform (HDP), Cloudera Data Platform (CDP), and a wide range of cloud solutions, including AWS infrastructure as a service (IaaS), Amazon EMR, Microsoft Azure IaaS, Azure HDInsight, and Databricks on both cloud platforms, as well as GCP IaaS, Dataproc, and BigQuery. We have grown to support scores of well-known customers and to engage in productive partnerships with both AWS and Microsoft Azure.

We have an ambitious engineering agenda and a relatively large team, with more than half the company in the engineering org. We want our engineering process to be as forward-looking as the product we deliver. 

We constantly strive to develop adaptive, end-to-end testing strategies. Unravel started out testing against a modest customer deployment; we now support scores of large customer deployments with 2,000 nodes and 18 clusters, and we had to conquer the giant challenges posed by this massive increase in scale.

Since testing is an integral part of every release cycle, we give top priority to developing a systematic, automated, scalable, yet customizable approach for driving the entire release cycle. When a startup is just getting off the ground, the obvious and quickest approach is the traditional testing model: manually test and certify each module or product.

This structure works satisfactorily while the product has only a few features. However, a growing customer base, an expanding feature set, and the need to support multiple platforms demand proportionally more testing, and at that point testing becomes a time-consuming and cumbersome process. So if your organization is struggling with the traditional, manual testing approach for modern data stack pipelines and is looking for a better solution, read on.

In this blog, we will walk you through our journey, covering:

  • How we evolved our robust testing strategies and methodologies.
  • The measures we took and the best practices that we applied to make our test infrastructure the best fit for our increasing scale and growing customer base.

Evolution of Unravel’s Test Model

Like any other startup, Unravel had a test infrastructure that followed the traditional testing approach of manual testing, as depicted in the following image:           

Initially, with just a few customers, Unravel focused mainly on release certification through manual testing. Different platforms and configurations were tested by hand, which took roughly 4-6 weeks per release cycle. With increasing scale, this cycle became endless, making release trains longer and less predictable.

This type of testing model has quite a few stumbling blocks and does not work well with scaling data sizes and product features. Common problems with the traditional approach include:

  • Late discovery of defects, leading to:
      • Last-minute code changes and bug fixes
      • Frantic communication and hurried testing
      • New regressions being introduced
  • Deteriorating testing quality, due to:
      • Manual end-to-end testing of the modern data stack pipeline, which is error-prone and tends to miss corner cases, concurrency issues, etc.
      • Difficulty in capturing lag issues in modern data stack pipelines
  • Longer and unpredictable release trains, which lead to:
      • Stretched deadlines, since testing time increases proportionally with the number of builds across multiple platforms
      • Increased cost due to high resource requirements, such as more man-hours and heavily equipped test environments

Spotting the defects at a later stage becomes a risky affair, since the cost of fixing defects increases exponentially across the software development life cycle (SDLC). 

While the traditional testing model has its cons, it also has some pros. A couple of key advantages are that:

  • Manual testing can reproduce customer-specific scenarios 
  • It can also catch bugs where you least expect them

So we resisted the temptation to move fully to what most organizations now implement: a completely automated approach. To cope with the challenges of the traditional model, we introduced a new test model, a hybrid approach that gives us, for our purposes, the best of both worlds, as sketched below.
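To make the hybrid split concrete, here is a minimal sketch, using pytest markers, of how automated regression tests and manual-verification scenarios can live in one suite. The file, test names, and marker names (`regression`, `manual_only`) are our own illustrative assumptions, not Unravel's actual framework.

```python
# test_release_suite.py -- hypothetical example, not Unravel's actual test code
import pytest


@pytest.mark.regression
def test_spark_job_metrics_ingested():
    """Automated nightly check: job-level metrics land in the metrics store."""
    metrics = {"job_id": "demo-001", "duration_sec": 42}  # stand-in for a real collector call
    assert metrics["duration_sec"] > 0


@pytest.mark.manual_only
def test_customer_specific_kerberos_setup():
    """Scenario still verified by hand on a customer-like cluster."""
    pytest.skip("Executed manually during release certification")
```

The nightly job would then run `pytest -m regression`, while tests tagged `manual_only` feed the manual release-certification checklist, keeping both halves of the hybrid visible in one place.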

This model is built on the following strategy, which is adaptive enough to scale on top of a robust testing framework.

Our Strategy

Unravel’s hybrid test strategy is the foundation for our new test model.

New Testing Model 

Our current test model is depicted in the following image:

This approach focuses mainly on end-to-end automation testing, which provides the following benefits (a sketch of what such a nightly driver might look like follows the list):

  • Runs an automated daily regression suite on every new release build, with end-to-end tests for all the components in the Unravel stack
  • Provides a holistic view of the regression results in a rich reporting dashboard
  • Works for all kinds of releases (point releases, GA releases), making the automation framework flexible, robust, and scalable
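As a rough sketch of that kind of nightly driver (our own illustration; the component list, directory layout, and `regression` marker are assumptions rather than Unravel's internal tooling), a thin wrapper can run the end-to-end suite per component and emit a machine-readable summary for the dashboard:

```python
#!/usr/bin/env python3
# nightly_regression.py -- illustrative driver, not Unravel's internal tool
import json
import os
import subprocess
import sys
from datetime import datetime, timezone

# Hypothetical list of components in an Unravel-like stack
COMPONENTS = ["spark", "hive", "kafka", "platform-ui"]


def run_component_suite(component: str) -> dict:
    """Run the end-to-end tests for one component and record pass/fail."""
    proc = subprocess.run(
        [
            "pytest", "-m", "regression", f"tests/e2e/{component}",
            "--junitxml", f"reports/{component}.xml",
        ]
    )
    return {"component": component, "passed": proc.returncode == 0}


def main() -> int:
    os.makedirs("reports", exist_ok=True)
    results = [run_component_suite(c) for c in COMPONENTS]
    summary = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }
    # The reporting dashboard can ingest this JSON summary.
    with open("reports/summary.json", "w") as fh:
        json.dump(summary, fh, indent=2)
    return 0 if all(r["passed"] for r in results) else 1


if __name__ == "__main__":
    sys.exit(main())
```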

A reporting dashboard and an automated regression summary email are key differentiators of the new test model. 
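As a companion sketch (again with placeholder paths and addresses, not Unravel's actual reporting code), the summary email can be assembled from the JUnit XML files that the nightly run produces:

```python
# regression_summary.py -- illustrative only; paths and addresses are placeholders
import glob
import smtplib
import xml.etree.ElementTree as ET
from email.message import EmailMessage


def summarize(report_dir: str = "reports") -> str:
    """Build a one-line-per-component pass/fail summary from JUnit XML reports."""
    lines = []
    for path in sorted(glob.glob(f"{report_dir}/*.xml")):
        root = ET.parse(path).getroot()
        # pytest may wrap results in a <testsuites> element.
        suite = root.find("testsuite") if root.tag == "testsuites" else root
        if suite is None:
            continue
        tests = int(suite.get("tests", 0))
        failed = int(suite.get("failures", 0)) + int(suite.get("errors", 0))
        lines.append(f"{path}: {tests - failed}/{tests} passed")
    return "\n".join(lines) or "No reports found"


def send_summary(body: str) -> None:
    """Send the plain-text summary through a local mail relay (assumed to exist)."""
    msg = EmailMessage()
    msg["Subject"] = "Nightly regression summary"
    msg["From"] = "qa-bot@example.com"   # placeholder address
    msg["To"] = "eng-team@example.com"   # placeholder address
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    send_summary(summarize())
```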

The new test model provides several key advantages, though it comes with some trade-offs as well.

KPI Comparison – Traditional vs. New Model

The following bar chart is derived from the KPI values for deployment and execution time, captured for both the traditional and the new model.

The following graph compares deployment, execution, and resource time savings:
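For readers reproducing this kind of comparison, a small helper like the one below turns measured KPI values into the percentage savings plotted in such a chart. The numbers in the usage example are placeholders for illustration, not Unravel's measured results.

```python
# kpi_savings.py -- helper for computing time savings; example values are placeholders
def percent_savings(traditional_hours: float, new_model_hours: float) -> float:
    """Percentage of time saved by the new model relative to the traditional one."""
    if traditional_hours <= 0:
        raise ValueError("traditional_hours must be positive")
    return 100.0 * (traditional_hours - new_model_hours) / traditional_hours


if __name__ == "__main__":
    # Placeholder KPI values (hours), for illustration only.
    kpis = {"deployment": (8.0, 2.0), "execution": (40.0, 10.0)}
    for name, (old, new) in kpis.items():
        print(f"{name}: {percent_savings(old, new):.0f}% time saved")
```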

Release Certification Layout

The new testing model comes with a new Release Certification layout, as shown in the image below. The process for certifying a release cycle is summarized in the Release Cycle Summary table.

Release Cycle Summary

Conclusion

Today, Unravel has a rich set of unit tests: more than 1,000 tests run for every commit as part of our CI/CD pipeline, alongside 1,500+ functional sanity test cases that cover our end-to-end data pipelines and integration tests. This testing strategy significantly reduces the impact on integrated functionality by proactively highlighting issues in pre-checks.
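As an illustration of what such a per-commit pre-check can look like (a sketch under assumed marker names, not Unravel's actual CI/CD configuration), a small gate script can run the unit and sanity suites and block the merge if either fails:

```python
# commit_gate.py -- illustrative pre-check gate; suite marker names are assumptions
import subprocess
import sys

SUITES = [
    ("unit", ["pytest", "-m", "unit", "-q"]),
    ("sanity", ["pytest", "-m", "sanity", "-q"]),
]


def main() -> int:
    for name, cmd in SUITES:
        print(f"Running {name} suite...")
        if subprocess.run(cmd).returncode != 0:
            print(f"{name} suite failed -- blocking the commit")
            return 1
    print("All pre-checks passed")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```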

To cut a long story short, it is a difficult and tricky task to build a flexible, robust, and scalable test infrastructure that caters to varying scales, especially for a cutting-edge product like Unravel and a team that strives for the highest quality in every build.

In this post, we have highlighted commonly faced hurdles in testing modern data stack pipelines, and we have shown the reliable testing strategies we developed to efficiently test and certify modern data stack ecosystems. Armed with these approaches, you too can effectively tame the scaling giant!
