Customers Archives - Unravel

Empowering Data Agility: Equifax’s Journey to Operational Excellence

In the data-driven world where real-time decision-making and innovation are not just goals but necessities, global data analytics and technology companies like Equifax must navigate a complex environment to achieve success. Equifax sets the standard for operational excellence by enabling real-time decision-making, accelerating innovation, scaling efficiently, consistently achieving service level agreements (SLAs), and building reliable data pipelines. Let’s delve into the strategies that lead to such a resounding transformation, with a spotlight on the pivotal role of data observability and FinOps.

When Speed Meets Precision: Revolutionizing Credit Scoring

Kumar Menon, the Equifax CTO for Data Mesh and Decision-Making Technologies, faced a formidable challenge: enabling sub-second credit scoring. And he is not alone. “Many organizations find it challenging to turn their data into insights quickly enough to generate real and timely results” (Deloitte). With Unravel’s data observability and FinOps platform, Kumar’s team overcame this hurdle to deliver faster insights and decisions while fortifying Equifax’s competitive edge.

Unifying Data with Vision and Strategy

The necessity to integrate data across six cloud regions cannot be overstated for a company operating at Equifax’s scale. Gartner discovered that many organizations struggle with “high-cost and low-value data integration cycles.” Kumar Menon and his team, leveraging the tools and methodologies of data observability and FinOps, streamlined this intricate process. The result? Faster and more economical products capable of satisfying the needs of a constantly shifting market.

Mastering the Data Deluge

Managing over 10 petabytes of data is no small feat. IDC noted enterprises’ need for “responsiveness, scalability, and resiliency of the digital infrastructure.” Equifax, with Kumar Menon’s foresight, embraced the power of Unravel’s data observability and FinOps frameworks to adapt and grow. By efficiently managing their cloud resource usage, the team was able to scale data processing with the appropriate cloud resources.

Delivering Real-Time Decisions

The Equifax team needed to support 12 million daily inquiries. Operating production services at this scale can be overwhelming for any system not prepared for such a deluge. In fact, Gartner uncovered a significant challenge around the ability to “detect errors and lower the cost to fix and shorten the resulting downtime.” Data observability and FinOps serve as Kumar Menon’s frameworks to not only confront these challenges but to ensure that Equifax can consistently deliver accurate, real-time decisions such as credit scores and employment verifications.

Streamlining Massive Data Ingestion

The Equifax data team’s colossal task of ingesting over 15 billion observations per day could potentially entangle any team in a web of complexity. Again, Gartner articulates the frustration many organizations face with “complex, convoluted data pipelines that require toilsome remediation to manage errors.” Unravel’s platform provided Kumar’s team the means to build and maintain reliable and robust data pipelines, assuring the integrity of Equifax’s responsive data products.

The Path Forward with Unravel

In a data-centric industry, Equifax exemplifies leadership through precision, agility, and efficiency. The company’s journey demonstrates their capacity to enable real-time decision-making, accelerate innovation, and ensure operational scale and reliability.

At Unravel, we understand and empathize with data teams facing substantial challenges like the ones described above. As your ally in data performance, productivity, and cost management, we’re committed to equipping you with tools that not only remove obstacles but also enhance your operational prowess. Harness the power of data observability and FinOps with Unravel Data through a self-guided tour that shows how you can:

  • Deliver faster insights and decisions with Unravel’s pipeline bottleneck analysis.
  • Bring faster and more efficient products to market by using Unravel’s speed, cost, and reliability optimizer.
  • Scale data processing efficiently with Unravel’s correlated data observability.
  • Achieve and exceed your SLAs using Unravel’s out-of-the-box reporting.
  • Build performant data pipelines with Unravel’s pipeline bottleneck analysis.

Ready to unlock the full potential of your data strategies and operations? See how you can achieve more with data observability and FinOps with Unravel Data in a self-guided tour.

Healthcare leader uses AI insights to boost data pipeline efficiency

One of the largest health insurance providers in the United States uses Unravel to ensure that its business-critical data applications are optimized for performance, reliability, and cost in its development environment—before they go live in production.

Data and data-driven statistical analysis have always been at the core of health insurance. But over the past few years the industry has seen an explosion in the volume, velocity, and variety of big data—electronic health records (EHRs), electronic medical records (EMRs), and IoT data produced by wearable medical devices and mobile health apps. As the company’s chief medical officer has said, “Sometimes I think we’re becoming more of a data analytics company than anything else.”

Like many Fortune 500 organizations, the company has a complex, hybrid, multi-everything data estate. Many workloads are still running on premises in Cloudera, but the company also has pipelines on Azure and Google Cloud Platform. Further, its Dev environment is fully on AWS. Says the key technology manager for the Enterprise Data Analytics Platform team, “Unravel is needed for us to ensure that the jobs run smoothly because these are critical data jobs,” and Unravel helps them understand and optimize performance and resource usage. 

With the data team’s highest priority being to guarantee that its thousands of data jobs deliver reliable results on time, every time, the team finds Unravel’s automated AI-powered Insights Engine invaluable. Unravel auto-discovers everything the company has running in its data estate (both in Dev and Prod), extracting millions of contextualized granular details from logs, traces, metrics, events, and other metadata—horizontally and vertically—from the application down to infrastructure and everything in between. Then Unravel’s AI/ML correlates all this information into a holistic view that “connects the dots” as to how everything works together. AI and machine learning algorithms analyze millions of details in context to detect anomalous behavior in real time, pinpoint root causes in milliseconds, and automatically provide prescriptive recommendations on where and how to change configurations, containers, code, resource allocations, etc.
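
To make the idea concrete, here is a minimal sketch of the kind of anomaly detection described above. It is not Unravel’s actual engine: the telemetry values and threshold are hypothetical, and a real system would correlate many metrics across the stack, not just runtimes.

```python
# Minimal sketch (not Unravel's implementation): flag an anomalous job
# runtime with a z-score against that job's own history.
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """True if `latest` deviates more than `threshold` standard
    deviations from this job's historical runtimes."""
    if len(history) < 10:          # too little history to judge
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Hypothetical telemetry: recent runtimes (seconds) of one recurring job.
runs = [312, 305, 298, 320, 307, 311, 299, 304, 316, 308]
print(is_anomalous(runs, 905))     # True: worth a root-cause look
```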

Application developers rely on Unravel to automatically analyze and validate their data jobs in Dev, before the apps ever go live in Prod. Unravel first identifies inefficient code—the code most likely to break in production—and then pinpoints for every individual data engineer exactly where and why code should be fixed, so they can tackle potential problems themselves via self-service optimization. The end result: performance inefficiencies never see the light of day.

The Unravel AI-powered Insights Engine similarly analyzes resource usage. The company leverages the chargeback report capability to understand how the various teams are using their resources. (But Unravel can also slice and dice the information to show how resources are being used by individual users, individual jobs, data products or projects, departments, Dev vs. Prod environments, budgets, etc.) For data workloads still running in Cloudera, this helps avoid resource contention and lets teams queue their jobs more efficiently. Unravel even enables teams to kill a job instantly if it is causing another mission-critical job to fail. 
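
As a rough illustration of what a chargeback roll-up involves, the sketch below aggregates per-job usage by team. The schema and numbers are invented, and Unravel’s own reports slice many more dimensions (users, projects, environments, budgets) than this.

```python
# Minimal chargeback sketch (assumed schema, invented numbers): roll up
# per-job resource usage into a per-team view.
from collections import defaultdict

jobs = [  # (team, user, vCore-hours, memory GB-hours) -- hypothetical records
    ("claims-analytics", "asmith", 120.0, 480.0),
    ("claims-analytics", "bjones",  40.0, 160.0),
    ("member-portal",    "cchen",  300.0, 900.0),
]

usage = defaultdict(lambda: [0.0, 0.0])
for team, _user, vcores, mem in jobs:
    usage[team][0] += vcores
    usage[team][1] += mem

for team, (vcores, mem) in sorted(usage.items()):
    print(f"{team}: {vcores:.0f} vCore-h, {mem:.0f} GB-h")
```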

For workloads running in the cloud, Unravel provides precise, prescriptive AI-fueled recommendations on more efficient resource usage—usually downsizing requested resources to fewer or less costly alternatives that will still hit SLAs. 

As the company’s technology manager for cloud infrastructure and interoperability says, “Some teams use humongous data, and every year our users are growing.” With such astronomical growth, it has become ever more important to tackle data workload efficiency proactively—everything has simply gotten too big, too complex, and too business-critical to manage reactively. The company has leveraged Unravel’s automated guardrails and governance rules to trigger alerts whenever jobs are using more resources than necessary.
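
The guardrail idea can be pictured as a simple threshold rule over job telemetry. The sketch below is illustrative only: the job name, budget, and 20% tolerance are assumptions, and Unravel’s governance rules are configured in the product rather than hand-coded like this.

```python
# Illustrative guardrail (hypothetical job, budget, and tolerance): alert
# when a job's actual usage exceeds its budgeted allocation by >20%.
def check_guardrail(job_name, used_vcore_hours, budget_vcore_hours,
                    tolerance=1.2):
    if used_vcore_hours > budget_vcore_hours * tolerance:
        # A real rule would page the owner or post to a team channel.
        print(f"ALERT: {job_name} used {used_vcore_hours:.0f} vCore-h "
              f"(budget {budget_vcore_hours:.0f})")

check_guardrail("nightly-eligibility-etl", 260, 200)
```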

Top benefits:

  • Teams can now view their queues at a glance and run their jobs more efficiently, without conflict.
  • Chargeback reports show how teams are using their resources (or not), which helps set the timing for running jobs—during business vs. off hours. This has provided a lot of relief for all application teams.

Logistics giant optimizes cloud data costs up front at speed & scale

One of the world’s largest logistics companies leverages automation and AI to empower every individual data engineer with self-service capability to optimize their jobs for performance and cost. The company was able to cut its cloud data costs by 70% in six months—and keep them down with automated 360° cost visibility, prescriptive guidance, and guardrails for its 3,000 data engineers across the globe. The company pegs the ROI of Unravel at 20X: “for every $1 we invested, we save $20.”

Key Results

  • 20X ROI from Unravel
  • Cut costs by 70% in 6 months
  • 75% time savings via automation
  • Proactive guardrails to keep costs within budgets
  • Automated AI health checks in CI/CD prevent inefficiencies in production

Holding individuals accountable for cloud usage/cost

Like many organizations moving their data workloads to the cloud, the company soon found that its cloud data costs were very rapidly rising to unacceptable levels. Data analytics are core to the business, but the cost of its cloud data workloads was simply getting too unpredictable and expensive. Cloud data expenses had to be brought under control.

The company chose Unravel to enable a shift-left approach where data engineers become more aware and individually accountable for their cloud usage/spending, and are given the means to make better, more cost-effective decisions when incurring expenses.

Data is core to the business

The company is increasingly doing more things with more data for more reasons. Says its Head of Data Platform Optimization, “Data is pervasive in logistics. Data is literally at the center of pretty much everything [we do]. Picking up goods to transport them, following the journeys of those goods, making all the details of those journeys available to customers. Our E Class ships can take 18,000 shipping containers on one journey from, say, China to Europe. One journey on one of those ships moves more goods than was moved in the entire 19th century between continents. One journey. And we’ve got six of them going back and forth all the time.” 

But the company also uses data to drive innovation in integrated logistics, supply chain resiliency, and corporate social responsibility. “[We’re] a company that doesn’t just use data to figure out how to make money, we use data to better the company, make us more profitable, and at the same time put back into the planet.

“The data has risen exponentially, and we’re just starting to come to grips with what we can do with it. For example, in tandem with a couple of nature organizations, we worked out that if a ship hits a whale at 12 knots and above, that whale will largely die. Below 12 knots, it will live. We used the data about where the whales were to slow the ships down.”

Getting visibility into cloud data costs

The single biggest obstacle to controlling cloud costs for any data-forward organization is having only hazy visibility into cloud usage. The company saw its escalating cloud data platform costs as an efficiency issue—how efficiently the company’s 3,000 “relatively young and inexperienced” data engineers were running their jobs. 

Says the company’s Head of Data Platform Optimization, “We’ve been moving into the cloud over the past 3-4 years. Everybody knows that [the] cloud isn’t free. There’s not a lot of altruism there from the cloud providers. So that’s the biggest issue we faced. We spent 12 months deploying a leading cloud data platform, and at the end of 12 months, the platform was working fine but the costs were escalating. 

“The problem with that is, if you don’t have visibility on those costs, you can’t cut those costs. And everybody—no matter what your financial situation—wants to cut costs and keep them down. We had to attain [cost] visibility. Unravel gives us the visibility, the insight, to solve that problem.”

“The [cloud data] platform was working fine but the costs were escalating. If you don’t have visibility on those costs, you can’t cut those costs.”

Get costs right in Dev, before going into production

The logistics company emphasizes that you have to get it right for cost and performance up front, in development. “Don’t ever end up with a cost problem. That’s part of the [shifting] mindset. Get in there early to deal with cost. Go live with fully costed jobs. Don’t go live and then work out what the job cost is and figure out how to cut it. [Determine] what it’s going to cost in Dev/Test, what it’s going to cost in Prod, then check it as soon as it goes live. If the delta’s right, game on.”

As the company’s data platform optimization leader points out, “Anybody can spin up a cloud environment.” Quite often their code and resource configurations are not optimized. Individual engineers may be requesting larger resources (in size, number, or type) than they actually need to run their jobs successfully, or they have code issues that lead to inefficient performance—and jobs costing more than they need to.

“The way to deal with this [escalating cost] problem is to push it left. Don’t have somebody charging in from Finance waving a giant bill saying, ‘You’re costing a fortune.’ Let’s keep Finance out of the picture. And crucial to this is: Do it up front. Do it in your Dev environment. Don’t go into production, get a giant bill, and only then try to figure out how to cut that.”

Unravel AI automatically identifies inefficient code, oversized resources, data partitioning problems, and other issues that lead to higher-than-necessary cloud data costs.
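
As an illustration of the “oversized resources” case, the sketch below flags a job whose requested executor memory far exceeds what it actually used at peak. The metric names, numbers, and the 40% cutoff are assumptions for the example, not Unravel’s actual heuristics.

```python
# Right-sizing sketch (hypothetical metrics and cutoff, not Unravel's
# heuristics): flag Spark executors provisioned well above peak usage.
def rightsizing_hint(requested_gb, peak_used_gb, headroom=1.25):
    suggested = peak_used_gb * headroom      # peak plus 25% safety margin
    if suggested < requested_gb * 0.6:       # >40% over-provisioned
        return (f"requested {requested_gb} GB, peak {peak_used_gb} GB; "
                f"consider ~{suggested:.1f} GB executors")
    return "allocation looks reasonable"

print(rightsizing_hint(requested_gb=16, peak_used_gb=5.2))
```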

“One of the big problems with optimizing jobs is the sheer scale of what we’re talking about. We have anywhere between 5,000-7,500 data pipelines. You’re not just looking for a needle in a haystack . . . first of all, you have to find the haystack. Then you have to learn how to dig into it. That’s an awful lot of code for human beings to look at, something that machines are perfectly suited to. And Unravel is the best implementation we’ve seen of its kind.”

The Unravel platform harnesses full-stack visibility, contextual awareness, AI-powered actionable intelligence, and automation to go “beyond observability”—to not only show you what’s going on and why, but guide you with crisp, prescriptive recommendations on exactly how to make things better and then keep them that way proactively. (See the Unravel platform overview page for more detail.)

“We put Unravel right in the front of our development environment. So nothing goes into production unless we know it’s going to work at the right cost/price. We make sure problems never reach production. We cut them off at the pass, so to speak. Because otherwise, you’ve just invented the world’s best mechanism for closing the stable door after the cost horse has bolted.”
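
Mechanically, “nothing goes into production unless it works at the right cost” can be enforced as a gate in the deployment pipeline. Here is a hedged sketch of such a gate; the budget table, job name, and cost estimate are invented, and this is not the company’s or Unravel’s actual integration.

```python
# Illustrative pre-production cost gate (invented budgets and estimates):
# fail the pipeline when a job's estimated cost per run exceeds its budget.
import sys

BUDGETS_USD = {"shipment-events-enrichment": 45.00}   # assumed per-run budget

def cost_gate(job_name, estimated_cost_usd):
    budget = BUDGETS_USD.get(job_name)
    if budget is None:
        sys.exit(f"FAIL: no cost budget registered for {job_name}")
    if estimated_cost_usd > budget:
        sys.exit(f"FAIL: {job_name} estimated at ${estimated_cost_usd:.2f}"
                 f" per run, budget ${budget:.2f}")
    print(f"PASS: {job_name} within budget")

cost_gate("shipment-events-enrichment", 38.75)
```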

Empower self-service via immediate feedback loops

The company used to outsource a huge amount of its data workloads but is now moving to become an open source–first, built-in-house company. A key part of the company’s strategy is to enable strong engineering practices, design tenets (of which cost is one), and culture. For data platform optimization, that means empowering every data engineer with the insights, guidance, and guardrails to optimize their code so that workloads run highly efficiently and cost is not an afterthought.

“We’ve got approximately 3,000 people churning out Spark code. In a ‘normal environment,’ you can ask the people sitting next to you how they’d do something. We’ve had thousands of engineers working from home for the past two years. So how do you harvest that group knowledge and how do people learn?

“We put Unravel in to look at and analyze every single line of code written, and come up with those micro-suggestions, and indeed macro-suggestions, that you’d miss. We’ve been through everything like code walk-throughs, code dives, all those things that are standard practice. But if you have a couple of thousand engineers writing, say, 10 lines of code a day, you’ll never be able to walk through all that code.”

That’s where Unravel’s high degree of automation and AI really help. Unravel auto-discovers and captures metadata from every platform, system, and application across the company’s data stack, correlates it all into a meaningful workload-aware context, and automatically analyzes everything to pinpoint inefficiencies and offer up AI-powered recommendations to guide engineers on how to optimize their jobs. 

“We put Unravel right in the front of our development environment to look at and analyze every single line of code written and come up with suggestions [to improve efficiency].”

“Data engineers hate fixing live problems. Because it’s boring! And they want to be doing the exciting stuff, keep developing, innovating. So if we can stop those problems at Dev time, make sure they deploy optimal code, it’s a win-win. They never have to fix that production code, and honestly we don’t have to ask them to fix it.”

The company leverages Unravel’s automated AI analysis to up-level its thousands of developers and engineers worldwide. Optimizing today’s complex data applications/pipelines—for performance, reliability, and cost—requires a deeper level of data engineering.

“Because Unravel takes data from lots of other organizations, we’re harvesting the benefits of hundreds of thousands of coders and data engineers globally. We’re gaining the insights we couldn’t possibly get by being even the best at self-analysis.

“The key for me is to be able to go back to an individual data engineer and say, ‘Did you realize that if you did your code this way, you’d be 10 times more efficient?’ And it’s about giving them feedback that allows them to learn themselves. What I love about Unravel is that you get the feedback, but it’s not like they’re getting pulled into an office and having ‘a talk’ about those lines of code. You go into your private workspace, [Unravel] gives you the suggestions, you deal with the suggestions, you learn, you move on and don’t make the mistakes again. And they might not even be mistakes; they might just be things you didn’t know about. What we’re finding with Unravel is that it’s sometimes the nuances that pop up that give you the benefits. It’s pivotal to how we’re going to get the benefits, long term, out of what we’re doing.”

Efficiency improvements cut cloud data costs by 70%

The company saw almost immediate business value from Unravel’s automated AI-powered analysis and recommendations. “We were up and running within 48 hours. Superb professional services from Unravel, and a really willing team of people from our side. It’s a good mix.”

The company needed to get cloud data costs under control—fast. More and more mission-critical data workloads were being developed on a near-constant cadence, and these massive jobs were becoming increasingly expensive. Unravel enabled the company to get ahead of its cloud data costs at speed and scale, saving millions.

20X ROI from Unravel

“We started in the summer, and by the time Christmas came around, we had cut in excess of 70% of our costs. I’d put the ROI of Unravel at about 20X: every $1 we invested, we save $20.”

The company has been able to put into individual developers’ and engineers’ hands a tool to make smarter, data-driven decisions about how they incur cloud data expenses.

“What I say to new data engineers is that we will empower them to create the best systems in the world, but only you can empower yourself to make them the most efficient systems in the world. Getting data engineers to actually use Unravel was not a difficult task. We’re very lucky: people on our team are highly motivated to do the right thing, by the company, by themselves. If doing the right thing becomes the default option, people will follow that path.

“Unravel makes it easy to do the right thing.”

Equifax Optimizes GCP Costs at Scale

Managing FinOps at Equifax

DBS Empowers Self-Service Engineering with Unravel

DBS Discusses Data+FinOps for Banking

Enabling Strong Engineering Practices at Maersk

As DataOps moves along the maturity curve, many organizations are deciphering how to best balance the success of running critical jobs with optimized time and cost governance.

Watch the fireside chat from Data Teams Summit where Mark Sear, Head of Data Platform Optimization for Maersk, shares how his team is driving towards enabling strong engineering practices, design tenets, and culture at one of the largest shipping and logistics companies in the world. Transcript below.

Transcript

Kunal Agarwal:

Very excited to have a fireside chat here with Mark Sear. Mark, you’re the director of data integration, AI, machine learning, and analytics at Maersk. And Maersk is one of the largest shipping and logistics companies in the world. Based out of Copenhagen, but with subsidiaries and offices across 130 countries and about 83,000 employees worldwide. We always think of logistics and shipping as something just working harmoniously, transparently in the background, but in the recent past, given all of the supply chain pressures that have happened with the pandemic and beyond, and even that ship getting stuck in the Suez Canal, I think a lot more people are paying attention to this industry as well. So I was super excited to have you here, Mark, to hear more about yourself, you as the leader of data teams, and about what Maersk is doing with data analytics. Thank you so much for joining us.

Mark Sear:

It’s an absolute pleasure. You’ve just illustrated the perils of Wikipedia. Maersk is not just one of the largest shipping companies in the world, but we’re also actually one of the largest logistics companies in the world. We have our own airline. We’ve got hundreds of warehouses globally. We’re expanding massively, so we are there and of course we are a leader in decarbonization. We’ve got a pledge to be carbon-neutral way before just about anybody else. So it’s a fantastic company to work at. Often I say to my kids, we don’t just deliver stuff, we are doing something to help the planet. It’s a bigger mission than just delivering things, so it’s a pleasure to be here.

Kunal Agarwal:

That’s great. Mark, before we get into Maersk, we’d love to learn about you. So you have an amazing background and accumulation of all of these different experiences. Would you help the audience to understand some of your interests and how you got to be in the role that you currently are at? And what does your role comprise inside of Maersk?

Mark Sear:

Wow. It’s a long story. I’m an old guy, so I’m just a couple of years over 60 now, which you could say you don’t look it, but don’t worry about it.

Kunal Agarwal:

You don’t look it at all, only 40.

Mark Sear:

I’m a generation that didn’t, not many of us went to university, so let me start there. So I left school at 18, did a bit of time in the basic military before going to what you would call, I suppose fundamentally, a crypto analyst school. They would detect how smart you were, whether you had a particular thing for patterns, and they sent me there. Did that, and then since then I’ve worked in banking, in trading in particular. I ran a big trading group for a major bank, which was great fun, so we were using data all the time to look for both, not just arbitrage, but other things. Fundamentally, my life has been about data.

Kunal Agarwal:

Right.

Mark Sear:

Even as a kid, my dad had a very small business and because he didn’t know anything about computers, I would do the computing for him and work out the miles per gallon that his trucks were getting and what the trade-in was.

Kunal Agarwal:

Sure.

Mark Sear:

And things like that. So data’s been part of my life and I love everything about data and what it can do for people, companies, everything. Yeah, that’s it. Data.

Kunal Agarwal:

That’s great, Mark. Obviously this is a conference spot, a data team, so it’s great to hear from the data guy who’s been doing it for a really long time. So, Mark, to begin, Maersk, as you said, is one of the largest shipping and logistics companies in the world. How has data transformed your company?

Mark Sear:

One thing, this is a great question. How has it transformed and how will it transform?

Kunal Agarwal:

Yes.

Mark Sear:

I think that for the first time in the last couple of years, and I’ve been very lucky, I’ve only been with the company three years, but shortly after I joined, we had a new tech leader, a gentleman called Navneet Kapoor. The guy is a visionary. Shipping was seen for many years as a bit of a backwater, really. You move containers from one country to another country on ships, that was it. Navneet has changed the game for us all and made people realize that data is pervasive in logistics. It’s literally everywhere. If you think about our biggest ship class, for example, it’s called an E-Class. That can take over 18,000 shipping containers on one journey from China to Europe, 18,000.

Kunal Agarwal:

Oh wow.

Mark Sear:

Think about that. So that’s absolutely huge. Now, to put that into context, in one journey, one of those ships will move more goods than was moved in the entire 19th century between continents, one journey. And we’ve got six of them and they’re going backwards and forwards all the time. So the data has risen exponentially and what you can do with it, we are now just starting to get to grips with it, that’s what’s so exciting. Consider, we have companies that do want to know how much carbon is being produced as part of their products. We have things like that. We just have an incredibly diverse set of products.

To give you an example, I worked on a project about 18 months ago where we worked out, working in tandem with a couple of nature organizations, that if a ship hits a whale at 12 knots and above, that whale will largely die. If you hit it below 12 knots, it will live. It’s a bit like hitting an adult at 30 miles an hour versus 20. The company puts some money in so we could use the data for where the whales were to slow the ships down. So this is an example of where this company doesn’t just think about what can we do to make money. This is a company that thinks about how can we use data to better the company, make us more profitable, and at the same time, put back into the planet that gave us the ability to have this business.

Kunal Agarwal:

Let’s not forget that we’re human, most importantly.

Mark Sear:

Yeah, it’s super exciting, right? You can make all the money in the world. If you trash the planet, there’s not a lot left to enjoy as part of it. And I love that about this company.

Kunal Agarwal:

Absolutely. And I’m guessing with the pandemic and post-pandemic, and all of the other data sets that you guys are gathering anyways from sensors or from the shipping lines or from all the efficiencies, with all the proliferation of all this data inside your organization, what challenges has your team faced or does the Maersk data team face?

Mark Sear:

Well, my team is in the enterprise architecture team. We therefore deal with all the other teams that are dealing with data, and I think we’ve got the same challenges as everybody. Have we got the data quality right? Do we know where that data comes from? Are we processing it efficiently? Do we have the right ideas to work on the right insights to get value out of that data? I think they’re common industry things, and as with everything, it’s a learning process. So one man’s high-quality data is another woman’s low-quality data.

And depending on who you are and what you want to do with that data, people have to understand how that quality affects other people downstream. And of course, because you’re quite right, we did have a pandemic, and in the pandemic shipping rates went a little bit nuts and they’re normalizing now. But, of course, if you think about introducing predictive algorithms where the price is going vertically and the algorithm may not know that there’s a pandemic on, it just sees price. So I think what we find is challenging, same as everybody else, is how do you put that human edge around data? Very challenging. How do you build really high-performing teams? How do you get teams to truly work together and develop that esprit de corps? So there are a lot of human problems that go alongside the data problems.

Kunal Agarwal:

Yeah. Mark, give us a sense of your size. In terms of teams, applications, whatever would help us understand what you guys were, where you guys are, and where you guys headed.

Mark Sear:

Three years ago when I joined there were 1,900 people in tech; we’ve now got nearly 6,000. We had a huge amount of outsourcing; now we’re insourcing, we’re moving to an open source–first, event-based company. We’ve been very acquisitive. We’ve bought some logistics companies, so we’ve gone on the end-to-end journey now as the logistics integrator of choice globally. We’ve got our own airline. So you have to think about a lot of things that play together.

My team is a relatively tiny team. We’ve got about 12, but we liaise with, for example, our global data and analytics team that has got 600 people in it. We’re then organized into platforms, which are vertically problem-solving but fully horizontally integrated, passing events between them. And each one of those has their own data team in it as well. So overall, I would guess we’ve got 3,000 people working directly with data in IT and then of course many thousands more.

Kunal Agarwal:

Wow.

Mark Sear:

Out in the organization. So it’s big organizations. Super exciting. Should say, now I’m going to get a quick commercial in. If you are watching this and you are a top data talent, please do hit me up with your resume.

Kunal Agarwal:

There’s a couple of thousand people watching this live, so you’ll definitely.

Mark Sear:

Hey, there you go, man. So listen, as long as they’re quality, I don’t care.

Kunal Agarwal:

From Mark, he’s a great boss as well. So when you think about the maturity curve of data operations, where do you think Maersk is at and what stands in your way to be fully matured?

Mark Sear:

Okay, so let’s analyze that. I think the biggest problem in any maturity curve is not defining the curve. It’s not producing a pyramid to say we are here and a dial to say, well, you rank as a one, you want to rank as a five.

Kunal Agarwal:

Sure.

Mark Sear:

The biggest problem to me is the people that actually formulate that curve. Now everyone’s got staff turnover and everyone or the majority of people know that they’re part of a team. But the question is how do you get that team to work with other teams and how do you disseminate that knowledge and get that group think of what is best practice for DataOps? What is best practice for dealing with these problems?

Kunal Agarwal:

It’s almost a spectrum on the talent side, isn’t it?

Mark Sear:

It’s a spectrum on the talent side, there’s a high turnover because certainly in the last 12 to 18 months, salaries have been going crazy, so you’ve had crazy turnover rates in some areas, not so much in other areas. So the human side of this is one part of the problem, and it’s not just the human side to how do you keep them engaged, it’s how do you share that knowledge and how do you get that exponential learning organization going?

And perhaps when we get into how we’ve arrived at tools like Unravel, I’ll explain to you what my theory is on that, but it’s almost a swarm learning that you need here, an ants style learning of how to solve problems. And that’s the hardest thing, is getting everybody in that boat swimming in the same direction before you can apply best practices because everybody says this is best practice. Sure, but if it was as simple as looking at a Gartner or whoever thing and saying, “Oh, there are the five lines we need to do,” everybody would do it. There’d be no need for anybody to innovate because we could do it; human beings aren’t very good at following rules, right?

Kunal Agarwal:

Yeah. So what kind of shifts and changes did you have to make in your big data operations and tools that you had to put into place for getting that maturity to where you expected it to be?

Mark Sear:

I think the first thing we’ve got to do, we’ve got to get people thinking slightly shorter timeframe. So everybody talks about Agile, Agile, Agile.

Kunal Agarwal:

Right.

Mark Sear:

Agile means different things to different people. We had some people who thought that Agile was, “Well, you’re going to get a fresh data set at the end of the day, so what the heck are you complaining about? When I started 15 years ago, you got it weekly.” That’s not agile. Equally, you’ve got people who say, I need real-time data. Well, do you really need real-time data if you’re actually dealing with an expense account? You probably don’t.

Kunal Agarwal:

Right.

Mark Sear:

Okay, so the first thing we’ve got to do is level set expectations of our users and then we’ve got to dovetail what we can deliver into those. You’ve got to be business focused, you’ve got to bring value. And that’s a journey. It’s a journey for the business users.

Kunal Agarwal:

Sure.

Mark Sear:

It’s a journey for our users. It’s about learning. So that’s what we’re doing. It’s taking time. Yeah, it’s taking time, but it’s like a snowball. It is rolling and it is getting bigger and it’s getting better, getting faster.

Kunal Agarwal:

And then when you think about the tools, Mark, are there any that you have to put into place to accelerate this?

Mark Sear:

I mean, we’ve probably got one of everything to start and now we’re shrinking. If I take . . . am I allowed to talk about Unravel?

Kunal Agarwal:

Sure.

Mark Sear:

So I’ll talk about–

Kunal Agarwal:

As much as you would.

Mark Sear:

–Unravel for a few seconds. So if you think about what we’ve got, let’s say we’ve got 3,000 people, primarily relatively young, inexperienced people churning out Spark code, let’s say Spark Databricks code, and they all sit writing it. And of course if you are in a normal environment, you can ask the person next to you, how would you do this? You ask the person over there, how would you do this? We’ve had 3,000 engineers working from home for two years, even now, they don’t want to come into the office per se, because it’s inconvenient, number one, because you might be journeying an hour in and an hour home, and also it’s not actually truly as productive. So the question is how do you harvest that group knowledge and how do people learn?

So for us, we put Unravel in to look at and analyze every single line of code we write and come up with those micro suggestions and indeed macro suggestions that you would miss. And believe me, we’ve been through everything like code walkthroughs, code dives, all those things. They’re all standard practice. If you’ve got 2,000 people and they write, let’s say, 10 lines of code a day each, 20,000 lines of code, you are never going to walk through all of that code. You are never going to be able to level set expectations. And this is key to me, be able to go back to an individual data engineer and say, “Hey, dude, listen, about these couple of lines of code. Did you realize if you did it like this, you could be 10 times as efficient?” And it’s about giving that feedback in a way that allows them to learn themselves.

And that’s what I love about Unravel: you get the feedback, but nobody says, “Come into my office, let’s have a chat about these lines of code.” You go into your private workspace, it gives you the suggestions, you deal with the suggestions, you learn, you move on, you don’t make the mistakes again. And they may not even be mistakes. They might just be things you didn’t know about.

Kunal Agarwal:

Right.

Mark Sear:

And so because Unravel takes data from lots of other organizations as well, as I see it, we’re in effect harvesting the benefits of hundreds of thousands of coders globally, of data engineers globally. And we are gaining the insights that we couldn’t possibly gain by being even the best at self-analysis on the planet, you couldn’t do it without that. And that to me is the advantage of it. It’s like that swarm mentality. If anybody watching this has ever had a look at swarm AI, you can use it to predict events. It’s like if you take a soccer game, and I’ve worked in gambling, if you take a soccer match and you take a hundred people, I’ll call it soccer, even though the real name for it is football, you Americans.

Kunal Agarwal:

It’s football, I agree too.

Mark Sear:

It’s football, so we’re going to call it football, association football to give it its full name. If you ask a hundred football fans to predict a score, you’ll get a curve, and you’ll generally, from that predictor, get a good result. Way more accurate than asking 10 so-called experts, such as with code. And that’s what we’re finding with Unravel is that sometimes it’s the little nuances that just pop up that are giving us more benefits.

Kunal Agarwal:

Right.

Mark Sear:

So it’s pivotal to how we are going to get benefits out over the longer term of what we’re doing.

Kunal Agarwal:

That’s great. And we always see a spectrum of skills inside an organization. So our mission is trying to level the playing field so anybody, even a business user, can log in without knowing the internals of all of these complex data technologies. So it’s great to hear the way Maersk is actually using it. We spoke a little bit about making these changes. We’d love to double-click on some of these hurdles, right? Because you said it was a journey to get people to this mature or fast-moving data operations, if you may, or more agile data operations if you may. If we can double-click for a second, what has been the biggest hurdle? Is it the mindset? Is it managing the feedback loop? Is it changing the practices? Is it getting new types of people on board? What has been the biggest hurdle?

Mark Sear:

Tick all of the above.

Kunal Agarwal:

Okay.

Mark Sear:

But I think–

Kunal Agarwal:

Pick for option E.

Mark Sear:

Yeah, so let me give you an example. There are several conversations I’ve had with people that have said to me, “I’ve been doing this 25 years. There’s nothing, I’ve been doing it 25 years.” That presupposes that 25 years of knowledge and experience is better than 10 minutes with a tool that’s got 100,000 years of learning.

Kunal Agarwal:

Right.

Mark Sear:

Over a 12-month period. So I classify that as the ego problem. Sometimes people need their ego brushing, sometimes they need their ego crushing. It’s about looking the person in the eye, working out what’s the best strategy of dealing with them and saying to them, “Look, get on board.” This isn’t about saying you are garbage or anything else. This is about saying to you, learn and keep mentoring other people as you learn.

Kunal Agarwal:

Yeah.

Mark Sear:

I remember another person said to me, “Oh my god, I’ve seen what this tool can do. Is AI going to take my job?” And I said to them, no, AI isn’t going to take your job, but if you’re not careful, somebody, a human being that is using AI, will take it, and that doesn’t apply to me. That applies just in general to the world. You cannot be a Luddite, you cannot fight progress. And as we’ve seen with ChatGPT and things like that recently, the power of the mass, having hundreds and thousands and millions of nodes analyzing stuff, is precisely what will bring that. For example, my son who’s 23, smart kid, well, so he tells me. Smart kid, good uni, good university, blah blah blah. He said to me, “Oh Tesla, they make amazing cars.” And I said to him, Tesla isn’t even a car company. Tesla is a data company that happens to build a fairly average electric car.

Kunal Agarwal:

Absolutely.

Mark Sear:

That’s it. It’s all about data. And I keep saying to my data engineers, to be the best version of you at work and even outside work, keep picking up data about everything, about your life, about your girlfriend, the way she feels. About your boyfriend, the way he feels. About your wife, your mother. Everything is data. And that’s the mindset. And the biggest thing for me, the biggest issue has been getting everybody to think and recognize how vital data is in their life, and to be open to change. And we all know that throughout the cycle of humanity, a lack of openness to change is what’s held humanity back. I seek to break that as well.

Kunal Agarwal:

I love that, Mark. Switching gears, we spoke a little bit about developer productivity. We spoke about agility and data operations. Maersk obviously runs, like you were explaining, a lot of their data operations on the cloud. And as we see with a lot of organizations, when they start to get bigger and bigger and bigger in use cases on the cloud, cost becomes a front-and-center, first-class-citizen conversation to have. Shed some light on that for us. What is that maturity inside of Maersk, or how do you think about managing costs and budgets and forecasts on the cloud, and what’s the consequence of not doing that correctly?

Mark Sear:

Well, there are some things that I can’t discuss because they’re obviously internal, but I think, let’s say I speak to a lot of people in a lot of companies, and there seem to be some themes that run everywhere, which is there’s a rush towards certain technologies, and people, they test it out on something tiny and say, “Hey, isn’t this amazing? Look how productive I am.” Then they get into production and somebody else says, “That’s really amazing. You were very productive. But have you seen what comes out the other end? It costs a bazillion dollars an hour to run it.” Then you’ve got this, I think they called it the Steve Jobs reality distortion field, where both sets of people go into this weird thing of, “Well, I’m producing value because I’m generating code and isn’t it amazing?” And the other side is saying, “Yeah, but physically the company’s going to spend all its money on the cloud. We won’t be able to do any other business.”

Kunal Agarwal:

Yeah.

Mark Sear:

So we are now getting to a stage where we have some really nice cost control mechanisms coming in. For me, it’s all in the audit. And crucial to this is do it upfront. Do it in your dev environment. Don’t go into production, get a giant bill and then say, how do I cut that bill? Which is again, where we’ve put Unravel now, right in the front of our development environment. So nothing even goes into production unless we know it’s going to work at the right cost price. Because otherwise, you’ve just invented the world’s best mechanism for closing the stable door after the cost horse has bolted, right?

Kunal Agarwal:

Right.

Mark Sear:

And then that’s always a pain because post-giant-bill examinations are really painful, it’s a bit like medicine. I don’t know if you know, but in China, you only pay a doctor when you are well. As soon as you are sick, you stop paying bills and they have to take care of you. So that to me is how we need to look at cost.

Kunal Agarwal:

I love that. Love that analogy.

Mark Sear:

Do it upfront. Keep people well, don’t ever end up with a cost problem. So that’s again, part of the mindset. Get your data early, deal with it quickly. And that’s the level of maturity we are getting to now. It’s taking time to get there. We’re not the only people, I know it’s everywhere. But I would say to anybody, I was going to say lucky enough to be watching this, but that’s a little bit cocky, isn’t it? Anybody watching this? Whatever you do, get in there early, get your best practice in as early as possible. Go live with fully costed jobs. Don’t go live, work out what the job cost is and then go, how the hell do I cut it?

Kunal Agarwal:

Yeah.

Mark Sear:

Go live with fully costed jobs and work out well, if it costs this much in dev test, what’s it going to cost in prod? Then check it as soon as it goes live and say, yeah, okay, the delta’s right, game on. That’s it.

Kunal Agarwal:

So measure twice, cut once, and then you’re almost shifting left. So you’re leaving it for the data engineers to go and figure this out. So there’s a practice that’s emerging called FinOps, which is really a lot of these different groups of teams getting together to exactly solve this problem of understanding what the cost is, optimizing what the cost is, and then governing what the cost is as well. So who within your team does what? I’m sure the audience would love to hear that a little bit.

Mark Sear:

Pretty much everybody will do everything, every individual data engineer, man, woman, child, whatever it will be, but we’re not using child labor incidentally, that was...

Kunal Agarwal:

Yeah, let’s clarify that one for the audience.

Mark Sear:

That’s a joke. Edit that out. Every person will take it on themselves to do that because ultimately, I have a wider belief that every human being wants to do the right thing, given everything else being equal, they want to do the right thing. So I will say to the people that I speak to as data engineers, as new data engineers, I will say to them, we will empower you to create the best systems in the world. Only you can empower yourself to make them the most efficient systems in the world.

Kunal Agarwal:

Interesting.

Mark Sear:

And by giving it to them and saying, “This is a matter of personal pride, guys,” at the end of the day, am I going to look at every line of your code and say, “You wouldn’t have got away with that in my day”? Of course not. When I started in IT, this is how depressingly sad it is, we had 16K of main memory on the main computer for a bank in an IBM mainframe, and you had to write out a form if you wanted 1K of disk. So I was in a similar program in those days. Now I’ve got a phone with God knows how much RAM on it.

Kunal Agarwal:

Right, and anybody can spin up a cloud environment.

Mark Sear:

Absolutely. I can push a button, spin up whatever I want.

Kunal Agarwal:

Right.

Mark Sear:

But I think the way to deal with this problem is to, again, push it left. Don’t have somebody charging in from finance waving a giant bill saying, “Guys, you are costing a fortune.” Say to people, let’s just keep that finance dude or lady out of the picture. Take it on yourselves. Show a bit of pride, develop this esprit de corps, and let’s do it together.

Kunal Agarwal:

Love it. Mark, last question. This is a fun one and I know you’re definitely going to have some fun answer over here. So what are your predictions for this data industry for this year and beyond? What are we going to see?

Mark Sear:

Wow, what do I think? Basically–

Kunal Agarwal:

Since you’ve got such a pulse on the overall industry and market.

Mark Sear:

So to me, the data industry, obviously it’ll continue to grow. I believe that technology, at many levels, is actually a fashion industry. If the fashion is to outsource, everybody outsources. If the fashion is to insource, everybody does. Women’s skirts go up, fashion changes, they come down. Guys wear flared trousers, guys wear narrow trousers, and nobody wants to be out of fashion. What I think’s going to happen is data is going to continue to scale, quantum computing will take off within a few years. What’s going to happen is your CEO is going to say, “Why have I got my data in the cloud and in really expensive data centers when someone has just said that I can put the whole of our organization on this and keep it in the top drawer of my desk?”

And you will have petabyte, zettabyte scale in something that can fit in a shoebox. And at that point it’ll change everything. I will probably either be dead, or at least hopefully retired and doing something else by then. But I think for those people that are new to this industry, this is an industry that’s going to go on forever. I personally hope I get to have an implant in my head at some point from Elon. I’m only going to go for version two, I’m not going for version one, and hopefully–

Kunal Agarwal:

Yeah, you never want to go for V1.

Mark Sear:

Exactly, absolutely right. But, guys, ladies, everybody watching this, you are in the most exciting part, not just of technology, but of humanity itself. I really believe that. You can make a difference that very few people on the planet get to make.

Kunal Agarwal:

And on that note, I think the big theme that we have going on this series is that we strongly feel that data teams are running the world and will continue to run the world. Mark, thank you so much for sharing these exciting insights, and it’s always fun having you. Thank you for making the time.

Mark Sear:

Complete pleasure.

Mastercard Reduces MTTR and Improves Query Processing with Unravel Data

Mastercard is one of the world’s top payment processing platforms, with more than 700 million cards in use worldwide. In the US, nearly 40% of American adults hold a Mastercard-branded card. And the company is going from strength to strength; despite a dip in valuation of more than a third when the pandemic hit, the company has doubled in value three times in the last nine years, recently reaching a market capitalization of more than $350 billion.

The importance of cards has soared as a result of the pandemic. Cash use has declined sharply as less purchasing is done in person, and even in-person shopping has shifted toward cards, for reasons of hygiene. For Mastercard, keeping their back-end machinery running well has been vital, both to maintain business results and for consumer and payment network confidence. Mastercard is a strong user of artificial intelligence, in particular for fraud reduction, and all sorts of reporting, business intelligence support, and advanced analytics are necessary to meet business needs and regulatory requirements. 

At DataOps Unleashed, Mastercard’s Chinmay Sagade, Principal Engineer, and Srinivasa Gajula, Big Data Engineer, described a specific use of Unravel Data to increase platform resiliency. Mastercard uses Unravel to reject potentially harmful workloads on Hadoop, which improves job quality over time and keeps the platform available for all users. 

Ad Hoc Query Loads vs. Hadoop, Impala, Spark, and Hive

“They were able to see the platform resiliency and availability improved.” – Chinmay Sagade

Mastercard relies primarily on Hadoop for core big data needs, having first adopted the platform ten years ago. Their largest cluster has hundreds of nodes, and they have petabytes of data. Thousands of users access the platform, with much usage being ad hoc. Impala and Spark are widely used, with some Hive usage in the mix. 

There are several problems that are common in big data setups like the one in use at Mastercard – exacerbated, in this case, by the sheer scale at which Mastercard operates:

  • These big data technologies are not easy for casual users to query correctly
  • Poorly structured queries cause big system impacts
  • Disks fill up, network pipes clog, and daemons are disabled, with unpredictable results
  • Impacts include application failures, system slowdowns, system crashes, and resource bottlenecks

Before the pandemic, Mastercard’s big data stack was already at capacity. Rogue jobs were not only failing, but also affecting other jobs. One Hive query ran for 24 hours, pushing out other users. There was a strong need to improve operational effectiveness, reduce resource utilization, and make room for growth, without additional infrastructure cost.

As Chinmay Sagade describes it, there is a “butterfly effect” – that is, “Big data means that even a small problem can cause a large impact.” He describes the situation as “a recipe for disaster,” as productivity plummets and SLAs are not met. He even cites receding hairlines as an occupational hazard for stressed Hadoop administrators. 

Unravel Data (and Smart Operators) to the Rescue

“We saw an immediate positive impact on the platform.” – Chinmay Sagade

At Mastercard, the problems with query processing became so serious that user and management trust in the platform was in decline. A new solution to these problems was sorely needed. 

Initial use of Unravel Data proved fruitful. For instance, Unravel Data identified that more than a third of data tables stored across technologies were unused. Removing these tables freed up resources. Repeating this scan now takes minutes and yields actionable results, where it previously took days and produced unreliable results.

Unravel Data is now used for several layers of defense against rogue queries:

  • User and application owner self-tuning of their own query workloads
  • Automated monitoring to alert on “toxic” workloads in progress
  • Further monitoring to prevent the most hazardous workloads from running at all

Unravel helps improve resource usage, pre-empt many previous problems, and reduce mean time to repair (MTTR) for the problems that remain.

Users who want to avoid problems can use Unravel Data to tune their own workloads. Their jobs then run faster, with far less chance of disruption, and they avoid automated alerts or even workload shutdowns. 

But users can be in a hurry. They may not know how to check their own workloads, or they may make mistakes despite the availability of checking. The Mastercard data team needed another layer of defense. 

Using Runtime Data to Manage Outliers

“We will ask them to take the necessary actions, like tuning the query and resubmitting it.” – Srinivasa Gajula

Mastercard took additional steps to monitor and act on toxic workloads. They created a Python-based framework which collects application data at runtime. Anomaly detection algorithms scan relevant metrics and flag toxic workloads. All of this connects to Unravel. 

“We use the Impala and Yarn APIs to collect metrics, along with HDFS metadata,” says Srinivasa Gajula. They produce summary reports to note the number and percentage of workloads that fail with out-of-memory errors, syntax errors, and other causes. They detect excessive numbers of small files and calculate both mean time between failures (MTBF) and mean time to repair (MTTR). This information is shared with users and application owners, helping them to make proper use of the platform. 
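Mastercard’s framework itself is proprietary, but a minimal sketch of this kind of runtime collection and reporting might look like the following, assuming a reachable ResourceManager and the standard YARN REST API. The host name and summary fields here are illustrative, not Mastercard’s actual code:

```python
import requests

# Hypothetical ResourceManager URL; the real endpoint is site-specific.
RM = "http://resourcemanager.example.com:8088"

def fetch_finished_apps():
    """Pull recently finished applications from the YARN ResourceManager REST API."""
    resp = requests.get(f"{RM}/ws/v1/cluster/apps",
                        params={"states": "FINISHED,FAILED,KILLED"})
    resp.raise_for_status()
    return (resp.json().get("apps") or {}).get("app", [])

def summarize(apps):
    """Rough failure summary of the kind shared with application owners."""
    failed = [a for a in apps if a.get("finalStatus") == "FAILED"]
    oom = [a for a in failed if "OutOfMemory" in (a.get("diagnostics") or "")]
    runtimes = [(a["finishedTime"] - a["startedTime"]) / 1000.0
                for a in apps if a.get("finishedTime")]
    return {
        "total_apps": len(apps),
        "failed_pct": round(100.0 * len(failed) / max(len(apps), 1), 1),
        "oom_failures": len(oom),
        "mean_runtime_s": round(sum(runtimes) / max(len(runtimes), 1), 1),
    }

print(summarize(fetch_finished_apps()))
```

A production version would also pull Impala query profiles and HDFS metadata, as the team describes, and feed the results into Unravel.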

They also detect different types of joins and identify, as the join proceeds, whether it’s likely to make excessive demands on the system. When a user provides compute stats to Impala, for instance, Impala can identify whether specific tables should be broadcast or shuffled, and how to filter data for optimal performance. Users can also provide hints in the query to, for example, broadcast a small table or shuffle a larger one.

But many users run their queries without providing this helpful information. Impala may then broadcast a large table, for example, causing a performance slowdown or even a crash. 
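To make that concrete, here is a hedged sketch of the stats-and-hints workflow. The connection details and table names are hypothetical, and impyla is just one common Python client for Impala:

```python
from impala.dbapi import connect  # impyla; host/port and table names are illustrative

conn = connect(host="impala-coordinator.example.com", port=21050)
cur = conn.cursor()

# Gather table and column statistics so the planner can size joins correctly.
cur.execute("COMPUTE STATS sales_facts")
cur.execute("COMPUTE STATS store_dim")

# A hint can also steer the planner directly: broadcast the small dimension
# table to every node instead of shuffling both sides of the join.
cur.execute("""
    SELECT s.store_id, COUNT(*) AS txns
    FROM sales_facts s
    JOIN /* +BROADCAST */ store_dim d ON s.store_id = d.store_id
    GROUP BY s.store_id
""")
print(cur.fetchall())
```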

So Mastercard now identifies these issues as they begin to occur. They build a tree from operator dependencies and predict whether a large table, for instance, is likely to be broadcast. If so, the user is asked to tune the query, and submit it again. 
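A heavily simplified stand-in for that analysis might scan the text of the EXPLAIN plan rather than building a full operator tree; the size threshold below is an assumption to tune per cluster:

```python
BROADCAST_LIMIT_BYTES = 512 * 1024 * 1024  # assumed threshold; tune per cluster

def flag_risky_broadcasts(cur, query, table_sizes):
    """Simplified stand-in for operator-tree analysis: scan the EXPLAIN
    plan for BROADCAST exchanges involving tables known to be large.
    table_sizes maps table name -> size in bytes (e.g., from HDFS metadata)."""
    cur.execute("EXPLAIN " + query)
    plan = "\n".join(row[0] for row in cur.fetchall())
    warnings = []
    for table, size in table_sizes.items():
        if "BROADCAST" in plan and table in plan and size > BROADCAST_LIMIT_BYTES:
            warnings.append(f"{table} ({size / 2**20:.0f} MiB) may be broadcast; "
                            "ask the user to tune the query or add a /* +SHUFFLE */ hint")
    return warnings
```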

They can even identify whether a particular query is CPU-bound or I/O-bound, and spot cases where a cross join, for instance, causes the number of rows produced to grow exponentially, in a way that is likely to cause performance issues or even stability problems for the platform. They can alert the user or, in extreme cases, kill the query.

Unravel is now part of the software development life cycle (SDLC) process at Mastercard. Application quality increases up front, and the ability to fix remaining problems in production is greatly improved as well.

Business Impacts of Pre-Checking and Pre-Emption

“Now administrators can spend their time in value-added activities.” – Chinmay Sagade

Mastercard has racked up many benefits by empowering users to check their own queries and pre-empting the remainder that are not “fixed” and are still problematic:

  • Less time spent troubleshooting
  • Greater reliability
  • Resources not over-allocated, so resources are freed up
  • Infrastructure costs reduced through appropriate use of resources

Not all of this has been easy. Users needed plenty of notice and detailed documentation. And they can only be expected to learn so much about how to right-size their own queries. But users have actually supported restrictions on unbalanced jobs, as they see the benefits of better query performance and a more reliable platform for everyone.

This blog post is a good starting point, but it’s worth taking the time to watch the Mastercard presentation yourself. And you can view all the videos from DataOps Unleashed here. You can also download The Unravel Guide to DataOps, which was made available for the first time during the conference.

The post Mastercard Reduces MTTR and Improves Query Processing with Unravel Data appeared first on Unravel.

]]>
https://www.unraveldata.com/resources/mastercard-improves-platform-resiliency-by-detecting-harmful-workloads/feed/ 0
How DBS Bank Leverages Unravel Data https://www.unraveldata.com/resources/how-dbs-bank-leverages-unravel-data/ https://www.unraveldata.com/resources/how-dbs-bank-leverages-unravel-data/#respond Wed, 13 Jan 2021 21:27:44 +0000 https://www.unraveldata.com/?p=5761

The post How DBS Bank Leverages Unravel Data appeared first on Unravel.

]]>

https://www.unraveldata.com/resources/how-dbs-bank-leverages-unravel-data/feed/ 0
Credit Suisse AG Names Unravel Data A Disruptive Tech Winner https://www.unraveldata.com/resources/credit-suisse-ag-names-unravel-data-a-disruptive-tech-winner/ https://www.unraveldata.com/resources/credit-suisse-ag-names-unravel-data-a-disruptive-tech-winner/#respond Tue, 12 Jan 2021 16:18:20 +0000 https://www.unraveldata.com/?p=5745

Palo Alto, CA — January 12, 2021 — Credit Suisse today named five winners of its 2021 Disruptive Technology Recognition (DTR) Program, an annual program that highlights some of the best disruptors of traditional enterprise information technology. […]

The post Credit Suisse AG Names Unravel Data A Disruptive Tech Winner appeared first on Unravel.

]]>

Palo Alto, CA — January 12, 2021 — Credit Suisse today named five winners of its 2021 Disruptive Technology Recognition (DTR) Program, an annual program that highlights some of the best disruptors of traditional enterprise information technology. The program gives participants a chance to collaborate on promoting innovation at the bank and its partner firms.

This is the third year of the program, which allows Credit Suisse to exchange ideas and philosophies with companies leading technology changes that will disrupt the existing framework and shape the future for businesses across the spectrum.

“We saw last year’s winners have a significant impact across industry verticals, and we expect this year’s nominees to follow in these big footsteps. We have high expectations for the new award nominees as we take the DTR program into 2021 and are excited to see how they will raise the bar for the future of enterprise technology,” said Laura Barrowman, Group Information Officer for Credit Suisse.

Credit Suisse partners with technology companies in all stages of business. The bank has found the DTR Program helps foster a closer partnership with companies to actively drive change through advances in enterprise technology.

This year’s DTR Program partners are:

AttackIQ equips cybersecurity teams with a Security Optimization Platform for automating security control validation, improving security program effectiveness, and using insights to make better decisions.

Immuta provides an Automated Data Governance platform that powers compliant BI, analytics and data science for data-driven organizations by automating data access control and privacy protections.

MURAL is a digital workspace for visual collaboration, helping enterprise teams imagine together from anywhere to unlock new ideas, solve hard problems, and innovate faster using its inclusive, simple-to-use platform.

Unravel Data radically simplifies DataOps by providing unified observability and AI-enabled operations across the modern data stack, allowing data teams to run their data pipelines reliably and cost-effectively in all cloud environments.

WalkMe is code-free software that enables organizations to measure, drive, and act to ultimately maximize the impact of their digital transformation and accelerate the return on their software investment.

About Unravel Data
Unravel Data radically transforms the way businesses understand and optimize the performance and cost of their modern data applications – and the complex data pipelines that power those applications. Providing a unified view across the entire data stack, Unravel’s market-leading data observability platform leverages AI, machine learning, and advanced analytics to provide modern data teams with the actionable recommendations they need to turn data into insights. Some of the world’s most recognized brands like Adobe, 84.51˚ (a Kroger company), and Deutsche Bank rely on Unravel Data to unlock data-driven insights and deliver new innovations to market. To learn more, visit https://www.unraveldata.com.

Media Contact
Blair Moreland
ZAG Communications for Unravel Data
unraveldata@zagcommunications.com

The post Credit Suisse AG Names Unravel Data A Disruptive Tech Winner appeared first on Unravel.

]]>
https://www.unraveldata.com/resources/credit-suisse-ag-names-unravel-data-a-disruptive-tech-winner/feed/ 0
Case Study: Meeting SLAs for Data Pipelines on Amazon EMR https://www.unraveldata.com/resources/case-study-meeting-slas-for-data-pipelines-on-amazon-emr/ https://www.unraveldata.com/resources/case-study-meeting-slas-for-data-pipelines-on-amazon-emr/#respond Thu, 30 May 2019 20:44:44 +0000 https://www.unraveldata.com/?p=2988

A household name in global media analytics – let’s call them MTI – is using Unravel to support their data operations (DataOps) on Amazon EMR to establish and protect their internal service level agreements (SLAs) and […]

The post Case Study: Meeting SLAs for Data Pipelines on Amazon EMR appeared first on Unravel.

]]>

A household name in global media analytics – let’s call them MTI – is using Unravel to support their data operations (DataOps) on Amazon EMR to establish and protect their internal service level agreements (SLAs) and get the most out of their Spark applications and pipelines. MTI runs tens of thousands of jobs per week, about 70% of which are Spark, with the remaining 30% running on Hadoop, or more specifically Hive/MapReduce.

Among the most common complaints and concerns about optimizing big data clusters and applications is the amount of time it takes to root-cause issues like application failures or slowdowns, or to figure out what needs to be done to improve performance. Without context, performance and utilization metrics from the underlying data platform and the Spark processing engine can be laborious to collect and correlate, and difficult to interpret.

Unravel employs a frictionless method of collecting relevant data about the full data stack, running applications, cluster resources, datasets, users, business units, and projects. Unravel then aggregates and correlates this data into its data model and applies a variety of analytical techniques to put the data into a useful context.

Unravel architecture for Amazon AWS/EMR

MTI has prioritized their goals for big data based on two main dimensions that are reflected in the Unravel product architecture: Operations and Applications.

Optimizing data operations

For MTI’s cluster level SLAs and operational goals for their big data program, they identified the following requirements:

  • Reduce time needed for troubleshooting and resolving issues.
  • Improve cluster efficiency and performance.
  • Improve visibility into cluster workloads.
  • Provide usage analysis.

Reducing time to identify and resolve issues

One of the most basic requirements for creating meaningful SLAs is to set goals for identifying problems or failures – known as Mean Time to Identification (MTTI) – and the resolution of those problems – known as Mean Time to Resolve (MTTR). MTI executives set a goal of 40% reduction in MTTR.

One of the most basic ways that Unravel helps reduce MTTI/MTTR is through the elimination of the time-consuming steps of data collection and correlation. Unravel collects granular cluster- and application-specific runtime information, as well as infrastructure and resource metrics, using native Hadoop APIs and lightweight sensors that only send data while an application is executing. This alone can save data teams hours – if not days – of data collection by capturing application and system log data, configuration parameters, and other relevant data.

Once that data is collected, the manual process of evaluating and interpreting that data has just begun. You may spend hours charting log data from your Spark application only to find that some small human error, a missed configuration parameter, an incorrectly sized container, or a rogue stage of your Spark application is bringing your cluster to its knees.

Unravel top level operations dashboard

Improving visibility into cluster operations

In order for MTI to establish and maintain their SLAs, they needed to troubleshoot cluster-level issues as well as issues at the application and user levels. For example, MTI wanted to monitor and analyze the top applications by duration, resource usage, I/O, etc. Unravel provides a solution to all of these requirements.

Cluster level reporting

Cluster-level reporting and drill-down to individual nodes, jobs, queues, and more is a basic feature of Unravel.

Unravel cluster infrastructure dashboard

Application and workflow tagging

Unravel provides rich functionality for monitoring applications and users in the cluster, with cluster and application reporting by user, queue, application type, and custom tags such as project or department. These tags are preconfigured so that MTI can instantly filter their view. The ability to add custom tags is unique to Unravel and enables customers to tag applications based on rules specific to their business requirements (e.g., project or business unit), as in the sketch below.

Unravel application tagging by department
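Custom tagging rules of this kind are straightforward to express. The sketch below is purely illustrative – the rules, queue names, and tag values are invented, and Unravel’s own rule engine is configured through its product rather than hand-written code:

```python
import re

# Illustrative rules only; real tags would mirror the business's own structure.
TAG_RULES = [
    {"match": lambda app: app["queue"].startswith("mktg"),
     "tags": {"department": "marketing"}},
    {"match": lambda app: re.search(r"churn|campaign", app["name"], re.I),
     "tags": {"project": "customer-analytics"}},
    {"match": lambda app: app["user"] in {"etl_svc", "batch_svc"},
     "tags": {"department": "data-platform"}},
]

def tag_application(app):
    """Attach every tag whose rule matches this application."""
    tags = {}
    for rule in TAG_RULES:
        if rule["match"](app):
            tags.update(rule["tags"])
    return tags

print(tag_application({"queue": "mktg_adhoc", "name": "campaign_rollup", "user": "jdoe"}))
# -> {'department': 'marketing', 'project': 'customer-analytics'}
```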

Usage analysis and capacity planning

MTI wants to maintain service levels over the long term, and thus requires reporting on cluster resource usage and data on future capacity requirements for their program. Unravel provides this type of intelligence through chargeback/showback reporting.

Unravel chargeback reporting

You can generate chargeback reports in Unravel for multi-tenant cluster usage costs, grouped by application type, user, queue, or tags. The window is divided into three sections:

  • Donut graphs showing the top results for the Group By selection.
  • A chargeback report showing costs, sorted by the Group By choice(s).
  • A list of running YARN applications.

Unravel chargeback reporting
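Conceptually, a chargeback report reduces to grouping per-application usage counters and applying unit costs. The sketch below uses the memorySeconds and vcoreSeconds counters exposed by the YARN REST API; the rates are placeholders, not Unravel’s pricing model:

```python
from collections import defaultdict

# Assumed unit costs; real chargeback rates come from your own cost model.
COST_PER_MB_SECOND = 0.0000002
COST_PER_VCORE_SECOND = 0.00005

def chargeback(apps, group_by="queue"):
    """Aggregate YARN usage counters into a cost per group
    (group_by can be 'user', 'queue', 'applicationType', or a tag)."""
    costs = defaultdict(float)
    for app in apps:
        cost = (app["memorySeconds"] * COST_PER_MB_SECOND
                + app["vcoreSeconds"] * COST_PER_VCORE_SECOND)
        costs[app.get(group_by, "untagged")] += cost
    return dict(sorted(costs.items(), key=lambda kv: -kv[1]))

apps = [
    {"queue": "etl",    "memorySeconds": 9_000_000, "vcoreSeconds": 40_000},
    {"queue": "ad_hoc", "memorySeconds": 2_500_000, "vcoreSeconds": 12_000},
]
print(chargeback(apps))
```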

Improving cluster efficiency and performance

MTI wanted to predict and anticipate application slowdowns and failures before they occur by using Unravel’s proactive alerting and auto-actions, so that they could, for example, find runaway queries and rogue jobs, detect resource contention, and then take action.

Unravel Auto-actions and alerting

Unravel Auto-actions are one of the big points of differentiation from the various monitoring options available to data teams, such as Cloudera Manager, Splunk, Ambari, and Dynatrace. Unravel users can determine what action to take based on policy-based controls that they have defined.

Unravel Auto-actions set up

The simplicity of the Auto-actions screen belies the deep automation and functionality of autonomous remediation of application slowdowns and failures. At the highest level, Unravel Auto-actions can be quickly set up to alert your team via email, PagerDuty, Slack, or text message. Offending jobs can also be killed or moved to a different queue. Unravel can also issue an HTTP POST, which gives users a lot of powerful options.
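To make the policy idea concrete, here is a minimal sketch of a runtime-limit action, assuming a YARN ResourceManager REST endpoint and a Slack incoming webhook. Both URLs and the 4-hour threshold are placeholders, and Unravel’s own auto-actions are configured in its UI rather than hand-coded like this:

```python
import requests

RM = "http://resourcemanager.example.com:8088"                     # assumed host
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXX"   # placeholder
MAX_RUNTIME_MS = 4 * 60 * 60 * 1000                                # example policy: 4 hours

def enforce_runtime_policy(kill=False):
    """Alert on (and optionally kill) applications exceeding the runtime policy."""
    apps = (requests.get(f"{RM}/ws/v1/cluster/apps",
                         params={"states": "RUNNING"})
            .json().get("apps") or {}).get("app", [])
    for app in apps:
        if app["elapsedTime"] > MAX_RUNTIME_MS:
            # First alert the team...
            requests.post(SLACK_WEBHOOK, json={
                "text": f"App {app['id']} ({app['user']}) has run past 4 hours"})
            # ...then, if the policy says so, kill the application via the
            # YARN REST API's state-transition endpoint.
            if kill:
                requests.put(f"{RM}/ws/v1/cluster/apps/{app['id']}/state",
                             json={"state": "KILLED"})
```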

Unravel also provides a number of powerful pre-built Auto-action templates that give users a big head start on crafting the precise automation they want for their environment.

Preconfigured Unravel auto-action templates

Applications

Turning to MTI’s application-level requirements, the company was looking to improve overall visibility into data application runtime performance and to encourage a self-service approach to tuning and optimizing their Spark applications.

Increased visibility into application runtime and trends

MTI data teams, like many, are looking for that elusive “single pane of glass” for troubleshooting slow and failing Spark jobs and applications. They were looking to:

  • Visualize app performance trends, viewing metrics such as application start time, duration, state, I/O, memory usage, etc.
  • View application component (pipeline stage) breakdowns and their associated performance metrics
  • Understand the execution of MapReduce jobs and Spark applications, including the degree of parallelism and resource usage, and obtain insights and recommendations for optimal performance and efficiency

Because typical data pipelines are built on a collection of distributed processing engines (Spark, Hadoop, et al.), getting visibility into the complete data pipeline is a challenge. Each individual processing engine may have monitoring capabilities, but there is a need to have a unified view to monitor and manage all the components together.

Unravel monitoring, tuning and troubleshooting

Intuitive drill-down from Spark application list to an individual data pipeline stage

Unravel was designed with an end-to-end perspective on data pipelines. The basic navigation moves from the top-level list of applications to drill down to jobs, and further down to individual stages of a Spark, Hive, MapReduce, or Impala application.

Unravel Gantt chart view of a Hive query

Unravel provides a number of intuitive navigational and reporting elements in the user interface including a Gantt chart of application components to understand the execution and parallelism of your applications.

Unravel self-service optimization of Spark applications

MTI has placed an emphasis on creating a self-service approach to monitoring, tuning, and managing their data application portfolio. They want development teams to reduce their dependency on IT and, at the same time, improve collaboration with their peers. Their targets in this area include:

  • Reducing troubleshooting and resolution time by providing self-service tuning
  • Improving application efficiency and performance with minimal IT intervention
  • Providing Spark developers with performance insights that relate directly to the lines of code associated with a given step

MTI has chosen Unravel as a foundational element of their self-service application and workflow improvements, especially taking advantage of application recommendations and insights for Spark developers.

Unravel self-service capabilities

Unravel provides plain language insights as well as specific, actionable recommendations to improve performance and efficiency. In addition to these recommendations and insights, users can take action via the auto-tune function, which is available to run from the events panel.

Unravel provides intelligent recommendations and insights as well as auto-tuning.
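As a rough illustration of the kind of right-sizing rule behind such recommendations (the thresholds and headroom factor below are assumptions, not Unravel’s actual algorithm):

```python
def recommend_executor_memory(allocated_mb, peak_used_mb, headroom=1.2):
    """Illustrative right-sizing rule: if peak usage sits far below the
    allocation, suggest a smaller spark.executor.memory with some headroom."""
    suggested = int(peak_used_mb * headroom)
    if suggested < allocated_mb * 0.7:  # only flag meaningful waste
        return (f"spark.executor.memory: {allocated_mb}m -> {suggested}m "
                f"(peak observed {peak_used_mb}m)")
    return "allocation looks reasonable"

print(recommend_executor_memory(allocated_mb=8192, peak_used_mb=3100))
# -> spark.executor.memory: 8192m -> 3720m (peak observed 3100m)
```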

Optimizing Application Resource Efficiency

In large-scale data operations, the resource efficiency of the entire cluster is directly linked to the efficient use of cluster resources at the application level. As data teams can routinely run hundreds or thousands of jobs per day, an overall increase in resource efficiency across all workloads improves the performance, scalability, and operating cost of the cluster.

Unravel provides a rich catalog of insights and recommendations around resource consumption at the application level. To eliminate resource waste, Unravel helps you run your data applications more efficiently with AI-driven insights and recommendations such as:

Unravel Insight: Under-utilization of container resources, CPU or memory

Unravel Insight: Too few partitions with respect to available parallelism

Unravel Insight: Mapper/Reducers requesting too much memory

Unravel Insight: Too many map tasks and/or too many reduce tasks
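For instance, the “too few partitions” insight boils down to comparing partition count against the available parallelism. A toy version of such a rule (thresholds assumed) might be:

```python
def check_partitioning(num_partitions, total_executor_cores, factor=2):
    """Heuristic behind a 'too few partitions' insight: with fewer
    partitions than available cores, some executors simply sit idle."""
    target = total_executor_cores * factor
    if num_partitions < total_executor_cores:
        return (f"Only {num_partitions} partitions for "
                f"{total_executor_cores} cores; repartition to ~{target}")
    return None

print(check_partitioning(num_partitions=48, total_executor_cores=200))
# -> Only 48 partitions for 200 cores; repartition to ~400
```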

Solution Highlights

Work on all of these operational goals is ongoing with MTI and Unravel, but to date, they have made significant progress on both operational and application goals. After running Unravel for over a month on their production computation cluster, MTI was able to capture metrics for all MapReduce and Spark jobs that were executed.

MTI also gained insight into the number and causes of inefficiently running applications. Unravel detected 38,190 events after analyzing 30,378 MapReduce jobs, and 44,176 events for 21,799 Spark jobs. It also detected resource contention that was causing Spark jobs to get stuck in the “Accepted” state rather than running to completion.

During a deep dive into their applications, MTI found multiple inefficient jobs for which Unravel provided recommendations for repartitioning the data. They were also able to identify many jobs that wasted CPU and memory resources.

The post Case Study: Meeting SLAs for Data Pipelines on Amazon EMR appeared first on Unravel.

]]>
https://www.unraveldata.com/resources/case-study-meeting-slas-for-data-pipelines-on-amazon-emr/feed/ 0