Data is critical to business success in today's world, and a solid data governance foundation is key to capitalizing on growth opportunities. But one of the biggest challenges data professionals face is fully understanding their organizations' complex data ecosystems.
Most companies want to apply advanced analytics and machine learning to generate insights from data at scale. Yet they also struggle to modernize their data ecosystems. Too often, data is scattered across legacy systems, or buried in tech stacks cobbled together through years of acquisitions.
A recent Forrester study commissioned by Capital One confirmed these challenges in seeing, understanding and using data. In a survey of data management decision makers, nearly 80% cited the lack of a data catalog as their top challenge, and nearly 75% saw a lack of data observability as a problem.
In data management, out of sight is out of mind
Data that is out of sight does not generate value for your organization. That is why it is so important to take data out of the darkness and make it more visible and usable. For example, cataloging data plays a vital role in understanding data, its use and ownership. When data professionals take more holistic approaches to cataloging, observability, and management, they can better unlock the value of the data to improve business outcomes.
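To make cataloging concrete, here is a minimal sketch of a data catalog in Python. All names here (`CatalogEntry`, `DataCatalog`, the sample dataset) are hypothetical, not any vendor's API; the point is simply that each dataset carries its ownership and usage metadata and becomes findable instead of sitting in the dark.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset's metadata: what it is, who owns it, how it's tagged."""
    name: str
    owner: str
    description: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """A tiny in-memory catalog keyed by dataset name."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find_by_tag(self, tag: str) -> list:
        # Visibility in action: anyone can discover datasets by tag.
        return [e for e in self._entries.values() if tag in e.tags]


catalog = DataCatalog()
catalog.register(CatalogEntry("payments_daily", "finance-team",
                              "Daily settled payments", {"finance", "pii"}))
print([e.name for e in catalog.find_by_tag("pii")])  # ['payments_daily']
```

Even this toy version shows the payoff the survey respondents were missing: with ownership recorded at registration time, "who do I ask about this dataset?" stops being a research project.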
Hundreds of companies offer different options in terms of data catalog, data quality, ETL, data loading and classification. We don’t need more disruption here. We need simplification. The pain point is the complexity that data analysts and engineers face when performing specific tasks, such as publishing, finding, or trusting a dataset. Right now, that might mean going through multiple tools owned by different teams with their own required approvals.
We need a simplified experience layer so that users only have to answer a few questions and then the data is published without any backend integration. If that experience can be seamless and compliant with policies, working with data won’t be a burden. There will be all sorts of great experiences, including faster time to market and less duplication of effort across the organization.
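As an illustration of such an experience layer, the sketch below (hypothetical names, assuming a questionnaire-style publish flow) exposes a single entry point: the publisher answers a few questions, validation and access policy are applied automatically, and any backend integrations would hang off this one call rather than being the publisher's problem.

```python
# The "few questions" a publisher must answer; everything else is automated.
REQUIRED_QUESTIONS = ("name", "owner", "contains_pii", "retention_days")


def publish_dataset(answers: dict) -> dict:
    """Single entry point for publishing: validate the questionnaire,
    then apply policy. Catalog registration, ACLs and retention setup
    would be triggered from here, invisible to the publisher."""
    missing = [q for q in REQUIRED_QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    record = dict(answers)
    # Policy is applied by the layer, not chosen by the publisher,
    # so the result is compliant by construction.
    record["access"] = "restricted" if answers["contains_pii"] else "internal"
    return record


rec = publish_dataset({"name": "orders", "owner": "sales",
                       "contains_pii": False, "retention_days": 90})
print(rec["access"])  # internal
```

The design choice worth noting: compliance lives in the layer, not in the user's head, which is what makes the experience both seamless and policy-conformant.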
Achieving this future state requires discipline, targeted investment and buy-in from above. Yet companies have a range of tools and approaches at their disposal to achieve a well-managed data ecosystem that has real business impact and scales as data sources and products grow.
For most data leaders, the first step is to migrate to the cloud. Gartner predicts end-user spending on the cloud will reach $600 billion next year, up from nearly $411 billion in 2021. Businesses know they can do much more with their data in the cloud, and it can relieve the pressure on centralized teams that manage the most critical components of on-premises data. Moving to the cloud can help overcome data bottlenecks, but the cloud also vastly increases the variety of data coming in, from many more sources, increasing the need to analyze it quickly. Now you're back in a bottleneck situation and risk increasing tensions between central IT and business teams.
A model that I support is federating data management out to the business units, with a central tool to manage costs and risks. You can let business teams work at their own pace, while the central shared-services team ensures that the platform is well managed and highly observable.
It’s important to consider the different ways business teams produce and use data. You need to build flexibility into the tools. If you don’t, you risk these teams finding another channel to do the work. When that happens, you lose visibility and can’t guarantee that all company teams are following governance policies. A federated data approach with centralized tooling and policies avoids overly centralized control, without decentralizing everything to the point where you run the risk of cost overruns and data security incidents.
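One way to picture federated ownership with centralized policy is a shared policy gate: business units produce datasets however and whenever they like, but every dataset passes through the same centrally owned checks. A minimal Python sketch, with the policy rules invented for illustration:

```python
def check_owner(ds: dict):
    """Central rule: every dataset must have an accountable owner."""
    if not ds.get("owner"):
        return "every dataset needs an owner"


def check_classification(ds: dict):
    """Central rule: data must carry a known sensitivity classification."""
    if ds.get("classification") not in {"public", "internal", "restricted"}:
        return "unknown or missing data classification"


# Maintained by the central shared-services team; business units
# never fork these, they just run through them.
CENTRAL_POLICIES = [check_owner, check_classification]


def validate(dataset: dict) -> list:
    """Return all policy violations; an empty list means compliant."""
    return [msg for policy in CENTRAL_POLICIES if (msg := policy(dataset))]


print(validate({"owner": "marketing", "classification": "internal"}))  # []
```

Because the checks are data, not scattered tribal knowledge, the central team can evolve policy in one place while each business unit keeps its own publishing cadence.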
Pooling the data also provides data producers, consumers, risk managers and underlying platform teams with a single source of truth. That’s where that simplification layer comes in again: having one place where data analysts and scientists know they can find what they need. Everyone has the same UI layer, tooling, and policies, so they know they’re publishing according to guidelines.
Finally, make sure your data analysts and scientists have a clear “production” path from the sandbox environment in which they did their work. If something important comes out of their analytics, you need to give them an easy way to wrap that work in the right data governance policies as it goes into production. Otherwise, you could end up with shadowy, unmanaged pseudo-production datasets running in unstable environments.
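A production path can be as simple as a promotion gate that refuses sandbox work until its governance metadata is complete. The sketch below is a hedged illustration (field names and the `promote_to_production` helper are invented, not a real platform API):

```python
# Governance metadata a sandbox is allowed to skip, production is not.
REQUIRED_GOVERNANCE_FIELDS = ("owner", "classification", "retention_days")


def promote_to_production(dataset: dict) -> dict:
    """Promotion gate: sandbox work enters production only once its
    governance metadata is filled in, so shadow datasets can't slip
    into unstable pseudo-production environments."""
    missing = [f for f in REQUIRED_GOVERNANCE_FIELDS if f not in dataset]
    if missing:
        raise ValueError(f"cannot promote; missing governance fields: {missing}")
    return {**dataset, "environment": "production"}


sandbox = {"name": "churn_scores", "owner": "analytics",
           "classification": "internal", "retention_days": 365}
prod = promote_to_production(sandbox)
print(prod["environment"])  # production
```

The key property is that the easy path and the compliant path are the same path: analysts get a one-call promotion, and governance gets its metadata as the toll.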
Data is power, but it comes with great responsibility. Building data trust through greater visibility, consistency, and platform simplification is a necessary foundation for creating the modern data ecosystem.
Salim Syed is vice president and chief engineering officer at Capital One Software.