
What is DataOps and Why It Matters

Jul 9, 2020 | by Polina Reshetova

Let’s start by understanding where DataOps falls in the line-up of current IT methodologies. DataOps is the next level up from ETL (extract, transform, and load) and MDM (master data management) systems in terms of organizing data and processes. It can also be thought of as a methodology that combines DevOps and Agile within the field of data science. Like those two practices, DataOps is focused on changing the way people work and how they approach everyday challenges.

In this article, we’ll explore what DataOps is and how businesses can leverage it for their organization.

What issues does DataOps address?

One of the greatest challenges in enterprise IT is data ownership. In legacy enterprise systems, each department has its own pipelines, analyses, and methodologies for procuring datasets. This kind of ownership lacks transparency and leads to data silos and data quality issues. The problem is compounded when each department interprets datasets and results in its own way. Without a centralized, unified way to share datasets across the organization, dysfunction sets in and collaboration is discouraged.

Consider a real-life scenario. In the retail sector, many stores operate membership or loyalty programs. Retailers can use the membership data to group and track purchases made at different stores, on different days, by a single member, a practice known as “purchase grouping.” However, “member” can be defined in different ways, and the choice of definition can have a profound effect on the interpretation of the data and on subsequent business decisions. For example, is that single member one person? Oftentimes, couples share a membership. In these instances, how does the business market to each individual? And if the same member buys online, where purchases can be made without logging in, how can a marketing organization accurately track purchase patterns?
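To make the ambiguity concrete, here is a minimal sketch in Python. The records, names, and the group_purchases helper are all hypothetical and invented for illustration. It also assumes the shopper's identity is known for every purchase, which a real retailer often cannot know. The point is only that the same purchases group differently depending on how "member" is defined.

```python
from collections import defaultdict

# Hypothetical purchase records: (membership_id, shopper, channel, amount).
# membership_id is None when the shopper did not log in or present a card.
purchases = [
    ("M1", "Ana", "Store A", 40.0),
    ("M1", "Ben", "Store B", 25.0),  # Ben shares Ana's membership
    (None, "Ana", "Online", 15.0),   # online purchase made without logging in
]


def group_purchases(records, key):
    """Group purchase amounts by whatever `key` treats as a 'member'."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[3])
    return dict(groups)


# Definition 1: a member is a membership card.
by_card = group_purchases(purchases, key=lambda r: r[0])

# Definition 2: a member is an individual person.
by_person = group_purchases(purchases, key=lambda r: r[1])

print(by_card)
print(by_person)
```

Under the first definition, Ana's online purchase is orphaned and Ben's spending is attributed to the shared card. Under the second, each person has a separate history. Two departments using different definitions would report different purchase patterns from identical data.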

These decisions about data are important because they inform the retailer’s marketing strategy and plans. Typically, in a pre-DataOps organization, such decisions are made by independent departments that don’t have the tools or workflows for collaboration. The result is a lack of data transparency and unnecessary barriers, which can lead to uninformed strategic business decisions.

DataOps treats data as an organization-wide asset

Confronted with these challenges, DataOps suggests a different approach: one that treats data as a standalone resource, an asset that can be leveraged by the entire organization. Each department can access this resource and share the same tools and data storage environment. Importantly, departments can also share findings, discoveries, and insights in a unified way on a centralized DataOps platform.

DataOps doesn’t just happen; it requires buy-in and commitment from across the organization. These changes can be challenging for some organizations to implement, especially if the business and data environment is mature and complex.

One of the most difficult hurdles, both to recognize and to implement, is the creation of a common data schema. A data schema defines data entities within the enterprise and identifies a method to locate and work with each entity.

Developing a data schema involves a significant collaboration effort, particularly for enterprises. Once developed, the schema continues to evolve as the business develops and changes.
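As a rough illustration of what a common data schema might look like in practice, the sketch below models it as a small registry in Python. Every name here (EntityDefinition, CommonSchema, the field names, the example location string) is an assumption made for illustration and not part of any particular DataOps product. The registry records what each entity means, where to find it, and who owns it, and bumps a version number whenever the schema changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityDefinition:
    """One agreed-upon data entity (all fields are illustrative)."""
    name: str         # shared name, e.g. "member"
    description: str  # the agreed business meaning
    location: str     # where the entity's data can be found
    owner: str        # team accountable for the definition


@dataclass
class CommonSchema:
    """A hypothetical registry of entity definitions shared across departments."""
    version: int = 0
    entities: dict = field(default_factory=dict)

    def register(self, entity: EntityDefinition) -> None:
        # Reject duplicates so two departments cannot silently redefine an entity.
        if entity.name in self.entities:
            raise ValueError(f"entity '{entity.name}' is already defined")
        self.entities[entity.name] = entity
        self.version += 1  # the schema evolves with every accepted change

    def locate(self, name: str) -> str:
        """Return where the data for a named entity lives."""
        return self.entities[name].location


schema = CommonSchema()
schema.register(EntityDefinition(
    name="member",
    description="One loyalty membership, possibly shared by a household",
    location="warehouse.crm.members",  # hypothetical location
    owner="marketing-data",
))
print(schema.locate("member"), schema.version)
```

A real schema would carry far more detail (field types, lineage, access rules), but even this small shape shows the two jobs described above: defining each entity and giving every department the same way to find it.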

Leverage Agile

Agile is a widely recognized software development practice that enables this form of intense collaboration and quick change. As new business challenges emerge, departments can extend the core schema by choosing and maintaining their own pipelines and entity definitions. It’s important to put a process in place that determines which of these entities and pipelines become part of the common data schema, and when. Agile is ideal for this because it provides control over the process and supports short development cycles and quick implementation of ideas.

If DataOps brings together DevOps and Agile, where does DevOps fit into this process?

Add DevOps for ease of collaboration around a single data archive

DevOps is the mechanism for enabling a centralized data archive: one with variable access points that allow individual departments to plug in custom solutions and that supports a high volume of requests. DevOps also brings value through its ability to build and manage a system of solutions in which people with a variety of skill sets can collaborate, including IT engineers, non-technical users, and those in between, such as data analysts and data scientists. This flexibility makes DevOps the practice of choice for building a modern data management system that can handle complex, high-volume, and high-velocity data.

For a business to employ machine learning algorithms and artificial intelligence systems widely and successfully in support of business-critical decisions, implementing a DataOps methodology, or some part of it, is essential. As data grows in volume and diversity, businesses need both to manage this data and to integrate it seamlessly into their daily operations.

The value of a DataOps methodology is clear. Why isn’t it catching on?

DataOps, like its relative DevOps, is grounded in changing mindsets. This is not easy, particularly in a large enterprise where legacy data management practices are embedded. For small organizations, DataOps and an upgrade to the modern data stack can seem like overkill. Why take it on when you can appoint a data owner for each dataset and simply meet regularly to collaborate and share data insights? The thing is, projects can scale fast. Staff may change roles or leave the company, and critical data may be lost or ignored.

Putting in place a plan to embrace DataOps isn’t contingent on the size of the organization. Every business should be thinking about how it will prepare for growth in the volume and complexity of its data, so that it avoids falling victim to data silos and practice debt.
