Work Without Fear Or Heroism

The Seven Steps of Implementing DataOps — Step 7

DataKitchen
May 31, 2017


Many data analytics professionals live in fear. In this field, there are two common ways to be professionally embarrassed (or get fired):

  • Allow poor quality data to reach users
  • Deploy changes that break production systems

Data engineers, scientists and analysts spend an excessive amount of time and energy working to avoid these disastrous scenarios. They work weekends. They do a lot of hoping and praying. They devise creative ways to avoid overcommitting. The problem is that heroic efforts are eventually overcome by circumstances. Without the right controls in place, a problem will slip through and bring the company’s critical analytics to a halt.

The DataOps enterprise puts the right set of tools and processes in place to enable data and new analytics to be deployed with a high level of quality. When an organization implements DataOps, engineers, scientists and analysts can relax because quality is assured. They can work without fear. DataOps accomplishes this by optimizing two key workflows.

The Value Pipeline

Data analytics seeks to extract value from data. We call this the Value Pipeline. The diagram below shows the Value Pipeline progressing horizontally from left to right. Data enters the pipeline and moves into production processing. Production is generally a series of stages: access, transforms, models, visualizations, and reports. When data exits the pipeline, in the form of useful analytics, value is created for the organization. DataOps utilizes toolchain workflow automation to optimize operational efficiency in the Value Pipeline. Data in the Value Pipeline is updated on a continuous basis, but code is kept constant. Step 2 in the seven steps of implementing DataOps — using version control — serves as the foundation for controlling the code deployed.

As mentioned above, the worst possible outcome is for poor-quality data to enter the Value Pipeline and reach users. DataOps prevents this by implementing data tests (step 1). Inspired by the statistical process controls of a manufacturing workflow, data tests ensure that data values lie within an acceptable statistical range. Data tests validate data values at the inputs and outputs of each processing stage in the pipeline. For example, a US phone number should be ten digits. Any other value is incorrect or requires normalization.
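As an illustration, here is a minimal sketch of what such a data test might look like in Python. The record structure, field names, and threshold are hypothetical rather than part of any DataKitchen tooling; the point is simply that the rule runs automatically at a stage boundary and tolerates only a small, defined amount of noise, in the spirit of statistical process control.

```python
import re

# Hypothetical data test: every US phone number entering this stage should be
# exactly ten digits. Values that are not will either fail the test or need
# normalization upstream.
US_PHONE = re.compile(r"^\d{10}$")

def test_phone_numbers(records, max_failure_rate=0.01):
    """Return (passed, failure_rate) for the ten-digit phone-number rule."""
    failures = sum(1 for r in records if not US_PHONE.match(r.get("phone", "")))
    failure_rate = failures / max(len(records), 1)
    # Like a statistical process control, a small amount of noise is tolerated;
    # beyond the threshold the test fails and the team is notified.
    return failure_rate <= max_failure_rate, failure_rate

if __name__ == "__main__":
    sample = [{"phone": "3125551212"}, {"phone": "(312) 555-1212"}]
    print(test_phone_numbers(sample))  # (False, 0.5): the second value needs cleanup
```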

Once data tests are in place, they work 24x7 to guarantee the integrity of the Value Pipeline. Quality is, quite literally, built in. If anomalous data flows through the pipeline, the data tests catch it and take action — in most cases this means firing off an alert to the data analytics team, which can then investigate. The tests can even, in the spirit of auto manufacturing, “stop the line.” Statistical process controls eliminate the need to worry about what might go wrong. With the right data tests in place, the data analytics team can work without fear, and DataOps engineers are free to focus on their other major responsibility — the Innovation Pipeline.
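Below is a sketch of how a pipeline stage might apply such tests at runtime. The stage, test, and alert functions are placeholders for whatever orchestration and notification tooling a team actually uses; the essential behavior is that a failing test either alerts the team or, for critical checks, stops the line.

```python
class DataTestError(Exception):
    """Raised to halt the pipeline ("stop the line") when a critical data test fails."""

def run_stage(stage_fn, data, tests, alert_fn, stop_on_failure=False):
    """Run one pipeline stage, then apply data tests to its output."""
    output = stage_fn(data)
    for test in tests:
        passed, detail = test(output)
        if not passed:
            alert_fn(f"{test.__name__} failed: {detail}")  # notify the analytics team
            if stop_on_failure:
                raise DataTestError(test.__name__)          # stop the line
    return output

def non_empty(records):
    """Trivial example test: the stage must emit at least one record."""
    return len(records) > 0, f"{len(records)} records"

# Example wiring (hypothetical stage): pass records through unchanged and
# alert by printing to the console.
clean = run_stage(lambda d: d, [{"phone": "3125551212"}],
                  tests=[non_empty], alert_fn=print)
```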


The Innovation Pipeline

The Innovation Pipeline seeks to improve analytics by implementing new ideas that yield analytic insights. As the diagram illustrates, a new feature undergoes development before it can be deployed to production systems. The Innovation Pipeline creates a feedback loop: innovation spurs new questions and ideas for enhanced analytics, which require more development, leading to additional insight and innovation. During the development of new features, code changes, but data is kept constant. Keeping data static prevents changes in the data from being falsely attributed to the impact of the new algorithm. A fixed data set can be set up when creating a development environment — step 4 in the seven steps of implementing DataOps.
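One way to keep data constant during development is to snapshot a fixed, reproducible sample of production data when the development environment is created. The sketch below assumes a CSV extract and pandas; the file names, sample size, and seed are illustrative only.

```python
import pandas as pd

def snapshot_dev_dataset(prod_csv="orders.csv",
                         dev_csv="orders_dev_fixture.csv",
                         n_rows=10_000, seed=42):
    """Freeze a reproducible sample of production data for development use."""
    prod = pd.read_csv(prod_csv)
    # A fixed random seed keeps the sample identical every time it is rebuilt,
    # so any change in analytic results can be attributed to code, not data.
    sample = prod.sample(n=min(n_rows, len(prod)), random_state=seed)
    sample.to_csv(dev_csv, index=False)

def load_dev_dataset(dev_csv="orders_dev_fixture.csv"):
    """Every developer loads the same frozen fixture."""
    return pd.read_csv(dev_csv)
```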

DataOps implements continuous deployment of new ideas by automating the workflow for building and deploying new analytics. It reduces the overall cycle time of turning ideas into innovation. While doing this, the development team must avoid introducing new analytics that break production. The DataOps enterprise uses logic tests (step 1) to validate new code before it is deployed. Logic tests ensure that data matches business assumptions. For example, a field that identifies a customer should match an existing entry in a customer dimension table. A mismatch should trigger some type of follow-up.
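A minimal logic test for the customer example might look like the following. The table and column names are hypothetical; the check simply confirms that every customer key in the new data already exists in the customer dimension.

```python
import pandas as pd

def test_customer_keys(facts: pd.DataFrame, customer_dim: pd.DataFrame):
    """Referential-integrity check: every fact row must reference a known customer."""
    unknown = set(facts["customer_id"]) - set(customer_dim["customer_id"])
    return len(unknown) == 0, unknown

if __name__ == "__main__":
    facts = pd.DataFrame({"customer_id": [1, 2, 99]})
    dim = pd.DataFrame({"customer_id": [1, 2, 3]})
    print(test_customer_keys(facts, dim))  # (False, {99}): triggers a follow-up
```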

With logic tests in place, the development pipeline can be automated for continuous deployment, simplifying the release of new enhancements and freeing the data analytics team to focus on the next valuable feature. With DataOps, the dev team can deploy without worrying about breaking production systems — they can work without fear. This is a key characteristic of a fulfilled, productive team.
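In a continuous deployment setup, these tests become an automated gate in front of every release. The sketch below shows one hypothetical arrangement (the test and deploy commands are placeholders): if the logic tests fail, the deployment is blocked.

```python
import subprocess
import sys

def deploy_if_tests_pass(test_cmd=("pytest", "tests/"),
                         deploy_cmd=("./deploy.sh",)):
    """Run the test suite; promote the new analytics only if everything passes."""
    result = subprocess.run(test_cmd)
    if result.returncode != 0:
        print("Logic tests failed; deployment blocked.")
        sys.exit(result.returncode)
    subprocess.run(deploy_cmd, check=True)  # in practice, a CI/CD job would do this

if __name__ == "__main__":
    deploy_if_tests_pass()
```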

The Value-Innovation Pipeline

In real-world data analytics, the Value Pipeline and Innovation Pipeline are not separate. The same team is often responsible for both, the same assets are leveraged, and events in one affect the other. The two workflows are shown combined into the Value-Innovation Pipeline in the figure. The Value-Innovation Pipeline captures the interplay between development and production and between data and code, and DataOps breaks down the barriers between them so that cycle time, quality and creativity can all be improved.

The DataOps enterprise masters both the orchestration of data to production and the deployment of new features, while maintaining impeccable quality. Reduced cycle time enables DataOps engineers to impact the organization in highly visible ways. Improved quality enables the team to move forward with confidence. DataOps speeds the extraction of value from data and improves the velocity of new development while ensuring the quality of data and code in production systems. With the confidence in the Value-Innovation Pipeline that stems from DataOps, the data analytics team avoids the anxiety and over-caution that characterize a non-DataOps enterprise. Work Without Fear!

This concludes our series on the seven steps to implement DataOps. Our discussion of DataOps began with an explanation of how DataOps incorporates Agile Software Development, DevOps, and manufacturing-based statistical process control into analytics and data management. If you joined the discussion midway, you may be surprised to learn that an analytics team can migrate to DataOps in seven simple steps. Click here to return to step one.

For more information on the Seven Steps, please download our white paper “Seven Steps to Implement DataOps.”
