Continuous Delivery Is A Risk Management Strategy

CI/CD, which stands for Continuous Integration and Continuous Deployment (or Delivery), became the new “agile” of the 2010s. Well defined by its pioneers, the term loosened in meaning as more and more people adjusted it to their liking.

I often see it touted as a way to deliver software faster or as an automation exercise. I don’t think it helps with either, but it’s still worth doing.

Deployment versus Delivery

Although the two terms are often used interchangeably, I see a drastic difference between deployment and delivery.

A deployment is the act of putting an artifact (usually a new version of the software) into production, presumably to make it available to customers.

Delivery is an act of satisfying customer needs, in this context presumably through software.

When done well, those two aspects are independent of each other. While a delivery may require some previous deployments, the two do not have to happen at the same time. You can deploy code that contains functionality not yet available to customers, and that functionality is hence not delivered.
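To make the distinction concrete, here is a minimal sketch of a feature flag in Python. The FeatureFlags class, the checkout_page function, and the new_checkout flag name are all made up for illustration; a real system would more likely read flags from configuration or a dedicated service.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store, used here only for illustration."""
    enabled: set[str] = field(default_factory=set)

    def is_enabled(self, name: str) -> bool:
        return name in self.enabled


def checkout_page(flags: FeatureFlags) -> str:
    # Both code paths are deployed; the flag decides which one customers see.
    if flags.is_enabled("new_checkout"):
        return "new checkout"
    return "old checkout"


flags = FeatureFlags()
deployed_only = checkout_page(flags)  # new code is in production, but not delivered
flags.enabled.add("new_checkout")     # delivery happens here, without a new deployment
delivered = checkout_page(flags)
```

The deployment put both branches into production; flipping the flag is what delivers the new behavior to customers.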

The Iterations

Continuous integration was originally defined as the practice of merging code into the mainline branch often. For the purposes of modern development, I’d add on top of that:

Continuous Integration is a system for the fastest possible code convergence. As part of it, after every push into the central repository, all feasible checks are applied as if the code were meant to be a production version.

There is a single branch in the version control system that is a target for all pull requests by default. That branch can be deployed to production at any time.

Some teams interpret this as “master is automatically deployed to production”, but I am actually against that. While the goal of the CI/CD culture is to be able to deploy at any time with all safeguards in place, every deployment is a risk, and I insist that at failover time the deployer must be available for immediate corrective action should anything go wrong. With nontrivial production environments and build pipelines, the time from push to production often crosses the threshold of “push and go home”, which can lead to a nasty aftermath.

My definition of “continuous delivery” is:

There is a conscious effort to deliver the smallest increments of functionality possible to the user, in the shortest time frame possible.

The core motivation for this is to verify that we are delivering the right thing. While it is often coupled with split testing, gradual rollouts, or drip campaigns, those are just means to an end.

The Risk Management

As said, I disagree with the argument that continuous delivery and continuous deployment (further referred to as D&D) make you faster at delivering software. Preparing iterations takes time—a lot of time. It is not just the time and effort spent deploying; thinking through usability of every iteration is not trivial. Testing every iteration is not trivial. If you do split testing of multiple features, the resulting range of all possible combinations can be overwhelming.
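The combinatorial growth is easy to underestimate. As a rough sketch, assuming every feature is split-tested independently with a simple on/off variant (the feature names below are hypothetical), the number of distinct product states a customer can see doubles with each feature:

```python
from itertools import product


def variant_combinations(features: list[str]) -> list[dict[str, bool]]:
    """Return every on/off combination of independently split-tested features."""
    return [
        dict(zip(features, states))
        for states in product([False, True], repeat=len(features))
    ]


features = ["new_checkout", "dark_mode", "one_click_signup"]  # hypothetical names
combos = variant_combinations(features)
print(len(combos))  # 2 ** 3 = 8 combinations for three features
```

Under the same assumption, ten concurrent experiments already mean 1024 combinations, which is why testing each of them individually quickly stops being practical.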

Yet it’s worth it. The main benefit of continuous delivery is superb risk management. It is excellent at managing both technical and product risk.

From the perspective of technical risk, I strongly believe it’s not possible to test the software in its entirety before deployment. Most of the problems I saw were not caused by the code: they were caused either by configuration or by production data. The problem only gets worse when the software is composed of multiple interdependent services.

Deploying in small increments makes failures significantly easier to identify, which in turn allows a transition to a roll-forward strategy instead of a rollback one. In addition, because it leads to a high volume of deployment traffic, it introduces a self-correcting culture around the reliability and thoroughness of the test suite¹ as well as the maintenance of reasonable build times.

From the product risk perspective, small increments help with product verification and with avoiding over-investment. Actually putting a feature into customers’ hands and measuring the results means knowing what they want as opposed to what they say they want.

This also allows short iterations that alternate between development and customer verification. During the verification phase, the team can move on to something else, and you may discover that, contrary to initial expectations, what you delivered is enough. Gold plating can be left for another time.

When Not To Use

For me, D&D was totally worth the initial investment in 99% of use cases. Yet as always, these are the right tools only for the right use cases. I’d be cautious if I were:

Shipping software to customer premises to be run by themselves
Shipping in conditions where sufficient monitoring is not possible, limiting the visibility into production and the feedback from it
Completely, absolutely, 100% sure about what I am developing
Not caring much about the result. I don’t mean it in a derogatory fashion: this can be the fate of “checklist features” developed only to pass a random requirement in a random contract (or analyst quadrant), where nobody actually expects the feature to be used
Doing certain types of infrastructure changes where, for example, the cost of a parallel run for database/cloud/software migrations exceeds the risk of downtime
Unable to establish the right culture, like being forced to use a subcontracted body shop leasing 50 random typists who don’t give that much of a damn if they send your company underwater

That said, quite a few members of my team told me that when they were interviewing, they did not believe that D&D could actually work. It usually took them 1-3 months to adjust, and I don’t know of anyone who has wanted to go back since. On the contrary: we attributed a lot of problems to deviating from the D&D prescription, even when we thought we had a good reason for it.

Thanks to Daria Grudzien and Vincenzo Chianese for feedback and comments.

¹ At least with the right motivations. For my engineering team, the default guideline was that “if someone else deployed a change that broke your product, it's the fault of your tests”.

Published June 8, 2020 in Essays and tagged CTO • product management • project management • sdlc • startups