I Prefer This Over That

A couple weeks ago I tweeted:

Apparently it resonated. I think it has been retweeted more than anything else original I’ve said on Twitter in my seven years on the platform. (SEVEN years? Holy snack-sized sound bytes! But I digress.)

@jonathandart said, “I would love to read a fleshed out version of that tweet.”

OK, here you go.

First, a little background. Having worked on Cloud Foundry at Pivotal for a couple of years, I’ve been living the DevOps life. My days were filled with zero-downtime deployments, monitoring, configuration as code, and a deep antipathy for snowflakes. We honed our practices around deployment checklists, incident response, and no-blame postmortems.

It is within that context that I came to appreciate these four simple statements.

Recovery over Perfection

Something will go wrong. Software might behave differently with real production data or traffic than you could possibly have imagined. AWS could have an outage. Humans, being fallible, might publish secret credentials in public places. A new security vulnerability may come to light (oh hai, Heartbleed).

If we aim for perfection, we’ll be too afraid to deploy. We’ll delay deploying while we attempt to test all the things (and fail anyway because ‘all the things’ is an infinite set). Lowering the frequency with which we deploy in order to attempt perfection will ironically increase the odds of failure: we’ll have fewer turns of the crank and thus fewer opportunities to learn, so we’ll be even farther from perfect.

Perfect is indeed the enemy of good. Striving for perfection creates brittle systems.

So rather than strive for perfection, I prefer to have a Plan B. What happens if the deployment fails? Make sure we can roll back. What happens if the software exhibits bad behavior? Make sure we can update it quickly.

Predictability over Commitment

Surely you have seen at least one case where estimates were interpreted as a commitment, and a team was then pressured to deliver a fixed scope in fixed time.

Some even think such commitments light a fire under the team. They give everyone something to strive for.

It’s a trap.

Any interesting, innovative, and even slightly complex development effort will encounter unforeseen obstacles. Surprises will crop up that affect our ability to deliver. If those surprises threaten our ability to meet our commitments, we have to make painful tradeoffs: Do we live up to our commitment and sacrifice something else, like quality? Or do we break our commitment? The very notion of commitment pushes us toward the first option. We made a commitment, after all. Broken commitments are a sign of failure.

Commitment thus trumps sustainability. It leads to mounting technical debt. Some number of years later, teams find themselves constantly firefighting and unable to make any progress.

The real problem with commitments is that they suggest that achieving a given goal is more important than positioning ourselves for ongoing success. It is not enough to deliver on this one thing. With each delivery, we need to improve our position to deliver in the future.

So rather than committing in the face of the unknown, I prefer to use historical information and systems that create visibility to predict outcomes. That means having a backlog that represents a single stream of work, and using velocity to enable us to predict when a given story will land. When we’re surprised by the need for additional work, we put that work in the backlog and see the implications. If we don’t like the result, we make an explicit decision to trade off scope and time instead of cutting corners to meet a commitment.
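To make the velocity idea concrete, here is a minimal sketch in Python. It assumes stories are estimated in points, that velocity is simply the average of the last three completed iterations, and that the backlog is a single prioritized list. The function names and all the numbers are hypothetical, for illustration only; this is not how any particular tracking tool computes its forecasts.

```python
import math
from typing import Sequence


def average_velocity(history: Sequence[int], window: int = 3) -> float:
    """Mean points completed over the most recent `window` iterations (assumed model)."""
    recent = history[-window:]
    if not recent:
        raise ValueError("need at least one completed iteration")
    return sum(recent) / len(recent)


def iterations_until_done(backlog: Sequence[int], index: int, velocity: float) -> int:
    """Iterations needed to finish the story at `index`, counting every story ahead of it."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    points_through_story = sum(backlog[: index + 1])
    return math.ceil(points_through_story / velocity)


# Hypothetical data: points finished per past iteration, and backlog estimates in priority order.
history = [8, 11, 10, 9]
backlog = [3, 5, 2, 8, 1, 5]
v = average_velocity(history)  # (11 + 10 + 9) / 3 = 10.0

# When does the fourth story (index 3) land? 3 + 5 + 2 + 8 = 18 points.
print(iterations_until_done(backlog, 3, v))  # 2

# Surprise: a new 5-point story goes in at the top of the backlog.
backlog_with_surprise = [5] + backlog
# The same story is now at index 4: 5 + 3 + 5 + 2 + 8 = 23 points.
print(iterations_until_done(backlog_with_surprise, 4, v))  # 3
```

Notice what the surprise does: nobody cuts corners in the dark. The new work goes into the backlog, the forecast moves from two iterations to three, and that visible shift is what lets us decide explicitly whether to drop scope or accept the later date.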

Aiming for predictability instead of commitment allows us to adapt when we discover that our assumptions were not realistic. There is no failure; there is only learning.

Safety Nets over Change Control

If you want to prevent a given set of changes from breaking your system, you can either put in place practices to tightly control the nature of the changes, or you can make it safer to change things.

Controlling the changes typically means having mechanisms to accept or reject proposed changes: change control boards, review cycles, quality gates.

Such systems may be intended to mitigate risk, but they do so by making change more expensive. The people making changes have to navigate through the labyrinth of these control systems to deliver their work. More expensive change means less change means less risk. Unless the real risk to your business is a slogging pace of innovation in a rapidly changing market.

Thus rather than building up control systems that prevent change, I’d rather find ways to make change safe. One way is to ensure recoverability. Recovery over perfection, after all.

Fast feedback cycles make change safe too. So instead of a review board, I’d rather have CI to tell us when the system is violating expectations. And instead of a laborious code review process, I’d rather have a pair work with me in real time.

If you want to keep the status quo, change control is fine. But if you want to go fast, find ways to make change cheap and safe.

Collaboration over Handoffs

In traditional processes there are typically a variety of points where one group hands off work to another. Developers hand off to other developers, to QA for test, to Release Engineering to deliver, or to Ops to deploy. Such handoffs typically involve checklists and documentation.

But the written word cannot convey the richness of a conversation. Things will be missed. And then there will be a back and forth.

“You didn’t document foo.”
“Yes, we did. See section 3.5.1.”
“I read that. It doesn’t give me the information I need.”

The next thing you know it’s been 3 weeks and the project is stalled.

We imagine handoffs to be an efficient use of everyone’s time, but they’re risky. Too much can go wrong, and when it does, progress stops.

Instead of throwing a set of deliverables at the next team down the line, bring people together. Embed testers in the development team. Have members of the development team rotate through Ops to help with deployment and operation for a period of time. It actually takes less time to work together than it does to create sufficient documentation to achieve a perfect handoff.

True Responsiveness over the Illusion of Control

Ultimately all these statements are about creating responsive systems.

When we design processes that attempt to corral reality into a neat little box, we set ourselves up for failure. Such systems are brittle. We may feel in control, but it’s an illusion. The real world is not constrained by our imagined boundaries. There are surprises just around the corner.

We can’t control the surprises. But we can be ready for them.

Where Have I Been?

Oh, hai internets. It’s been a while. Did you miss me?

Let me tell you what I’ve been up to.

In the fall of 2012 I shut down my consulting practice (Quality Tree Software) and my studio (Agilistry), and took a job with Pivotal. Actually, to be precise, I joined Pivotal Labs; Pivotal did not even exist in 2012.

Pivotal came into existence in April 2013 as a spin-out from EMC and VMware. Labs is part of Pivotal, but we are more of a product company focused on cloud and data than a services company. We work on Cloud Foundry, an open source Platform-as-a-Service (PaaS), and have our own distribution as well as our hosted service. The other side of our business is data. Our Big Data Suite includes GPDB (an MPP database), HAWQ (SQL on Hadoop), and GemFire (an in-memory data grid).

My role at Pivotal has evolved in the time I’ve been there.

For the first couple of years, I was the Director of Quality Engineering on Cloud Foundry. It’s a title I swore I would never take again. But my job was different from what you might imagine. I did not direct the efforts of quality engineers. Rather, I paid attention to our feedback cycles. Teams own their tests, their CI pipelines, and ultimately the quality of their deliverables. I just helped connect the dots. By the way, if you want to know more about quality and testing on Cloud Foundry, I did get out of the building long enough to give a talk on it at the Wikimedia Foundation. I also gave a talk on the Care and Feeding of Feedback Cycles at DOES2014.

In the last few months I moved over to our Data organization in Palo Alto. This changed my commute substantially, so my family and I are moving this summer. That will be an adventure. We’ve been in the same house for 17 years. So wish me luck with that.

Along with the move to our data org, my title changed. We removed the word “quality” from it since what I do does not look anything like traditional quality engineering. So I’m now a director of engineering. But the work I do on a daily basis with our Data teams looks a lot like what I did with Cloud Foundry: I’m deeply involved in hiring, cross-team coordination, improving our release practices, improving builds and CI to make the developer workflow better, and coordinating with our product organization to make sure teams have a steady stream of high value work.

I’m also doing my best to climb the steep learning curve of MPP databases and Hadoop. It helps that I worked at Sybase once upon a time. But that was 20 years ago. Between the fact that I was doing very different work back then, that I’ve forgotten much of what I learned, and that things have changed a bit in two decades, my prior database experience helps only a little in understanding my new context.

I have to say that I love working at Pivotal. I adore the people, am fascinated by the products, and am passionate about the way we work. Coming back to Pivotal was like coming home. (After all, Pivotal Labs is where I learned Agile over a decade ago.)

Some of you have noted that I don’t get out much anymore. I’m not at conferences and I don’t travel much. Since I’m in an inward-facing role, it’s difficult for me to carve out time to get out into the community. I’d like to see my industry friends more often, and I am always honored to be invited to speak. But I turn down the vast majority of speaking invitations. My job takes up all my available time and brain cells.

So that’s what I’m up to and why I’ve been silent here for so long. I do have things to say though. I’ve learned a lot in the last 30 months. And I’m learning more every day. So I hope to carve out time to share what I’m learning here. But no promises about when, exactly, I’ll post.