Agile Adjustments: a WordCount Story

I originally wrote this for the AYE website in 2007. It’s no longer published there, so I’m posting it here. Though I itched to tweak some words and add a better conclusion, I resisted the temptation to edit it other than formatting it for this blog. It’s as I wrote it in 2007. (Despite being four years old, the post is still relevant, I think…perhaps even more so today, with Agile having crossed the chasm.)

We were in the middle of my Agile Testing class, and the simulation had run for two rounds so far. Some of the participants created “software” on index cards. Others tested it. Still others deployed it. The participants were wholly engaged in their work for the fictitious “WordCount, Inc.” As the facilitator, I was running the simulation in 15-minute rounds followed by 15-minute reflect-and-adjust mini-retrospectives.

After the second round, during the mini-retrospective, I asked, “What do you see happening?”

“The deployment team looked like they were twiddling their thumbs for most of the round,” one participant observed.

Another participant added, “I think that’s because most of the cards are still on the QA table. QA is a bottleneck.”

“No, the problem is that development didn’t deliver anything until the very last minute,” objected one of the QA team members.

“Well, that’s because it took us most of the last round to coordinate with the deployment team,” one of the Developers countered.

“Your cards were all mixed up when you delivered them. We sent them back so you could sort them out. That’s hardly a ‘coordination’ problem,” scowled a Deployment team member.

Mixed-up source code, software stuck in QA, late deliverables. Sounded like a real-world project to me.

I shifted the conversation: “What would you like to change to improve the outcome in the next iteration?”

The answers varied: “Hold more project meetings to coordinate efforts!” “Appoint a project manager to keep everything on track!” “More people in QA!” “Define a source code control process!” The suggestions may all have been different, but there was a general trend: the participants wanted to add control points, process steps, and personnel in an attempt to reduce the chaos.

For the next round, the team adopted new practices: adding a new role of project manager; adding more meetings; and adding a strict change control process. During the next round I observed the team spend half their available time standing in a big group discussing how to proceed. It seemed to me that in their attempt to control the chaos, they created a process in which it was almost impossible to get anything done. Once again, they weren’t able to deploy an updated version. And at the end of the round, the project manager quit the role in disgust and went back to “coding” on cards.

The team meant well when they added the project manager role and the extra meetings, but their strategy backfired.

Most groups that go through the WordCount, Inc. simulation encounter problems similar to the ones that this team encountered. Some react by attempting to introduce the same kinds of controls as this group, with similar results. But some respond differently.

One group responded to the mixed-up-source-code problem by creating a centralized code repository that was visible and shared by all. Instead of creating a change control process to manage the multiple copies of the source code floating around, they posted one copy to be shared by all in a central location: the paper equivalent of source control.

Another group responded to coordination and bottleneck problems by co-locating teams. Instead of holding meetings, they coordinated efforts by working together.

Yet another group established an “automated” regression test suite that the deployment team always ran prior to each deployment. They then posted the test results on a Big Visible Chart so everyone knew the current state of the deployed system.

These steps all had the effect of making the team more Agile by increasing visibility, increasing feedback, improving collaboration, and increasing communication. And the end result for each group was success.

When reflecting-and-adjusting, it’s easy to reach for command-and-control solutions, to add quality gates and checkpoints and formal processes. But the irony is that such process changes often increase the level of chaos rather than reducing it. They introduce delays and bloat the process without solving the core problem.

It happens in the real world too.

One organization struggling with buggy code decided to create a role of Code Czar. Before any code could be checked into the source control system, it had to go through the Code Czar who would walk through the proposed changes with the programmer. The Code Czar role required someone very senior. Someone with tremendous experience with the large, complex code base under development. Someone who was also very, very busy. The result: code checkins were delayed whenever the Code Czar was unavailable. Worse, despite having more experience than anyone else on the team, the Code Czar couldn’t always tell what effect a given set of changes might have. The delays in checkins weren’t worth it; they did not result in an overall improvement in code quality.

By contrast, many teams find that automated unit tests work far better as a code quality feedback mechanism than a designated human code reviewer. Instead of waiting for a very busy person to become available, programmers can find out for themselves in minutes if their latest changes will have undesired side effects.

Even Agile teams that regularly reflect-and-adapt in iteration retrospectives are not immune to the temptation to revert to command-and-control practices. For example, Agile teams struggling to test everything during an iteration sometimes create a formal testing phase outside the iteration. I even heard of one organization, struggling to complete all the tasks in an iteration, that attempted to solve the problem by having their Scrum Master do a Work Breakdown Structure (WBS) and delegate tasks to specific team members. Not surprisingly, both solutions caused more problems than they solved.

So how can you tell if a given process change will actually be an improvement and make a team more Agile? Before implementing a process change, consider how (or if) the proposed change supports Agile values like visibility, feedback, communication, collaboration, efficiency, and rapid and frequent deliveries. Also ask yourself these questions:

Does the process change rely on humans achieving perfection? To succeed in the role, the Code Czar would have had to have perfect knowledge of all the interdependencies in the code. Similarly, some processes rely on having perfect requirements up front. Successful practices don’t rely on perfect knowledge or perfect work products. Instead, they rely on fast feedback and visibility so the team can detect problems early, correct them while they’re small, and improve iteratively.

Does it result in more time talking than working? Beware any process improvement that involves more meetings. More meetings rarely solve either communication or coordination problems. As the project manager in the simulation discovered, talking about work doesn’t increase the amount of work actually accomplished. As an alternative to meetings, consider collaborative working sessions where team members do the work rather than talking about it.

Does it introduce unnecessary delays or false dependencies? Whenever a process change increases the number of formal hand-offs, it slows things down but may not improve the overall outcome. The Code Czar learned this the hard way.


What Software Has in Common with Schrödinger’s Cat

In 1935, physicist Erwin Schrödinger proposed a thought experiment to explain how quantum mechanics deals only with probabilities rather than objective reality.

He outlined a scenario in which a cat is placed inside a sealed chamber. Inside the chamber is a flask containing a deadly substance. There is a small bit of radioactive material that has a 50% chance of decaying within a specified time period, say an hour.

If the radioactive material decays, a hammer breaks the flask and the cat dies. If it does not decay, the contents of the flask are flushed safely away and the cat lives.

(This would be a barbaric experiment if it were real, but remember that this is only a thought experiment. No actual cats were harmed.)

If we leave the apparatus alone for the full hour, there is an equal probability that the cat lived or died.

Schrödinger explained that in the moment before we look inside the box to discover the outcome, the cat is both alive and dead. There is no objectively measurable resolution to the experiment…yet. The system exists in both states. Once we peek (or by any other means determine the fate of the kitty), the probability wave collapses.

When I first read of Schrödinger’s Cat in my physics class, I was befuddled. A cat is alive, or dead, not both. I did not understand the idea of a probability wave that contained both possible states.

So I can understand completely if you are thinking, “Look, the dang cat is dead. Or not. And besides, this is not related to software AT ALL.”

Ah, but it is.

You see, in the moment we release software, before users* see it, the system exhibits the same properties as Schrödinger’s feline.

There is some probability that we have done well and our users will be delighted. There is another possibility: we may have missed the mark and released something that they hate. (Actually there are an infinite number of possibilities involving various constituents with varying degrees of love and hate.)

Until the actual users start using the software, the probability wave does not collapse. We do not know, cannot tell, the outcome.

For teams that believe they are building awesome stuff, the moment before users get their hands on our work is a magical time full of excitement and wonderment.

For teams that believe they are building a pile of bits not suitable for human usage, it is a time of fear and panic.

But both fear and excitement stem not from observable reality but rather from speculation.

We are speculating that the bugs that we know about and have chosen not to fix are actually as unimportant to our users as they are to us.

We are speculating that the fact we have not found any serious defects is because they don’t exist and not because we simply stopped looking.

We are speculating that we knew what the users actually wanted in the first place.

We are speculating that the tests we decided not to run wouldn’t have found anything interesting.

We are speculating that the tests we did run told us something useful.

None of it is real until it is in the hands of actual users. I don’t mean someone who will poke at it a bit or evaluate it. And I don’t mean a proxy who will tell you if the users might like it. I mean someone who will use it for its intended purpose as part of their normal routine. The experience those users report is reality. Everything else is speculation.

This is what teams forget in that heady moment just before release. We experience all our excitement or terror, confidence or insecurity, as real. We forget that reality is meta-surprising: it surprises us in surprising ways.

And this is why Agile teams ship so often.

It’s not because Agile is about going faster. It’s because structuring our work so that we can ship a smaller set of capabilities sooner means that we can collapse that probability wave more often. We can avoid living in the land of speculation, fooling ourselves into thinking that the release is alive (or dead) based on belief rather than fact.

In short, frequent delivery means we live in reality, not probability.

Facing reality every day is hard. Ignorance is bliss, they say. But living in the land of comforting illusions and declared success is only blissful as long as the illusion lasts. Once the illusion is shattered, the resulting pain escalates with the length of time spent believing in a fantasy and the degree of discrepancy between our beliefs and the actual results. Given sufficient delusion and lengthy schedules, the fall to Earth can be downright excruciating.

I’ll take small doses of harsh reality over comforting illusions and the inevitable ultimate agony any day.

* I use the term “users” here to represent both users (the people who use the software) and customers (the people who decide to buy the software).

If you are buying yourself a game to play, you are both the user and the customer. In sufficiently enterprisey systems, the customer might never even see the software. In that situation the customer and users have very different concerns, so it’s a more complicated probability wave. After all, if the customers love it but the users hate it, was it a success or failure? I’ll leave that discussion as an exercise for the reader.


2nd Annual QA/Test Job Posting Study

This is a guest blog post by Daniel Frank, my assistant. Daniel took on the challenge of updating the QA/Test job study for 2011, just in time for making New Year’s resolutions. Enjoy! Elisabeth

It’s been a little over a year since Elisabeth published “Do Testers Have to Write Code,” the results of an in-depth survey of job ads that she and Melinda conducted to see if employers expect testers to program. The resounding conclusion, with 80% of tester job ads requesting some kind of programming skill, was “Yes.”

This year we wanted to see if things have changed, so I conducted the same study again. I also wanted to add a bit more granularity to the study, to see if there were any trends that were missed last time.

I screened the lists with the same basic guidelines as our previous study. That means I restricted my search to the US only. I only counted a job if it was described as a testing/QA position in the job title. I did not include recruiter listings in order to avoid the risk of including duplicate jobs or even fake jobs used to gather pools of applicants.

Our final sample size this year is 164 jobs. That’s a little less than last year. Why?

The lists were sparse. There just aren’t that many job ads out there. Many of the job ads I found were from recruiters or were repeats, with the same company listing the same position several weeks in a row.

The simple fact that I had a hard time finding the same number of ads as last year is interesting information all on its own. From an overall economic standpoint, the country is in no more of a slump than we were in 2010. So why are there fewer listings for testers? Could it be that Alberto Savoia, who recently declared testing dead, is correct? We’ll come back to that question later.

Back to the study…

Like last year, the majority of our jobs came from Craigslist (90) and LinkedIn (64). The rest of them came from a smattering of other sites.

The data includes an even higher proportion of jobs in California than last year: 102 of the listings were in CA, with the remainder divided in small chunks among 28 other states. Unsurprisingly, Texas, Massachusetts, and Washington are the three runners-up.

Last year there was some question of whether or not the sample was biased simply because we’re located in California. However, I took extra steps to try to get equal representation. The simple fact is that a search that might find 70 jobs when I filter the location for CA will result in 30 jobs or fewer if I filter for another area. If anything, I’d estimate that California is actually underrepresented.

I kept track of the job titles. By far the most popular title is “QA Engineer” (99 of the listings). 136 of the titles contained “QA” compared with only 32 containing the word “Test.”

An interesting side note: when I searched for the word “test” in the body of job ads, I found far more developer positions than similar searches for “qa” did. It would seem that at the same time QA/Test positions are requiring more coding skills, developer positions are requiring more testing skills. That might be another interesting job ad survey project.

So how much coding are testers expected to do?

Of the 164 listings, 102 jobs say they require knowledge of at least one programming language, and 38 jobs indicate coding is a nice-to-have. That’s 140 out of 164, or 85.37% of the sample. That’s an even higher percentage than last year. It’s difficult to say whether the 5% uptick represents a real increase in demand, but at the very least it’s fair to say that demand for testers who code remains high.

I used the same criteria that Elisabeth and Melinda used last year. That means that I counted a job as requiring programming if the job required experience in or knowledge of a specific language, or if the job duties mentioned a language. There were 7 jobs which listed broad experience requirements like “must be able to script in multiple languages,” which also counted as requiring programming.

There were some judgment calls to be made about what may or may not count as a programming language. For the purpose of the results here, I counted SQL or other relational database knowledge as a programming language in order to be consistent with last year. However, unlike last year, I tracked proficiency in relational databases separately. This will let me track specific trends more easily in future studies.

One of the questions Elisabeth wanted to answer last year was whether jobs with self-identified Agile organizations required testers to code more than other jobs. This year 46 of the 58 Agile job ads list programming skills as required or nice to have. That’s 79.31%, which is actually a lot less than last year’s 90%. However, this is one of those places where the small sample size has to be taken into consideration. In 2010, 49 out of 55 Agile jobs mentioned programming. Today, 46 out of 58 jobs mention it. Just a few jobs result in a 10% variation.

An enduring question about any kind of job is how much it pays. I saw even fewer mentions of pay this time around. Only 7 jobs listed it at all, and 5 of those were button-pushing game testing positions in the $10-$20/hour range. The other two ran around $85,000-$105,000. Most positions simply don’t provide up-front salary estimates, so we cannot draw any real conclusions from these data points.

Just for fun, I also noted whenever a job requested a certification. In 164 jobs I found exactly 4 mentions of certification, and not a single one was required. 3 of them were vendor or technology certifications that had nothing to do with testing. And even in the single instance where a testing certification was a nice-to-have, it was the CSTE offered by QAI rather than the much more hyped ISTQB. So it would seem that testing certifications are not much in demand. The bottom line is that someone looking to improve their marketability would be much better served by upskilling to a new proficiency than by picking up an irrelevant certification.

And that’s about it for our study. If you’d like to dig through the raw data to look for any trends I may have missed, I’ll be happy to send it to you. Drop me a line.

Now back to the question about the number of QA/Test jobs out there. Could it be that there are fewer QA/Test positions? Was this just a matter of luck and timing, or is there a trend?

Alberto Savoia gave a talk titled “Test is Dead” at GTAC (dressed as the Grim Reaper). He may have used intentionally inflammatory hyperbole to make his point, but that doesn’t change the fact that he had interesting points to make.

Alberto points out that especially in web development, speed is paramount. Further, the biggest challenge isn’t in building “it” right, but in building the right “it.” So the goal is to get a minimum viable product out as quickly as possible, and get fast feedback from real users and customers. Traditional black box testing ends up taking a back seat in this type of development, and these projects often rely heavily on user feedback instead.

At STARWest 2011, James Whittaker of Google gave a talk titled “All That Testing is Getting in the Way of Quality” where he talked about the closest thing to a traditional testing role they have at Google. It’s called the “Test Engineer,” and they spend anywhere from 20% to 80% of their time writing code. He also explained how Google uses its user base to do almost all of its exploratory testing. As he puts it, “Users are better at being users than testers are, by definition.”

With James and Alberto’s talks firmly in mind, I can’t help but wonder if the difficulty I experienced in finding job ads that met my criteria is indicative of a sea-change in the industry rather than an anomaly. Could it be that we’re seeing a reduction in the number of QA/Test positions?

What do you think? Are you seeing fewer QA/Test positions in your organization or (if you’re looking) in your job search?


From the mailbox: selecting test automation tools

A long time ago, all the way back in 1999, I wrote an article on selecting GUI test automation tools. Someone recently found it and wrote me an email to ask about getting help with evaluating tools. I decided my response might be useful for other people trying to choose tools, so I turned it into a blog post.

By the way, so much has changed since my article on GUI testing tools was published back in 1999 that my approach is a little different these days. There are so many options available now that weren’t available 12 years ago, and new options seem to appear nearly every day.

Back in 1999 I advocated a heavy-weight evaluation process. I helped companies evaluate commercial tools, and at the time it made sense to spend lots of time and money on the evaluation process. The cost of making a mistake in tool selection was too high.

After all, once we chose a tool we would have to pay for it, and that licensing fee became a sunk cost. Further, the cost of switching between tools was exorbitant. Tests were tool-specific and could not move from one tool to another. Thus we’d have to throw away anything we created in Tool A if we later decided to adopt Tool B. And any new tool would cost even more money in licensing fees. So spending a month evaluating tools before making a 6-figure investment made sense.

But now the market has changed. Open source tools are surpassing commercial tools, so the license fee is less of an issue. There are still commercial tools, but I always recommend looking at the open source tools first to see if there’s anything that fits before diving into commercial tool evaluations.

So here’s my quick and dirty guide to test tool selection.

If you want a tool to do functional test automation (as opposed to unit testing), you will probably need both a framework and a driver.

  • The framework is responsible for defining the format of the tests, making the connection between the tests and test automation code, executing the tests, and reporting results.
  • The driver is responsible for manipulating the interface.

So, for example, on my side project entaggle.com, I use Cucumber (framework) with Capybara (driver).
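To make the framework/driver split concrete, here’s a minimal sketch of what that combination looks like. The feature wording, page paths, and field names are hypothetical illustrations (not actual entaggle.com tests), and the step definitions assume Cucumber is configured with Capybara and RSpec expectations:

```gherkin
# tagging.feature -- the framework (Cucumber) reads this natural-language test
Feature: Tagging
  Scenario: Tag a colleague
    Given I am on the profile page for "dale"
    When I tag "dale" with "test automation"
    Then I should see "dale" listed under "test automation"
```

```ruby
# tagging_steps.rb -- test automation code the framework maps to those steps;
# the driver (Capybara) does the actual browser manipulation
Given(/^I am on the profile page for "([^"]+)"$/) do |person|
  visit "/people/#{person}"             # Capybara: navigate to a page
end

When(/^I tag "([^"]+)" with "([^"]+)"$/) do |person, tag|
  fill_in "Tag", with: tag              # Capybara: type into a form field
  click_button "Add tag"                # Capybara: press a button
end

Then(/^I should see "([^"]+)" listed under "([^"]+)"$/) do |person, tag|
  visit "/tags/#{tag}"
  expect(page).to have_content(person)  # Capybara: assert on the rendered page
end
```

The plain-language file is what the whole team reads; the Ruby file is where the programming lives.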

To decide what combination of framework(s) and driver(s) is right for your context…

Step 1. Identify possible frameworks…

Consideration #1: Test Format

The first thing to consider is if you need a framework that supports expressing tests in a natural language (e.g. English), or in code.

This is a question for the whole team, not just the testers or programmers. Everyone on the project must be able to at least read the functional tests. Done well, the tests can become executable requirements. So the functional testing framework needs to support test formats that work for collaboration across the whole team.

Instead of assuming what the various stakeholders want to see, ask them.

In particular, if you are contemplating expressing tests in code, make very sure to ask the business stakeholders how they feel about that. And I don’t mean ask them like, “Hey, you don’t mind the occasional semi-colon, right? It’s no big deal, right? I mean, you’re SMART ENOUGH to read CODE, right?” That kind of questioning backs the business stakeholders into a corner. They might say, “OK,” but it’s only because they’ve been bullied.

I mean mock up some samples and ask like this: “Hey, here’s an example of some tests for our system written in a framework we’re considering using. Can you read this? What do you think it’s testing?” If they are comfortable with the tests, the format is probably going to work. If not, consider other frameworks.

Note that the reason that it’s useful to express expectations in English isn’t to dumb down the tests. This isn’t about making it possible for non-technical people to do all the automation.

Even with frameworks that express tests in natural language, there is still programming involved. Test automation is still inherently about programming.

But by separating the essence of the tests from the test support code, we separate concerns in a way that makes it easier to collaborate on the tests; it also makes the tests more maintainable and reusable.

When I explain all that, people sometimes ask me, “OK, that’s fine, but what’s the EASIEST test automation tool to learn?” Usually they’re thinking that “easy” is synonymous with “record and playback.”

Such easy paths may look inviting, but it’s a trap that leads into a deep, dark swamp from which there may be no escape. None of the tools I’ve talked about do record and playback. Yes, there is a Selenium recorder. I do not recommend using it except as a way to learn.

So natural language tests facilitate collaboration. But I’ve seen organizations write acceptance tests in Java with JUnit using Selenium as the driver and still get a high degree of collaboration. The important thing is the collaboration, not the test format.

In fact, there are advantages to expressing tests in code.

Using the same unit testing framework for the functional tests and the code-facing tests removes one layer of abstraction. That can reduce the complexity of the tests and make it easier for the technical folks to create and update the tests.

But the only times I have seen this work well for the organization are when the business people were all technology savvy, so they were able to read the tests just fine even though they were expressed in Java rather than English.
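For contrast, here’s a hedged sketch of the same sort of acceptance test expressed directly in code, using RSpec with Capybara rather than a natural-language layer; the paths and field names are again hypothetical:

```ruby
# tagging_spec.rb -- hypothetical acceptance test written in code (RSpec + Capybara);
# no Gherkin layer, so the test and its automation live in one place
require "capybara/rspec"

RSpec.describe "Tagging", type: :feature do
  it "lists a tagged person under that tag" do
    visit "/people/dale"
    fill_in "Tag", with: "test automation"
    click_button "Add tag"

    visit "/tags/test-automation"
    expect(page).to have_content("dale")
  end
end
```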

Consideration #2: Programming Language

The next consideration is the production code language.

| If your production code is written in… | …and you want to express expectations in natural language, consider… | …or you want to express expectations in code, consider… |
|---|---|---|
| Java | Robot Framework, JBehave, FitNesse, Concordion | JUnit, TestNG |
| Ruby | Cucumber | Test::Unit, RSpec |
| .NET | SpecFlow | NUnit |


By the way, the tools I’ve mentioned so far are not even remotely close to a comprehensive list. There are lots more tools listed on the AA-FTT spreadsheet. (The AA-FTT is the Agile Alliance Functional Testing Tools group. It’s a program of the Agile Alliance. The spreadsheet came out of work that the AA-FTT community did. If you need help interpreting the spreadsheet, you can ask questions about it on the AA-FTT mail list.)

So, why consider the language that the production code is written in? I advocate choosing a tool that will allow you to write the test automation code in the same language (or at least one of the same languages if there are several) as the production code for a number of reasons:

  1. The programmers will already know the language. This is a huge boon for getting the programmers to collaborate on functional test automation.
  2. It’s probably a real programming language with a real IDE that supports automated refactoring and other kinds of good programming groovy-ness. It’s critical to treat test automation code with the same level of care as production code. Test automation code should be well factored to increase maintainability, remove duplication, and exhibit SOLID principles.
  3. It increases the probability that you’ll be able to bypass the GUI for setting up conditions and data. You may even be able to leverage test helper code from the unit tests. For example, on entaggle.com, I have some data generation code that is shared between the unit tests and the acceptance tests. Such reuse drastically cuts down on the cost of creating and maintaining automated tests.
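To illustrate point 3, here’s a hedged sketch of the kind of shared test-data helper I mean; the names are invented for illustration and are not the actual entaggle.com code:

```ruby
# test_data.rb -- hypothetical data-generation helper shared by unit tests
# and acceptance tests so setup logic isn't duplicated
module TestData
  def self.unique_user(attrs = {})
    @counter = (@counter || 0) + 1
    { name: "user#{@counter}", email: "user#{@counter}@example.com" }.merge(attrs)
  end
end

# In a unit test:
#   user = User.new(TestData.unique_user)
#
# In a Cucumber step, bypassing the GUI to set up state (assumes an
# ActiveRecord-style User model):
#   Given(/^a registered user$/) do
#     @user = User.create!(TestData.unique_user)
#   end
```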

Consideration #3: The Ecosystem

Finally, as you are considering frameworks, consider also the ecosystem in which that framework will live. I personally dismiss any test framework that does not play nicely with both the source control system and the automated build process or continuous integration server. That means at a bare minimum:

  • All assets must be flat files, no binaries. So no assets stored in databases, and no XLS spreadsheets (though comma-separated values or .CSV files can be OK). In short, if you can’t read all the assets in a plain old text editor like Notepad, you’re going to run into problems with versioning.
  • It can execute from a command line and return an exit code of 0 if everything passes or some other number if there’s a failure. (You may need more than this to kick off the tests from the automated build and report results, but the exit-code criterion is absolutely critical; a minimal sketch follows below.)
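As a minimal sketch of that exit-code criterion (the command is a placeholder; any framework that behaves this way will do):

```ruby
# run_tests.rb -- run the suite from the command line and propagate a non-zero
# exit code on failure so the CI server can mark the build red
passed = system("cucumber --format progress")  # placeholder command
exit(passed ? 0 : 1)
```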


Step 2. Choose your driver(s)…

A driver is just a library that knows how to manipulate the interface you’re testing against. You may actually need more than one driver depending on the interfaces in the system you’re testing. You might need one driver to handle web stuff while another driver can manipulate Windows apps.

Note that the awesome thing about the way test tools work these days is that you can use multiple drivers with any given functional testing framework. In fact, you can use multiple drivers all in a single test. Or you can have a test that executes against multiple interfaces. Not a copy of the test, but actually the same test. By separating concerns, separating the framework from the driver, we make it possible for tests to be completely driver agnostic.

Choosing drivers is often a matter of just finding the most popular driver for your particular technical context. It’s hard for me to offer advice on which drivers are good because there are so many more drivers available than I know about. Most of the work I do these days is web-based. So I use Selenium / WebDriver.
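Here’s a rough sketch of how that separation plays out with Capybara: the driver is a configuration detail, and the tests themselves don’t change when you swap it (exact settings vary by version, so treat this as illustrative):

```ruby
# support/driver.rb -- illustrative Capybara configuration; the same Cucumber
# or RSpec tests run unchanged against whichever driver is selected here
require "capybara"

Capybara.default_driver    = :rack_test  # fast, headless, no real browser
Capybara.javascript_driver = :selenium   # real browser for JavaScript-heavy scenarios

# Scenarios tagged @javascript (Cucumber) or flagged js: true (RSpec)
# automatically use the javascript_driver; everything else uses the default.
```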

To find a specific driver for a specific kind of interface, look at the tools spreadsheet or ask on the AA-FTT mail list.

Step 3. Experiment

Don’t worry about choosing The One Right tool. Choose something that fits your basic criteria and see how it works in practice. These days it’s far less costly to try a tool on real work and see how it goes than to run an extensive tool evaluation.

How can this possibly be? First, lots of organizations are figuring out that the licensing costs are no longer an issue. Open source tools rule. Better yet, if you go with a tool that lets you express tests in natural language it’s really not that hard to convert tests from one framework to another. I converted a small set of Robot Framework tests to Cucumber and it took me almost no time to convert the tests themselves. The formats were remarkably similar. The test automation code took a little longer, but there was less of it.

Given that the cost of making a mistake on tool choice is so low, I recommend experimenting freely. Try a tool for a couple weeks on real tests for your real project. If it works well for the team, awesome. If not, try a different one.

But whatever you do, don’t spend a month (or more) in meetings speculating about what tools will work. Just pick something to start with so you can try and see right away. (As you all know, empirical evidence trumps speculation. :-))

Eventually, if you are in a larger organization, you might find that a proliferation of testing frameworks becomes a problem. It may be necessary to reduce the number of technologies that have to be supported and make reporting consistent across teams.

But beware premature standardization. Back in 1999, choosing a single tool gave large organizations an economy of scale. They could negotiate better deals on licenses and run everyone through the same training classes. Such economies of scale are evaporating in the open source world where license deals are irrelevant and training is much more likely to be informal and community-based.

So even in a large organization I advocate experimenting extensively before standardizing.

Also, it’s worth noting that while I can see a need to standardize on a testing framework, I see much less need to standardize on drivers. So be careful about what aspects of the test automation ecosystem you standardize on.

Good luck and happy automating…


Checking Alignment, Redux

I’ve been writing a lot lately. Writing for long stretches leaves me mentally drained, nearly useless. The words dry up. I stop making sense. I find it increasingly difficult to form coherent sentences that concisely convey my meaning. Eventually I can’t even talk intelligibly.

I recall attending a party after a week of solid writing a few years ago.

“How are you?” my host asked when I arrived.

“Unh,” I muttered. “Good.”

“What have you been up to?” she inquired.

“Um. Writing.” I stopped talking and stared back at her expectantly.

I wanted to be social, but no more words would come. I stood there just staring at her. It didn’t even occur to me to ask how she was doing or what she was up to.

My host looked at me sideways, unsure how to respond to my blank stare. It wasn’t a Halloween party, and yet I was doing a passable impression of a zombie. How does one respond to zombified guests?

Anyway, my point is that I’m in one of those states now. And thus I may have great difficulty making myself understood. Producing words that fit together to express ideas is becoming increasingly difficult.

I’m guessing this is why I failed to explain myself well in my last post. Or at least I am inferring from the response to that last post that there is a gap between what I intended to say and what most people understood me to be saying.

I had three points that I wanted to make in my last post:

  1. It’s easy to speculate about the connection between actual needs, intentions, and implementation.
  2. Empirical evidence trumps speculation. Every single time.
  3. Testers are NOT the only people who gather that empirical evidence.

Given that’s what I meant to say, I certainly didn’t expect UTest, a testing services company, to like the post so much that they would tweet:

We couldn’t agree more! It’s all about the testing!

Yes, it is all about the testing. But—and this is a crucial BUT—it is not all about the testers.

In fact, much of the kind of testing that goes into ensuring alignment between intentions/implementation and actual need is something that testers have very little to do with, and it’s something that cannot ever be outsourced to a testing services company.

Let’s look at the sides of the triangle of alignment again:

Actual Need: the value our users and/or customers want.

Intentions: the solution we intend to deliver in order to serve the Actual Need. The product owner, product manager, business analyst, or designer is the one who typically sets the intentions. It’s their job to listen to the cacophony of conflicting requests and demands and suggestions in order to distill a clear product vision. For now let’s just call this person the product owner. They own the product vision and decide what gets built.

Implementation: the solution the team actually delivers.

So who makes sure that the intentions and implementation match the actual needs?

The best person to do this is usually the person who set the intentions in the first place: the product owner. They’re supposed to be steering the project.

If the product owner has no way of verifying that they asked for the right thing and can’t tell whether or not the resulting software delivers the expected value, the project is doomed.

Seriously, I’ve lived through this as a team member and also seen it from the sidelines. The person responsible for setting the intentions needs a way to tell whether the actual needs are being met. They need feedback on the extent to which the intentions they set for the team pointed us in the right direction. Otherwise we end up in a painful cycle of requirements churn that can ultimately end in organizational implosion if we hit the end of the runway before we deliver real value.

Michael Bolton’s story of getting out of the building and picking up sample checks on his lunch hour is fabulous. But to me, it’s not a story about testing. Rather, it’s a great story about how having multiple examples is key to truly understanding requirements.

Further, I’ll suggest that in this story Michael was acting as a Team Member rather than a Tester. The fact that Michael is a world class tester is not the most salient part of the story. The important thing is that he noticed the team needed something and he went out of his way to get it.

It is important not to confuse Michael’s initiative as a team member with an exclusive job responsibility of testers. Michael took the initiative. That’s one of the reasons why he is a world class tester. But picking up that sample check is something that a programmer could have done. Or the product owner. Everyone on a project can contribute to establishing a shared understanding of the full scope of the requirements. And everyone has a hand in gathering empirical evidence, not just testers.

Testers happen to be really good at gathering information. Teams need testers. But teams also need the testing mindset to be baked into the culture. Team members need to ask these key questions before taking action:

  • How will I know my efforts had the effect I intended?
  • How will I know my intentions were correct?
  • How will I know my results are delivering real value?

These questions are at the core of the test-first mindset. And the answer to these questions is never, “I’ll just ask the testers.”


Checking Alignment

Let’s start at the beginning. Somebody, somewhere, needs some software.

Maybe we’re serving an internal “customer” who needs a simple baling-wire-and-duct-tape app to connect system A with completely unrelated (except that they need to be able to share data) system B. Or maybe we’re in a startup that’s trying to Change the World with a grand vision, or perhaps a modest vision, to give people software that makes their lives better.

Either way, we build software because there are people who need it. Let’s call what users need the Actual Need. We want to serve that Actual Need by building a truly kick butt solution.

On Agile teams we use user stories in an attempt to capture actual needs. For example:

As a Banking Customer I want to use my ATM card to withdraw money from an automated banking machine while I’m in Kiev so that I can buy a cup of fabulous local coffee with the local currency, Hryvnia.

This is way better than “The system shall accept a rectangular piece of plastic made to conform with standard …” It humanizes the problem space and puts the user at the forefront.

But user stories aren’t typically written by real users. They’re written by surrogates: Business Analysts or Product Managers or the like. These internal people go out and find the Actual Need and then set Intentions for what we need to build to meet those needs.

And then the Software Development team brings the intentions to life with the Implementation.

That’s software development in a nutshell from gathering requirements through deployment. It’s all about finding the happy place where we are addressing real needs with awesome solutions.

So here’s the big question:

How do we know that our Intentions matched the Actual Need, that the Implementation matched our Intentions, and ultimately that the Implementation matched the Actual Need?

Three Sides of Alignment

If software development projects exhibited mathematical properties, then we could count on alignment being transitive. That is, if A = B, and B = C, then mathematically speaking, A must also equal C.

But that doesn’t work with software.

We can set Intentions that, if implemented, would match the Actual Need. And we can communicate those Intentions effectively so that we end up with an Implementation that does what we intended it to do. But that does not mean that users won’t experience any problems with the delivered solution.

As an aside, this is fundamentally why waterfall does not work. Even if we could build the perfect requirements document that perfectly captured the actual needs of the business or our user base, there is no way to ensure that the resulting implementation will be a success. And by the time we release it’s way too late.

Back to my assertion that alignment is not transitive in software systems. Consider my story of trying to get Hryvnias in Kiev.

So there I was in Kiev with only a few Hryvnia in my pocket. I needed cash. So I took my trusty ATM card to an AutoBank machine. Actually, I walked by any number of AutoBank machines looking for one that had the right symbols on the front so I could be sure my card would work, and that was built into the wall of a bank so that it wasn’t a fake ATM designed to skim info. Yes, I can be paranoid sometimes. So anyway, I think I marched Sarah halfway across the city before I found one I would stick my card into.

Having finally found what I deemed to be a trustworthy ATM, I put in my card and entered my pin. And then I got a scary looking message: “You entered your PIN number incorrectly 3 times.” And the machine ate my card.

Here we see an example of how alignment is not transitive. The Implementation of the ATM software no doubt matched the Intentions of those who built it. ATMs are a mature and robust technology, after all. And the Intentions addressed the Actual Need. Again, ATMs are a known quantity at this point. But the Implementation spat in the face of my Actual Need. I was thwarted. Not only was my need for cash not met, but now my card was gone. And I had not entered my PIN incorrectly 3 times; I had entered it just once. (My guess is that the real issue had to do with whether or not the ATM supported my card, not with the actual PIN.)

But I digress. Back to the point.

We have Actual Need, Intentions, and Implementation. How do we know that all three of these things are in alignment?

The product owner can speculate that the Intentions accurately describe the Actual Need. If the product owner is hands-off, we often see development teams unilaterally asserting that the Implementation matches the Intentions. Worse, some product owners remain hands-off because they want plausible deniability when things go wrong. That’s just…ew. Throughout all this, the business stakeholders can assume that by doing what we set out to do, the Implementation will meet the Actual Need and we’ll all be rich.

And we will be fooling ourselves. Such guesses and speculation allow us to become wrapped up in the illusion of progress.

If we want to know whether our Intentions are in alignment with the Actual Need, Steve Blank would say that we have to get out of our cubes and talk to potential users or customers.

If we want to be sure our Implementation matches our Intentions, we have to state those Intentions concretely, with examples and explicit expectations. As long as we’re doing that, we might as well go whole hog and do ATDD. It’s the best way I know to drive out ambiguity, clarify assumptions, and provide an automated safety net to alert us any time the Implementation strays from the Intentions. But automated checks aren’t enough. We also have to explore to discover risks and vulnerabilities that would jeopardize the spirit of the Intentions even if not the letter.
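As an illustrative sketch only (the wording and amounts are invented), the Kiev ATM story from above might be stated as a concrete, executable expectation in Cucumber-style Gherkin:

```gherkin
Scenario: Withdraw local currency with a foreign card
  Given my card was issued by a foreign bank on a supported network
  And I have entered my correct PIN exactly once
  When I request 500 Hryvnia
  Then I receive 500 Hryvnia
  And my card is returned to me
  And I am not told that I entered my PIN incorrectly 3 times
```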

Finally, if we want to be sure our Implementation matches the Actual Need, we have to watch customer behavior carefully. That means monitoring usage statistics, watching conversions, and generally listening to what the Lean Startup guys have to say about validated learning.

All of these activities are aspects of testing. And while testers are still important, not everything that involves some aspect of testing should be done by people with QA or Test in their title.

Too often software teams take a narrow view of “testing.” They think (to paraphrase Stuart Taylor) that it’s about checking or breaking. They relegate it to the people with “QA” or “Test” in their title. Typically we only test whether the Implementation meets the Intentions. We speculate about the rest.

And then we’re surprised by failures in production, angry customers, and declining revenue.

The harsh fact is that empirical evidence trumps speculation. Every. Single. Time. And testing, real testing, is all about getting that empirical evidence. It’s not something testers can do alone. There are too many kinds of testing involved in ensuring that all three things are in alignment: Actual Need, Intentions, and Implementation.

And ultimately that’s why testing is a whole team responsibility.


DevDays San Francisco: Plan B

I was surprised by two things this morning before I even managed to finish my first cup of coffee.

The first thing was that Joel Spolsky had a DevDays conference scheduled in San Francisco for October 12 and 13. Clearly I was not paying attention when I scheduled my Agile Testing class on October 11 – 13. (This was on the heels of finding out that the most excellent PNSQC conference in Portland, OR is October 10 – 12.) Whoopsie.

But the second surprising thing was that Joel canceled DevDays.

Reading through the comments on Joel’s blog post announcing the cancellation, it seems that there are some folks with non-refundable travel arrangements.

Now San Francisco is a lovely place to be stuck. And I can understand if folks are thinking 2 days at the Exploratorium is worth the price of plane fare.

But I have another option.

If you signed up for DevDays, I’ll honor the $499 DevDays price for my 3-day Agile Testing class with Dale Emery. That’s nearly 50% off the regular class rate.

Sure, I understand DevDays and my Agile Testing class don’t have the same agenda at all. I get that you had your heart set on learning about creating iOS apps, online education, JS Backbone, compilers, and major scalability. But you’ll learn something that’s equally valuable: how to make your whole development process go faster by avoiding the late-cycle surprises and resulting churn.

And what DevDays and my class do have in common is domain experts speaking from experience with no PowerPoint and no product pitches.

Interested? You can read more about the class here, and you’ll see that we’ve already added a special ticket class for victims of the DevDays cancellation on the registration form.


Testing is a Whole Team Activity

I talk to a lot of people in organizations that use some flavor of Agile. Almost all of them, even the teams that are succeeding wildly with Agile, struggle with testing. It’s easy to say that we test throughout the cycle. It’s harder to do it.

Some teams are really struggling with testing, and it’s affecting their ability to get stories all the way done in a timely manner and with a high enough level of quality. I hear comments like these frequently:

“We’re implementing stories up to the last minute, so we can never finish testing within the sprint.”

“We are trying to automate the acceptance tests, but we’re about to give up. The tests get out of sync with the code too easily. Nine times out of ten, when the build is ‘red’ it’s because the tests are wrong.”

“I’m afraid we’re missing bugs because we never have time to explore the system. But we’re too busy running the regression tests to take time out to explore.”

Using a variation on the 5 Whys, I dig into the issue with these folks. What I’ve found is that there is one common unifying root cause at the heart of all these challenges:

There is an (unfortunate) belief that testers test, programmers code, and the separation of the two disciplines is important.

In some cases, people within the organization hold this belief explicitly. They subscribe to the notion that the only valid testing is that which is done by an independent tester. Just in case you happen to be among that group, let me dispel the programmers-can’t-test myth right now.

Programmers most certainly can test. Anyone who can wrap their heads around closures and patterns and good design is perfectly capable of wrapping their heads around risks and corner cases and test heuristics. For that matter, some of the best programmers I’ve worked with also turned out to be some of the best testers.

Perhaps your objection is a little different: “Sure, programmers can test,” you say. “But they can’t be objective about their own code. They could test someone else’s but not their own.”

Well, yes. Blind spots tend to perpetuate themselves.

However, as both a tester and a programmer I can tell you that at least for me, time pressure is much more of an issue than inherent subjectivity.

When I feel time pressure, I rush. When I rush, I forget stuff. Later when I find bugs in production, it’s in the areas that I forgot about, in the places where I rushed. Just testing someone else’s code won’t address the problem that time pressure leads to rushing.

However, pairing can address both problems: subjectivity and rushing the job. Pairing with someone else while testing (say, having a programmer pair with a tester) can ensure both that we’re testing from multiple perspectives and that we’re not rushing through unduly while failing to notice that the installer just erased the hard drive.

In other cases, however, the people I am talking to already buy into the idea that programmers can test.

“We don’t suffer from the belief that testers and programmers should be kept separate,” they object. “We believe programmers should test! And our programmers do test! But we still struggle with finishing the regression testing during a sprint.”

“If everyone on the team believes in programmers testing, why aren’t the programmers pitching in to run the manual regression tests?” I counter.

“Because they don’t have time…”

“…because they’re too busy writing new code that the testers won’t have time to test?”

“Um, yeah…”

“Right. You’re telling me testers test and programmers code.”

“Oh.”

So, back to our original problem: the team is struggling to complete testing within a sprint.

Throwing more testing bodies at the problem will not solve the issue. It will result in spending time to bring the new testers up to speed and to filter through large swaths of feedback that doesn’t actually help move the project forward.

Throwing a separate team of test automators at the problem might work as a temporary band-aid but it will end up being very inefficient and expensive in the long run. The separate team of test automators won’t be able to change the source code to improve testability so they will spend more time fighting the code than testing it.

The long term sustainable solution is both simple and brutally difficult: recognize that testing and quality are the responsibility of the whole team, not any given individual.

This is so much easier said than done. Sure, we can say “everyone is responsible for testing and quality.” But when it’s the end of the sprint and the product owner is pushing for more features, it takes an enormous amount of strength and courage to say, “We have undone testing tasks stacking up. Coding more features will not help. We need to focus on testing what we already have.”

For that matter, spending programmer time on making automated tests execute faster and more reliably might seem like pure indulgence in the face of project deadlines.

And internal process metrics that measure programmers and testers separately just exacerbate the problem. Any time programmers are measured on lines of code, checkins, or implemented story points, while testers are measured on defect counts and test cases executed, we’re going to have problems getting team members to see testing as a whole team responsibility.

But when we can get the team to see testing as part of developing a working solution, wonderful things can happen.

  • Our inventory of coded-but-not-tested stories dissipates as stories no longer languish in the “To Be Tested” column on the task board. We no longer have to deal with the carrying cost of stories that might or might not work as we intended.
  • Programmers executing manual regression tests are in a better position to see both opportunities to automate and opportunities to pare down duplication.
  • Testers and programmers can collaborate on creating test automation. The result will be significantly better automation than either testers or programmers would have written on their own, created much more efficiently.
  • As the level of regression test automation increases, testers have more time to do the higher value activity of exploratory testing.

Testing is an activity. Testers happen to be really good at it. We need testers on Agile teams. But if we want real agility, we need to see that completing testing as part of the sprint is the responsibility of the whole team, not just the testers.

And that means we have to do away with the barriers—whether beliefs or metrics or external pressure—that reinforce the “testers test, programmers code” divide.

If you or someone you know is struggling with the question of how to integrate testing throughout the lifecycle in Agile, please check out the Agile Testing Class I’m teaching with Dale Emery at Agilistry Studio on October 11 – 13.


Exploratory Testing in an Agile Context Materials

I’m giving a session at Agile2011 in Salt Lake City at 9AM Wednesday on Exploratory Testing in an Agile Context. The session itself will be entirely hands-on: we will explore a hand-held electronic game that I brought, while discussing how ET and Agile fit together hand-in-glove. However, I did produce materials for the session: a PDF that’s almost a booklet. Thought you all might like to see it.


Agile Up 3 Here

We held Agile Up 3 Here at Agilistry Studio last week. Nine people gathered from all around the world for our second week-long intensive. Our team consisted of Alan Cooper, Jim Dibble, Pat Maddox, Alex Bepple, Brendon Murphy, Dale Emery, Matt Barcomb, Dave Liebreich, and me. Once again, we were working on mrhomophone.com.
My insights from the week:

  1. Distilling down to the absolute core of the intent for a given period of time is harder than it sounds. It’s tempting to include little nice-to-haves in the stories. Even when implementing, it’s tempting to do a bit of polishing in unrelated areas.
  2. Perhaps the temptation to expand the scope of the deliverable beyond the bare bones isn’t all bad. It can enable us to kick things up a notch, deliver something that surpasses the merely functional to something that feels indulgent.
  3. Or perhaps the temptation is dangerous. To the extent that we allow the extraneous little bits in, we risk losing sight of the bigger and more important goals.
  4. Laser-focused pair partners help with the struggle to distinguish between kicking things up a notch, yak shaving, and losing focus.
  5. Explicit working agreements help create safety (as well as creating a tight-knit team with a strong shared culture).
  6. Shared in-jokes and language also create shared culture. (“System Testing!” Ha ha!) Note that unless you were here, you have absolutely no idea why “system testing” might be funny. Even if I explained the joke, you still probably wouldn’t think it was funny. It’s a “you had to be there” kind of thing. And that’s why shared in-jokes are powerful for creating tight-knit teams.
  7. Creating a sense of safety is critical for learning.
  8. Deciding whether or not to upgrade your infrastructure cannot be a unilateral decision. (On the other hand, I don’t think I made the wrong call; I just did it the wrong way and at the wrong time. If I were to do it all over again, I would open up the discussion with the group in advance. And assuming we decided to upgrade our technology stack, I would start earlier and with more help in the beginning.)
  9. Integrating a test effort is hard, even when all the programmers are test infected and the tester is highly competent.

On a more personal note…

  1. When I think I have an answer, I cling to it doggedly. And when I finally let go of something, I really let go of it.
  2. I am extremely fond of my yak and cannot bear the thought of losing it, even if it will be replaced soon. (Sorry Alex.)
  3. It’s possible for people who don’t live here to introduce me to new things in my own back yard. Go figure. (Thanks for introducing me to the crazy dive bar, Pat.)

So that’s AU3H in a nutshell. AU4H will (probably) take place in May 2012. Details in another 6 months or so.
