Originally published on stickyminds.com
I have a theory. I think that technical people are being insulated from important business matters. Perhaps it’s because management wants technical people to concentrate on technical concerns. Perhaps it’s because management isn’t convinced that technical people understand anything about business strategy. Whatever the reasons might be, I want to put my theory to the test.
Think about the project you’re working on now. Mentally examine your part of it, then expand your mind to encompass the entire project. Think about all the groups working on it and what they do. Think about the timeline. And think about the anticipated end result. Got all that firmly in the forefront of your consciousness? Good, because it’s quiz time. See if you can answer the following questions:
1. Why is this project important?
2. How does it contribute to the long term strategic direction of the company?
3. How will it improve the company’s competitive position in the near term?
4. How much will it cost?
5. What financial benefits (reduced costs, increased revenues, etc.) will the project bring?
6. Who will benefit most from a successful outcome to this project?
7. What will happen if the project fails?
8. What capabilities will the users find most compelling and why?
9. If users don’t use this software, what will they use instead?
10. How well would the majority of the technical contributors on the project answer these questions?
My final question is more personal: how did you feel when you were answering those questions?
If you felt confident that you understood both the marketplace in which your company operates and the business case for your project, you’re probably in the minority. More often I find that technical people (who themselves are usually quick to chastise management for technical cluelessness) are unaware of the business reasons behind project decisions.
No wonder many software defects boil down to miscommunications. It’s difficult to understand what you’re building if you don’t know why you’re doing it.
Let me give you an example of the importance of “why.”
Early in my career I was put in charge of document production at a small company. Eager to be precise, I provided precise measurements for the spacing of 3 holes along the left hand side of the 8.5″ x 11″ booklets. The day before the documents were due back, Barry, the manager at the document reproduction company, called me.
“Why do you need the holes in these documents?” he asked.
“Uh, so I can put them in 3-ring binders,” I replied, wondering where this conversation was going.
“That’s what I thought,” he said. “In that case, whoever did these measurements is smokin’ something.” I blushed furiously, glad he couldn’t see my expression over the phone. That would be me, I thought to myself. Barry continued, unaware of my embarrassment, “They won’t fit in a 3-ring binder at all. Should I just punch these for a 3-hole binder and ignore the measurements?”
“Yes,” I replied meekly. As I hung up the phone I realized that I’d made a silly mistake in trying to be so precise. My intent was to ensure that there could be no room for miscommunication. Instead, I’d almost caused a disaster. Fortunately, Barry had enough experience with documents that he guessed my intentions. Knowing intentions is the key to requirements. That’s the difference between “recorded requirements” and “real requirements.”
I often see requirements that read like my specifications for the three holes in the documents. They’re precisely worded, but there’s no hint about why the software should do what the requirements ask.
“The window shall be no more than 800×600 pixels in size.”
Lovely. Why?
Some of you may have recognized those numbers, a common monitor resolution. A window larger than that size will overwhelm the screen on a low-resolution monitor. But how much harder would it be to say, “The warehouse personnel use machines that operate at 800×600 resolution. All windows must fit on the screen without scrolling, so don’t make them larger than that size”?
If you are beginning to wonder if you know enough about the business, seek out the information you need. Read everything you can get your hands on: your company’s press releases, financial data, competitors’ web sites, customer discussion forums, and consumer reviews. Ask your manager about the company’s strategic direction and how your current project contributes. Keep asking questions about the business until you really understand not only what you are being asked to do, but also why you are being asked to do it.
Then, if you find yourself in a situation like Barry’s, asked to do something that doesn’t make sense, you’ll have the knowledge to articulate your concerns. And in doing so, you might find that management is a little more open with technical people about their business concerns. Show them that you can learn about their world, and they’ll give you the information you need. Go ahead, challenge them to give you “the business.”
Originally published on stickyminds.com
Let’s peek in on a discussion in a bug triage meeting.
Tim, the marketing manager, is shaking his head. “That’s a high on the severity scale. It’s really bad, guys. You have to make it a high.”
Jordan, the development manager, is barely containing her frustration. Her eye is starting to twitch as she replies, “No, Tim. That’s not all that bad. It’s an inconvenience, I agree, but there’s an easy workaround.”
“Inconvenience?!?” Tim says a bit more loudly than he intended. “You call not being able to print an inconvenience?!? That’s a disaster!”
“Yes, I call not being able to print from one particular type of printer without installing an upgraded driver from the vendor’s website an inconvenience. The user just needs…”
“I know what the user needs,” Tim cuts in. “The user needs to be able to print out of the box! You can fix this in our code, right?”
Jordan nods, “Yes, but we’d just be working around the vendor’s…”
“Then fix it.” Tim stands over Jordan, glaring.
“But it’s a medium at best!” Jordan objects. “The user isn’t losing any data, doesn’t have to reboot, isn’t crashing. They just have to update a driver.”
This argument could continue forever. I’ve seen many arguments like this go on and on. What’s really happening here? Why are Tim and Jordan about to be at each other’s throats?
Priority is Business; Severity is Technical
Tim is looking at business priority: “How important is it to the business that we fix the bug?” Jordan is looking at technical severity: “How nasty is the bug from a technical perspective?” These two questions sometimes arrive at the same answer: a high severity bug is often also high priority, but not always. Allow me to suggest some definitions.
Severity levels:
* Critical: the software will not run
* High: unexpected fatal errors (includes crashes and data corruption)
* Medium: a feature is malfunctioning
* Low: a cosmetic issue
Now you see why Jordan was arguing that the Print bug was a medium: a feature was malfunctioning.
Priority levels:
* Now: drop everything and take care of it as soon as you see this (usually for blocking bugs)
* P1: fix before next build to test
* P2: fix before final release
* P3: we probably won’t get to these, but we want to track them anyway
And now you can see why Tim was so adamant that the issue was a high. From his perspective, it was a P1 matter.
They’re both right. It’s of medium severity, but P1 to fix.
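One way to keep the two ratings from being argued as if they were one is to record them as two separate fields on each defect. The following is only a minimal sketch: the `Defect` record and its field names are hypothetical, and the levels are simply the ones listed above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Technical impact, using the severity levels listed above."""
    LOW = 1       # a cosmetic issue
    MEDIUM = 2    # a feature is malfunctioning
    HIGH = 3      # unexpected fatal errors (crashes, data corruption)
    CRITICAL = 4  # the software will not run


class Priority(IntEnum):
    """Business urgency; a lower number means fix it sooner."""
    NOW = 0  # drop everything
    P1 = 1   # fix before next build to test
    P2 = 2   # fix before final release
    P3 = 3   # tracked, but probably won't be fixed


@dataclass
class Defect:
    """Hypothetical defect record with independent ratings."""
    summary: str
    severity: Severity  # Jordan's question: how nasty is it technically?
    priority: Priority  # Tim's question: how much does the business care?


# The print bug from the triage meeting: medium severity, but P1 to fix.
print_bug = Defect(
    summary="Cannot print to one printer model without a vendor driver update",
    severity=Severity.MEDIUM,
    priority=Priority.P1,
)
```

With two fields, Tim and Jordan can each be right at the same time, because neither rating has to be bent to carry the other's meaning.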
Priority is Relative; Severity is Absolute
Further, the priority might change over time. Perhaps a bug initially deemed P1 becomes rated as P2 or even a P3 as the schedule draws closer to the release and as the test team finds even more heinous errors. Priority is a subjective evaluation of how important an issue is, given other tasks in the queue and the current schedule. It’s relative. It shifts over time. And it’s a business decision.
By contrast, severity is an absolute: it’s an assessment of the impact of the bug without regard to other work in the queue or the current schedule. The only reason severity should change is if we have new information that causes us to re-evaluate our assessment. If it was a high severity issue when I entered it, it’s still a high severity issue when it’s deferred to the next release. The severity hasn’t changed just because we’ve run out of time. The priority changed.
Priority and Severity Don’t Mix
In response to Johanna’s column last week, some people suggested using both severity and priority to come up with a composite risk number. While this intuitively sounds like a way to resolve the priority-severity divide, I suggest using such an approach with extreme caution. It’s multiplying apples by oranges in an attempt to quantify bananas. Risk is yet a third type of information.
The risk associated with any bug depends on the severity of the issue, certainly. But it also depends on the likelihood that the user will run into it as well as the possible losses that might occur. I don’t attempt to quantify all this when assessing the severity of an issue. In fact, I think that in most cases assessing the risk of a single issue takes more time than it’s worth. Only for potentially poisonous bugs involving dangerous fixes do I really want to weigh the risk of fixing it against the risk of not fixing it.
Establish Work Precedence
The best way to avoid confusion about what comes first is to ensure everyone in the organization takes their cues for work precedence from priority and nowhere else. Developers fix P1 defects first. Testers verify P1 fixes first. Technical writers document P1 issues first. Everyone works in priority order: the priority reflects importance to the business. Saying, “This bug is more severe than that one so I’ll work on it first” is as bad as saying, “I like this bug more, so I’ll work on it first.” The severity rating is technical information used by managers as a piece of the formula in determining the priority rating. The priority rating is the final word on the order in which the work is done by programmers, testers, and everyone else.
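As a rough illustration of taking work precedence from priority and nowhere else, the sketch below orders a backlog by priority alone. The `Defect` record, the `work_order` helper, and the sample backlog are all hypothetical, invented only for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Priority(IntEnum):
    NOW = 0
    P1 = 1
    P2 = 2
    P3 = 3


@dataclass
class Defect:
    """Hypothetical defect record for illustration."""
    id: int
    summary: str
    severity: Severity
    priority: Priority


def work_order(defects):
    """Return defects in the order they should be worked.

    Only priority is used as the sort key. Severity is deliberately
    ignored here: it already fed into the priority decision at triage.
    sorted() is stable, so defects with equal priority keep their
    original relative order.
    """
    return sorted(defects, key=lambda d: d.priority)


# Made-up backlog: note that the HIGH severity crash is only a P2.
backlog = [
    Defect(101, "Crash when importing a rare file type", Severity.HIGH, Priority.P2),
    Defect(102, "Printing fails on one printer model", Severity.MEDIUM, Priority.P1),
    Defect(103, "Misaligned label on a settings page", Severity.LOW, Priority.P3),
    Defect(104, "Installer failure blocks the test build", Severity.CRITICAL, Priority.NOW),
]

for defect in work_order(backlog):
    print(defect.priority.name, defect.id, defect.summary)
```

In this made-up backlog the medium-severity print bug is worked before the high-severity crash, which is exactly the outcome the priority rating is meant to produce.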
The ultimate lesson here, regardless of the terms or levels you use to categorize your bugs, is that any classification scheme will only be effective if everyone agrees on definitions. So perhaps that’s the very first question to ask when an argument is brewing about severity, priority, or risk: “Help me understand exactly what information you’re using from each defect record and how you’re using it?”
Originally published on Computerworld
“This new system is driving me crazy!” Janet, the hotel desk attendant, muttered as she punched at the keyboard buttons. She looked back at me, flashing her best customer smile. “Sorry, it will be a minute.”
She returned to scowling at the keyboard. Apparently the system finally accepted her input; she looked up at me with a satisfied expression. There was a pause as we waited for the system to respond. A long pause.
To fill the time, she asked, “So Ms. Hendrickson, what do you do?”
“I work with software development organizations to improve software quality.”
“OH!” she exclaimed. “I wish you were at corporate. I don’t know what they were thinking. This new software was supposed to be an improvement, but it’s much worse than the old system. It’s slow, and I can’t figure out what it wants from me half the time.”
I involuntarily began imagining the process at corporate.
A 15-person Steering Committee directed a five-person Requirements Task Force to analyze the business and user requirements. The Requirements Analysts sent out surveys, pored over help desk call records, and even interviewed a few users. They produced an 83-page tome that they handed off to the Designers.
A three-person Design team wrote a specification answering the requirements. The 96-page specification was nominally written in English, but because of the amount of jargon used it required a translation guide. The Design team sent it out for review with a deadline for comments. The specified date passed with no comments from the Steering Committee or the Requirements Analysts (who were off to new assignments so they couldn’t spare any more time for this project anyway). The specification went to the Programmers.
The Programmers implemented to the specification. There were a few things that were very difficult to do, so they compromised. It would be no big deal if users had to enter a few more keystrokes to access that information, right?
Then the Testers were given two weeks to test it. It took most of the first week to figure out how the new software worked. They found a few bugs, but no show-stoppers. The Programmers fixed a few things and the software was deployed to the field.
That’s where Janet comes in. Janet doesn’t know anything about the ins and outs of creating software. She probably doesn’t want to know. She just wants to serve her customers well. And this software is not helping.
Back at corporate, the Steering Committee, Requirements Analysts, Designers, Programmers and Testers are congratulating themselves on a solid release. What they don’t see is Janet’s pain.
All this flashed through my mind in an instant. I looked back at Janet. “Have you called corporate to tell them what you think?” I asked. “What good would that do?” Janet sneered. “I’ll wait on hold for 25 minutes before getting to someone at the help desk. And they’re never much help. No, I’ll deal with it. Maybe it will get easier. They’re sending me to training next week.”
So the feedback loop is broken. The team back at corporate has no mechanism to find out whether the software is any good. Oh, sure, they’ll detect catastrophic problems that cause servers to go down. But they won’t see the little things that cause long queues at the front desk of the hotel.
If we interviewed the team that created the system, they’d say: “This is our best release ever. We did all the right things. We analyzed requirements and wrote specifications before writing the software. We tested the software before we deployed it. How could the result be wrong?”
How indeed?
Perhaps important nuances were lost in the requirements and specifications verbiage. Perhaps the ship criteria, “no showstopper bugs,” could indicate either “solid code” or “not tested.” And perhaps the lack of a feedback loop from the field means they have no way of knowing how the users like the new system. “We’ve been deployed for a month and had only five calls!” the team crows. Like someone watching a broken pipe, they see only the trickle of complaints that make it through and miss the flood of complaints leaking away.
Of course, all this happened in my imagination. But I’ve seen it happen in reality. Ironically, organizations that control their software development process tightly don’t necessarily serve their users any better than organizations that cobble something together and throw it over the wall. It’s easy to become so tied up in process that we forget the reason for building the software in the first place. Unless we close the feedback loop, we don’t really know whether what we’ve produced is really any good.
Just ask Janet.
Originally published on stickyminds.com
As I reached for the coffeepot, I noticed a carefully printed sign taped to the cabinet. It read:
Important:
- Don’t press the Brew button more than once.
- Please make sure the urn is aligned under the brew basket.
- Please ensure the urn is empty before brewing.
The sign struck me because it was such an elegant description of the known risks involved in making coffee. Clearly this organization had experienced coffee disasters in the past. Someone had compiled a list of the most common causes of brew failures and created a checklist of tests. What a wonderful set of guidelines!
Part of the power of such a checklist is its brevity. The author of the coffee-making checklist could have also added items such as “make sure there’s coffee in the basket” and “remember to use a filter,” but then the most critical items would be lost in the verbosity. It’s obvious to even a coffee-making novice that coffee is an essential part of the process. So it seemed to me as I was reading this carefully crafted checklist that the point wasn’t to catch every possible error. Instead, I think the author probably thought carefully about:
- What seems to go wrong most often?
- What errors are difficult to see at first glance, and thus require concentration to prevent?
- What causes the most damage when it happens?
I also noticed that the items the author chose to address involved subtleties of the coffee maker interface and the interaction between the urn and the coffee maker. These weren’t obvious errors. They’re “gotchas”: things you learn about this particular coffee maker only after painful experience. They were eloquently worded. They also work. During my weeklong visit at this organization, I didn’t see a single coffee brew failure. This was an organization that knew how to learn from mistakes.
Of course, it’s easier to create such a checklist for a comparatively simple mechanical process, like making a pot of coffee, than it is for the complex process of building software. It takes much more care to construct similar guidelines for software. Yet it’s a worthwhile exercise. By identifying the top five or so things that go wrong in various activities, we might just prevent the most common errors and save everyone a lot of time.
A checklist for beginning a test might look like this:
Important! Before you begin your test:
- Make sure all delivered files have the correct version.
- Set up your test environment to emulate the real-world environment as closely as possible.
- Ensure the system is in a known state before you begin testing.
Now there’s plenty that you might add to this list. Adding a few things specific to your environment would be a good idea. But remember the power of brevity. The more items you add to a list, the more likely someone using the list will inadvertently skip over items, inviting disaster.
When constructing a list like this, try filling in the blanks in these sentences:
- If anything is going to go wrong here, it’s most likely: ________________
- The top three most damaging kinds of failures would be: 1) ____ 2) ____ 3) ____
- The three most common causes of failures are: 1) ____ 2) _____ 3) ______
The answers to these questions form the basis for your checklist. Let’s see how this would apply to the process of releasing software after it has been tested.
- If anything is going to go wrong here, it’s most likely: the software that ships isn’t the software that was tested.
- The top three most damaging kinds of failures would be:
  1) Shipping a virus.
  2) Failing to install and load.
  3) Breaking other software (e.g., the operating system) on install or uninstall.
- The three most common causes of failures are:
  1) Not verifying that the final release matches the last build tested.
  2) Failing to virus-check the final release.
  3) Untested configurations.
In the end, my checklist for releasing software looks like this:
Important! Before releasing software, please:
- Compare the last tested build against the release build to verify that they match exactly.
- Run at least one, preferably two, up-to-date virus checkers on the final release.
- Install, launch, and uninstall the final release on all supported operating systems.
Notice that all the items on this list are specific actions. It’s tempting to include general guidelines such as “ensure the software does not corrupt data” on the list. Yet doing so would weaken the list. This is a prerelease checklist, intended to tame last-minute release chaos. It’s not an all-encompassing list of things to test. Presumably someone tested the software before we decided to ship it.
Finally, it’s important to note that different organizations will have different lists. Your checklists will reflect your requirements, your software’s quirks, and your organization’s historical weaknesses. Your context is unique; your checklists should be too.
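Because the items on the release checklist are specific actions, some of them can be scripted. As one possible sketch of the first item, comparing the last tested build against the release build, the code below hashes every file in two directory trees and reports any differences. The function names and the idea that each build lives in its own directory are assumptions made for illustration, not a description of any particular release process.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_builds(tested_dir, release_dir):
    """Report files that are missing, extra, or changed in the release."""
    tested = tree_digests(tested_dir)
    release = tree_digests(release_dir)
    return {
        "missing": sorted(tested.keys() - release.keys()),
        "extra": sorted(release.keys() - tested.keys()),
        "changed": sorted(
            name
            for name in tested.keys() & release.keys()
            if tested[name] != release[name]
        ),
    }


def builds_match(tested_dir, release_dir):
    """True only when both trees hold the same files with the same bytes."""
    return not any(compare_builds(tested_dir, release_dir).values())
```

A call such as `builds_match("builds/last_tested", "builds/release")`, where both paths are hypothetical, returns `True` only when the two trees match exactly. Anything else is a reason to stop and investigate before shipping.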
Developing software is certainly more complex than brewing coffee, but both require attention to detail in order to avoid large messes. Just as the organization I visited learned from its coffee-brewing fiascos, all of us can use our past experience with failures to prevent trouble from brewing in the future.
Originally published on stickyminds.com
Over the last several days, I’ve encountered numerous instances of programmer bashing. I’ve heard nontechnical managers complaining about the “idiot programmers” in their shop, had test managers claim the programmers were “user hostile,” and most recently read a comment on a Web site calling for legal action against individual programmers who introduce security holes. In each case, it seemed that the commenter intended the comment to be humorous. At least for me, the humor is wearing thin. An attitude of blame lay just behind the humor: “The programmers make the bugs, blame them for poor quality!”
Every time I hear a comment painting large groups of programmers with the same brush, I think of programmers I’ve known who don’t deserve that treatment.
Mikey, the youngest programmer in an Internet tools company, prided himself on producing 100 percent reliable, robust code. He had a standing offer: If you found a bug he didn’t know about in his code, he’d buy you lunch. He didn’t have to buy many lunches.
Ward, an old-timer who’d learned to program on punch cards, had a patient and calm demeanor no matter how hairy the project. He took the time to do things right, even when that meant standing up to a manager who wanted Ward to cut corners. His approach paid off. By the time he delivered his code to test, there were few bugs. Those that did appear were minor oversights or unpredictable integration issues. Further, the time he spent drafting comprehensive specifications made it easy to design the tests and write the documentation.
Karen specialized in internal tools. If you could envision it, she could build it and build it quickly. She was so speedy, she often had it built within a few hours after you said, “wouldn’t it be cool if…” Generous with her knowledge, she was a patient if sometimes demanding mentor.
In fact, when I think of all the programmers I’ve ever worked with, I can remember only a bare handful who were incompetent or even remotely deserving of ridicule. The vast majority of programmers are diligent, capable folk. They truly care about the quality of their work and want the software they produce to be useful. They are more interested in producing a good product than playing CYA games. They work hard to make sure they are implementing the right features and writing solid code.
I have, however, seen a few situations where programmers seemed to lose their focus on the customer. Betty, an otherwise quality-conscious programmer, surprised me by insisting we should ship software that was causing intermittent blue screens on Windows machines. She pointed out that the blue screens were rare, couldn’t be debugged because they couldn’t be reproduced, and couldn’t possibly be caused directly by her code. I was shocked. I insisted, in a nastier tone of voice than I intended, that we weren’t going to ship anything that caused blue screens. Betty and I ended up in a screaming fight—one of the few in my career—and it took a VP to force both of us to back down long enough to talk about the real problems.
If I’d stayed calm just a little longer, I would have realized how much pressure Betty was under. Her bonus, and quite possibly her career with the company, rested on her doing well with this project. It was a high profile, quick turnaround project and had been handed to her because of her previous track record for releasing well-received products within tight timeframes. Betty had a reputation to protect and difficult expectations to meet.
Unfortunately for Betty, she was building on a none-too-stable base. Her code worked on top of an existing system, a system rife with stability problems. Her code manifested the intermittent blue screen, but the underlying fault had been there all along. Sick of still having a six-week project on her plate after twelve weeks, she was anxious to declare victory. From her perspective, the project had been done some time ago. She figured it was someone else’s responsibility to fix the other code on which her product depended.
Of course from my perspective her attitude was indeed hostile to the end users. I’m sure I called her an “idiot programmer” and worse during that project. But she didn’t deserve it. She was simply reacting to a difficult situation. Ultimately, she agreed that we shouldn’t ship until we could resolve the blue screen issue. She realized that no matter who was at fault, she was responsible for fixing the intermittent crash.
The next time you’re tempted to think of your programmers as idiots, incompetents, or quality hostile, remember that no matter what else they may be, they’re people first. It is far more likely that they are having a very human reaction to a particularly bad situation than that they are incapable. Perhaps, like Betty, they’re feeling trapped in a no-win situation. Before you condemn them, ask what’s going on from their point of view.
Similarly, the next time you’re tempted to hang a programmer up by his toenails, remember the last time you made a mistake. I’ve made some real whopper mistakes in my time. We all have, whether or not we choose to admit them or even remember them. It may be that some programmers don’t care about users, but it’s more likely that bugs are honest mistakes made under difficult circumstances.
I don’t expect you to put a bumper sticker proclaiming “Have You Hugged a Programmer Today?” outside your work area. But the next time you’re tempted to vent your anger at a programmer, see if you can imagine the factors contributing to his or her behavior. After all, we’re all human, with all the brilliance and fallibility that implies.
Originally published on stickyminds.com
I have a deep appreciation for games. I believe that they allow us to explore the way the world works in a small, contained setting. Every game I play teaches me different lessons I can apply elsewhere in life. Games of chance teach me about risk. Games of skill teach me about strategies and tactics. Both kinds of lessons serve me well in software management.
My father taught me to play chess when I was a child. The key to playing chess well is thinking ahead—imagining how your opponent will react to your move, and then how you will react to your opponent’s possible moves. As a novice chess player, I had difficulty imagining my own moves much less figuring out how my father might respond. Similarly, as a novice manager, I had trouble seeing the effects my actions might have.
Thinking ahead—anticipating outcomes and analyzing implications—is a key to building good software. We need to think ahead when managing the project, changing requirements, making design choices, and designing tests.
Consider the client/server software where the client software assumed that everything the server sent would be in the proper format. Unfortunately, the server generated bad data on occasion. When the server generated bad data, the client tended to crash.
Or consider the company that changed its business strategy from selling software to hosting the software as a service. By design, the software could support only one customer per server. As a result, the company spent about $15,000 on hardware for each new customer, more than some customers paid for the service over the lifetime of their contract.
So what can we do to improve our ability to think ahead? How can we predict the consequences of our decisions on software projects?
Practice, Practice, Practice
It’s easy to practice chess. Find a partner or run a chess program and play the game. It’s harder to practice thinking ahead on software projects. But without practice, it’s difficult to become good at it.
It’s easier and safer to practice thinking ahead in business when you aren’t the person in charge. Imagine yourself in the leaders’ shoes. What would you expect to happen? How would you react? It may take a long time for the organization to feel the effects of decisions, so keep observing even after it looks like the events are water under the bridge. Keep track of your predictions and the actual results.
Study Past Games
Great chess players often study past games to learn gambits, combinations of moves that have been effective in the past. You can learn a great deal from past projects in your company. Review the notes, status reports, metrics, and any other artifacts you can get your hands on. Identify the pieces. Think about the shape of the board. See how the team made decisions and analyze how well those decisions worked.
Your historical knowledge helps you in three ways:
- It helps you improve your ability to predict the future, and thus think ahead.
- It teaches you which strategies and tactics have worked in your environment in the past and which have failed.
- It gives you leverage in influencing your teammates. It’s powerful to be able to say, “Well, you know, we tried reviews back on the 2.5 release and they worked well. We stopped doing them because of schedule pressure, but maybe we should start again.”
Ask “What if?”
“What if?” is the most powerful question a game player or software professional can ask. The company that changed its business strategy was too busy trying to attract customers to ask, “What if we’re enormously successful and sign up hundreds of customers?” The answer would have been that they would need hundreds of servers, a place to put those servers, and administrators to run those servers. All of those problems came to pass.
The company where the engineers decided that the client didn’t need any error handling for bad data didn’t ask the very simple question, “What if the server isn’t infallible?”
Jerry Weinberg says, “If you haven’t thought of three possibilities, you haven’t thought enough.” For each decision, consider at least three possible outcomes. For each outcome, consider at least three possible causes. Cause and effect in a game is straightforward. Cause and effect in real life is far less tractable. Use the rule of three to expand your “What if?” thinking.
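To make the client/server example concrete, here is a small sketch of a client that asks “What if the server isn’t infallible?” before trusting what it receives. The message format (a JSON object with an integer `id` and a string `status`) and every name in the code are assumptions invented for illustration; the original system’s protocol isn’t described.

```python
import json


class BadServerData(ValueError):
    """Raised when a server message is not in the expected format."""


def parse_status_message(raw):
    """Validate a raw server message instead of assuming it is well formed.

    Assumed (hypothetical) format: UTF-8 JSON object with an integer
    "id" and a string "status".
    """
    try:
        message = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise BadServerData(f"not valid UTF-8 JSON: {exc}") from exc

    if not isinstance(message, dict):
        raise BadServerData("expected a JSON object")

    # bool is a subclass of int in Python, so reject it explicitly.
    msg_id = message.get("id")
    if not isinstance(msg_id, int) or isinstance(msg_id, bool):
        raise BadServerData("field 'id' must be an integer")

    if not isinstance(message.get("status"), str):
        raise BadServerData("field 'status' must be a string")

    return message


def handle(raw):
    """Process one message; bad data is reported, not allowed to crash us."""
    try:
        return parse_status_message(raw)
    except BadServerData as exc:
        print(f"ignoring bad server data: {exc}")
        return None
```

The validation is only a few lines, but it turns a crash into a logged, recoverable event, which is the kind of outcome that asking “What if?” early makes cheap.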
Think
Finally, the ultimate key to thinking ahead is to think. Think actively. Think continuously. If you find yourself on autopilot, going through the motions, it’s time to pause long enough to engage your brain. Use your greatest tool—the one between your ears—to your best advantage.
Thinking ahead is a learned skill, both in games and in real life. I struggled for years before finally beating my father at chess and I still can’t beat him regularly. Just when I think I have him backed into a corner, he makes a surprising move and crows, “Checkmate!” If thinking ahead in a game with a finite number of tactics and strategies is so difficult, it’s no wonder we often fail to think ahead in the real world where possibilities are endless. But as long as you’re thinking, you have a much better chance of thinking ahead.
Originally published on stickyminds.com
In 1628, the grand warship Vasa launched for her maiden voyage. What started as a ceremonial trot around the harbor ended in disaster. Ten minutes out, the Vasa sank, taking many of those aboard with her.
You might be thinking, “Thanks for the history lesson, but what does this have to do with software?”
I know something about the sinking of the Vasa because I had the opportunity to visit the Vasa in her home, the Vasamuseet, last year. While there, I spent hours reading the plaques and playing with the computer simulation of her capsizing. That’s when I realized that the Vasa story is being relived today in organizations throughout the software industry.
The Vasa’s is a story of a project gone awry, taking the project team down with it. Some of the contributing factors that led to the Vasa sinking centuries ago will seem terribly familiar to software folks today.
It was an ambitious project. With 64 guns on two gundecks, the Vasa was to be the mightiest warship built to date. Thus it was especially inconvenient that…
The leadership changed mid-project. The original architect, Henrik Hybertsson, died before the project could be completed. His assistant, Hein Jacobsson, took over after his death. Not having the original ship builder see the project through to completion was an even bigger burden than you might imagine because…
There were no detailed plans. At the time the Vasa was built, experienced ship builders used their past experience along with key measurements to guide the ship building process. And then…
Upper management dictated the ship date (literally). King Gustavus Adolphus decreed that the ship must be finished by 21 July 1628 or the shipbuilders would face “His Royal Majesty’s disfavor.” Displeasing the king was then, as it is today, a career-limiting move. The king also saw to it that…
There were late-breaking changes in the design. The hull of the mighty warship was created from four segments of wood. Typical designs at the time involved only three segments of wood, so some archaeologists are guessing that the size of the ship was expanded during construction. Further, the king decreed that there would be two closed gundecks rather than the traditional one, thus allowing more, bigger guns on board. The added weight above the waterline resulted in an instability that was detected when…
The ship failed its acceptance test. One of the final tests that the team undertook was a stability test known as the heeling test. In this test, thirty sailors ran back and forth along the deck to make the ship rock. The test was halted after just three runs because the ship was rocking so badly. The ship builders and the king were not present for the test. Admiral Fleming, the admiral of the Vasa, was present, but seemed unconcerned by the test results. He approved sailing the ship despite her apparent instability. How could someone ignore such dramatic test results? It’s understandable if you consider that…
The cost of failure was too high. So much money had been poured into the Vasa project that failure was inconceivable. By the time the team ran the heeling test, there was little that could be done to change the ship. Having worked on a few software projects with the same characteristics as the Vasa, I can guess that it was easier for the project team to ignore the bad test results than to consider scrapping the entire project.
Applying Lessons from the Vasa to Software
So what can we learn from this disaster that we can apply to our work in software?
Lesson #1: Break ambitious projects into smaller deliverables. Like many software projects today, the Vasa was a tremendously ambitious project. Although we can’t stop doing ambitious software projects, we can break them up into a series of less ambitious projects. That’s an advantage that software development has over ship building: you can build parts of the system that work independently, then bring them together into a cohesive whole.
Lesson #2: Share knowledge. Henrik Hybertsson was the visionary behind the Vasa. When he died, he left behind a team that was ill-equipped to carry on without him. We can’t prevent key project people from leaving, but we can mitigate the effects of their leaving by documenting plans and cross-training personnel.
Lesson #3: Manage upward. King Gustavus was accustomed to people doing what he told them to do. While we can’t prevent kings or executives from demanding more features and earlier “ship” dates, we have a responsibility to analyze the implications of their demands and educate them about risks.
Lesson #4: Publish test results, even the bad ones. Those present at the Vasa’s heeling test did not speak of it again until the inquiry following her capsizing. Even then, only the outspoken Captain Hansson had the temerity to bring up the test. No one on the Vasa project team informed King Gustavus of the results of the heeling test before the ship sailed. I wonder whether King Gustavus would have allowed the ship to sail if he’d known how unstable she was.
Lesson #5: We can’t stop failure by ignoring risk. As I read the story of the Vasa, it seemed to me that the people on the project team could not admit to themselves that the ship might not be safe. Yet that unwillingness to admit the risks caused even greater loss—loss of life. Ships sink. Software fails. We can’t stop failure through sheer force of will, much as we might like to.
Building the Vasa was a large and complex undertaking, full of risk and challenges. Each decision that contributed to the final disaster no doubt made perfect sense at the time in light of the king’s demands and the political climate.
Ultimately, the story of the Vasa is a tale of human fallibility. The struggles we have with large software projects aren’t new—they’re extensions of the struggles people have had with complex, difficult projects involving new technologies through the centuries. It just happens that software is a ubiquitous new technology, touching every aspect of our lives.
If you would like to learn more about the Vasa, visit the Vasamuseet home page, or read a case study highlighting some of the communication and management problems: http://dossantos.cbpa.louisville.edu/courses/cis675/vasa/index.htm
Footnote: only after this piece was published did I learn that the story of the Vasa has been told in a software context before in Tom Love’s book Object Lessons.
Originally published on stickyminds.com
I was very pleased with myself. I’d just found a bug that, under certain circumstances, could result in data in a stored file becoming corrupt. I tried not to gloat as I explained the bug to the project manager. His response floored me. “Oh, that. Yeah, we know. No time to fix it. How did the upgrade tests go?”
“You KNEW? And you won’t be fixing it?!? Data gets corrupted!” Self-righteous anger bubbled up, blurring my vision.
“Whoa. Calm down. Yes, we got a report about that bug from one of the field engineers last week. We gave it a lower priority because it was easy to tell the file was messed up and there is an easy workaround. The bug has been there since 1.0 and fixing it now will require major changes. We’re just about to release and can’t delay the schedule to fix an old bug. So now tell me about the upgrade tests.”
I fumed in silence, then turned to leave. The project manager stopped me. “What about the upgrade tests?”
I shot back over my shoulder: “I was so busy isolating this bug that I didn’t finish them. I’ll have the results tomorrow.”
The project manager frowned. “I really need the results today. Last week you told me you’d have no problem getting them done. I don’t think you understand how important these results are.”
“I’ll get them done before I leave today,” I mumbled. As I left his office, I wondered, What went wrong? Why didn’t he care about my news?
I realized that I hadn’t clarified up front what kind of information was most important to the project manager. If I had, I would have understood the importance of those upgrade tests before spending half a day chasing down the file corruptor bug.
It all comes down to requirements. Tests have requirements. That incident with the project manager was a wake-up call for me. It was the first time I realized that my audience (managers) needs particular kinds of information. Like me, you could argue that the file corruptor bug was important. However, whether or not to fix a bug is a business decision. The project manager had the perspective and authority to make that kind of decision; I did not. At the same time, the project manager was relying on me to give him accurate information to support his business decisions.
So tests have requirements, but I wasn’t sure how to discover those requirements for my tests. The laundry lists of features—often labeled “Requirements”—didn’t help. I started by asking, “Who uses the information I produce and for what purpose?”
In this case, the project manager wanted to know if the upgrade process worked as designed so he could make a release decision. He didn’t want more bug reports unless the bugs were new to this release or interfered with the core functionality of the software. If I happened to encounter bugs, he expected me to file them. He just didn’t want me to spend all my time digging for bugs at the expense of running the upgrade tests.
Different projects have different test requirements. Further, the nature of the test requirements may evolve as the project progresses.
For example, on one project, we needed to find as many bugs as possible early in the project. Later in the project, we needed to understand the end user’s experience during typical use. Our test requirements changed mid-way through the project. As a result, we changed our testing strategy. We did a lot of bug hunting early and spent a great deal of time characterizing the performance from the end user’s point of view when the system was closer to release.
On another project, the decision-makers needed to know, “What’s the worst thing that can possibly happen if we release this to the field?” In other words, management wanted the testers to find nightmare bugs: the biggest, nastiest, slimiest bugs possible. If the bugs were bad enough, they would hold the release. If the worst bugs we could find were largely cosmetic, we’d ship. Our goal wasn’t to find a lot of bugs but to find significant bugs.
In each case, management—whether a single project manager or a committee of stakeholders—needed the testers to learn about particular characteristics of the software and report back what they’d found. The more we focused on gathering the information that management needed, the more effective we were. The more we focused on gathering information we happened to find interesting, the less effective we were. When managing a test effort, I need to know what questions other managers expect us to answer. Do they want to know:
- What is the user experience for typical usage scenarios?
- How well does the software implement the design?
- How well does the software meet requirements?
- What kinds of bugs crop up under less-than-ideal conditions?
- How reliable or accurate is a particular feature?
- How stable is the software under normal use?
- How reliable/stable is the software under load?
When I’m testing, I find that I am most effective when I focus on answering one or two of these kinds of questions at a time. When I try to gather too many kinds of information at once, I get sidetracked, as I did with isolating the file corruption bug. When I’m not sure what questions I’m supposed to be answering, I ask.
These insights led me to another realization: when management seems to be undervaluing testers, it may be because the testers aren’t getting the information management really needs. Perhaps the most powerful question a tester can ask managers is, “If you could know any one thing about this software, what would you want to know?”
The answer may surprise you.
Originally published on stickyminds.com
Groove. Groovy. In the groove. The word “groove” has positive connotations. We’re in the groove when we’re focused and productive. When I’m finding lots of bugs in new software and simultaneously learning a lot about the system, I’m in a groove.
However, being “in a rut” has negative connotations. When I’m doing the same thing over and over again, I’m no longer in the groove, I’m in a rut.
So how do you know when a groove becomes a rut, and what can you do about it? It’s a subtle change that slowly sneaks up on you. Comparing activities day to day reveals little difference. Yet the more we wear the groove, the deeper it gets.
Imagine you’re a tester assigned to find bugs in a project. With the first build of the system into test, you’re just getting started. You’re not yet in the groove. By the next build into test, you’ve got a handle on things. You’re beginning to get an idea of where the weak spots are and you’re having fun exploiting them to find serious issues. A few more days into the project, somewhere around build 3, you become a bug-finding machine.
But by build 10 or 11, you’re beginning to dread new builds coming into test because it means rerunning tests to ensure nothing else broke as a result of recent changes. Your energy flags. You begin to wonder if the developers put the bugs there on purpose just to test your bug-finding skills. Within a few more builds you’re spending most of your time running tests you’ve already run. You’re using the same test data as before because it would take too much time and energy to come up with new test data.
Now you’re in a rut. The longer the project goes on, the more deeply entrenched you become. If subsequent releases of the software have only minor changes, you may even extend your rut from one project to the next. I know because I’ve done just that, only to wake up one day and realize that I haven’t done anything new for months.
Being in a rut is a dangerous state for a tester. Given that there are an infinite number of tests we could conceivably run on any given piece of software, we need as much creativity, originality, and energy as we can muster.
Certainly, there are times when we need to rerun tests. However, even when rechecking areas of the software we’ve already tested, we can vary the tests. Varying tests and test data ensures that we’re looking at the same functionality, just in a different way.
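As a rough illustration of varying test data between reruns, here is a minimal Python sketch. Everything in it is hypothetical: normalize_name stands in for whatever feature is being rechecked, and seeding the generator with the build number is just one assumed way to get different data each build while still being able to replay a failure.

```python
import random


def normalize_name(raw: str) -> str:
    """Hypothetical function under test: trims and collapses whitespace."""
    return " ".join(raw.split())


def varied_inputs(build_number: int, count: int = 5) -> list[str]:
    """Generate a different but reproducible set of inputs for each build."""
    # Seeding with the build number means build 10 always gets the same data,
    # so a failure can be replayed, while build 11 gets something new.
    rng = random.Random(build_number)
    words = ["Ada", "Grace", "Linus", "Margaret", "Edsger"]
    padding = ["", " ", "  ", "\t", "\n"]
    inputs = []
    for _ in range(count):
        first, last = rng.sample(words, 2)
        separator = rng.choice(padding[1:])  # never empty, so two words remain
        inputs.append(rng.choice(padding) + first + separator + last + rng.choice(padding))
    return inputs


def check_build(build_number: int) -> bool:
    """Same expectation on every rerun, different data each time."""
    for raw in varied_inputs(build_number):
        result = normalize_name(raw)
        if result != result.strip():
            return False
        if any(bad in result for bad in ("  ", "\t", "\n")):
            return False
    return True
```

Calling check_build(10) and then check_build(11) rechecks the same functionality with two different data sets, which is the point of the paragraph above. A real project would draw on richer sources, such as snapshots of production data.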
Taking the “Groove Test”
When working on a long project, I monitor the depth of my groove to ensure it hasn’t become a rut. Whenever I’ve been working on a project long enough to get in the groove, I begin asking myself these questions at the end of every day:
1. Do I still feel like I’m in a groove?
2. Have I done anything innovative today?
3. How could I update my test data?
4. Can I think of at least three different ways to run each of the tests I just ran?
When I notice internal resistance to change, I can tell that my nice little groove is about to turn into a perilous rut. That’s when I know I need to make changes. Maybe I can get a snapshot of some different production data to test against. Maybe I can add some complexity to my tests. (After all, by now the software should be mature enough to handle harsher tests.) Perhaps it’s time to look at the system from another perspective. Or maybe I can work with another tester to think up some truly devious ways of stressing the system. Now we’re talking!
I have a fifth question I also ask myself:
5. Am I still having fun?
When it stops being fun, I know I’m no longer being as observant as I need to be. That’s when the nasty ones sneak by me. I fail to notice signs that all is not well. I declare a set of features to have “Passed” when I didn’t set any real checkpoints at which the bugs might have been caught.
Failing to spot bugs because we’re too deep in a rut can happen to any of us, whether we pre-plan tests or use an exploratory approach. That’s why two different testers can take the same set of tests with the same version of the software and find vastly different bugs. The difference in the quality of their efforts is in their level of observation. One of my early managers declared that our test group achieved “Quality through Vigilance.” It’s hard to be vigilant if you’re asleep at the keyboard.
So what do you do if you’ve just discovered you’re in a rut?
- Throw away the test documentation for a day, pretending you’re new to the project and the software.
- Analyze the software under test in a different way. If you’ve already done boundary analysis and flowcharting, try looking at the architecture of the system to see the big picture. If you’ve looked at the relationship between the pieces and parts, try designing real-world use cases.
- Change your data. If you’re using test data, get real-world data. If you’re already using a snapshot of production data, get production data from a different system. If you’ve exhausted all your sources for production data, combine the data you have from varying sources into a gigantic set of diverse test data.
- Change your timing. Time is an oft-forgotten dimension of testing. Try doing the same thing but at a different pace or with a different rhythm.
So take the five-question “Groove Test” to gauge your ability to see new possibilities and act on them. Whenever you start resisting the idea that your tests might need to change, it’s time to change them!
Originally published on stickyminds.com
Some time ago, I was acting as a QA manager in a troubled organization. We were experiencing quality problems: the software was shipping with more defects than we wanted and was becoming less rather than more stable with time. I felt quite certain I knew what the problem was. “We don’t have any coding standards,” I fumed. “We don’t have any guidelines to help developers make good decisions about the tradeoff between improving performance and bullet-proofing. We don’t have any documentation on how to use our own APIs (application programming interfaces).”
I took the opportunity to bend the ear of a friend in the industry, Brian Lawrence, over a friendly lunch one day. He cocked his eyebrows at me quizzically (a Brian Lawrence trademark) and asked a simple question: “Are you sure that’s the problem?”
At first, I was defensive. “Of course that’s the problem!” I grumbled. But Brian was right. The lack of coding standards was a problem, but it wasn’t The Problem. There were a number of reasons for our quality problems. The most important of those problems was a lack of requirements and design analysis, rather than a lack of coding standards.
It’s Not the Problem?
I often fall into the trap of thinking that the first problem I see must be The Problem that needs to be solved. Perhaps the problem I spotted is indeed worth correcting, but I almost never manage to spot the true critical issue at first glance.
At one company, I thought that The Problem was a lack of unit tests. I started beating the drum for better unit tests. Some of the developers agreed with me and encouraged me. Others politely ignored me until I went away. Still others rolled their eyes whenever I started my “Unit Testing Is Your Friend” stump speech.
Then I attended a meeting in which we were discussing bugs in a particular area of the code. The developer responsible commented, “Oh, good. I was wondering when you guys were going to start finding those bugs.” I was shocked. The developer already knew about the bugs. He didn’t need better unit tests; he needed time to fix the bugs. He knew he wouldn’t get that time until the test group found the bugs themselves.
I had found a problem: in some cases the unit tests were insufficient. However, that wasn’t The Problem. In this case, the deeper problem was that developers didn’t get time to fix the bugs they found. As far as management was concerned, until a tester found it, it didn’t exist.
No, It’s a Symptom
In a meeting toward the end of another project, a developer began talking about some changes he’d made to the way a particular feature worked. “Wait,” I interrupted. “I thought we were in code lockdown phase. Now you’re telling me that you’ve added functionality. What’s going on here?” Three weeks to ship and we were still changing the way the software worked. “Poor change control will sink us!” I thought.
However, it turned out that the changes the developer was making were part of the original plan. The problem wasn’t feature creep or poor change control. The schedule said we were supposed to be done with the feature, so development claimed to be “done” while still implementing functionality outside the process. What we really needed was a more honest schedule, not a better change control process.
My Problem, Your Problem, The Problem
On yet another project, we were in code lockdown phase. Only Sharon, the development manager, was supposed to have access to check in code, thus forcing all changes to go through her. Because we were very late in the process, Sharon was to review all changes before checking them in—enforced code inspection.
I heard a rumor that one of the developers was checking in code on his own, without going through Sharon. That developer’s area was also currently the most bug-ridden. “Aha!” I thought. “Here’s the problem!”
“Sharon, I heard that Bob is checking in changes without going through you. Is that true?” I asked in a meeting.
Sharon looked at me, then looked away guiltily. “Well, I’m pretty overloaded and I don’t understand what that code does anyway. So I gave Bob permission to check in his stuff without going through me.”
I’d found a problem all right. Bob’s code was so complex that the manager of the department, a fine developer in her own right, didn’t understand it. The problem wasn’t unmonitored changes; it was the complexity of the design.
Solve Symptoms; Keep Looking for Problems
In each of these stories, I really had found a problem. However, I hadn’t found the real problem. I’d found symptoms. So if problems hide behind problems, how can you spot The Problem?
- Open up your mind to other possibilities. Ask yourself, “What if the problem I see didn’t exist? What other problems might be at work here?”
- Talk with others about what you’re seeing and hearing that’s telling you there’s a problem. Listen carefully to their responses. Often people who are not part of the problem have the greatest insight into what might be going on.
- If the problem you noticed is simple enough to address, fix it and see what else crops up.
You might even find out that The Problem is a lack of definition of the problem(s). For more information on defining the problem, see Are Your Lights On? by Gerald M. Weinberg and Donald C. Gause.
