Release Triage – Managing your IT emergency room with a 4-step process
Posted by Kevin Miller
This article was published via Pluralsight.
In the medical field, triage means sorting patients according to a system of priorities designed to maximize the success of treatment. An emergency room sets up a triage to prioritize cases when a large number of patients arrive. First come, first served is a poor method of treatment when it forces a critical patient to wait in line.
IT systems are the same way. In fact, an alternate definition for the word triage is: The assigning of priority order to projects on the basis of where funds and other resources can best be used, are most needed, or are most likely to achieve success.
When it comes to IT releases, there’s no such thing as a sure thing. Only an inexperienced project manager would go live without a plan for how to handle immediate issues. You can test every last feature multiple times, but it’s likely you’ll encounter something unexpected shortly after a major release. An effective release triage plan will raise your team’s confidence, and it should be activated immediately following your release.
Before you start: Communication and buy-in
First and foremost, your project sponsors and major stakeholders should be made aware of your triage plan. They also need to know that for the first few days following a major release, you and your team will be working in a triage state.
This means your main project team shouldn’t roll straight onto the next project. Nor should they schedule any vacation days or attend any long meetings; it’s important that the team be ready to address issues as they arise. Without this commitment from IT, your end users will feel abandoned and lose confidence in the new system. Key business contacts need to remain available too, as their input during the triage phase is crucial.
After buy-in, follow these four steps to execute a successful release triage process.
Step 1: Record the issue
Every release triage plan needs a way for end users to report issues. For large organizations, this is usually a technical support phone number, email address or online form. For smaller organizations, it could be as simple as a shared spreadsheet. Whatever process you choose acts as the check-in desk for your emergency room.
Adding a new issue to the triage list needs to capture some basic information: What is the issue? What steps did the user take to encounter it? Can it be reproduced? What should have been the result of the steps taken? The more information obtained up front, the faster a resolution will materialize.
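The four questions above can be sketched as a small record type. This is only an illustration: the class name, the field names, and the `missing_details` helper are assumptions made for the example, not part of any particular help desk tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IssueReport:
    """One entry on the triage list (all names here are hypothetical)."""

    description: str = ""  # What is the issue?
    steps_taken: List[str] = field(default_factory=list)  # What did the user do?
    reproducible: Optional[bool] = None  # None means "not checked yet"
    expected_result: str = ""  # What should have happened?

    def missing_details(self) -> List[str]:
        """Return the names of the basic fields that are still unanswered."""
        missing = []
        if not self.description.strip():
            missing.append("description")
        if not self.steps_taken:
            missing.append("steps_taken")
        if self.reproducible is None:
            missing.append("reproducible")
        if not self.expected_result.strip():
            missing.append("expected_result")
        return missing
```

A report with anything left in `missing_details()` is a natural candidate for the follow-up conversation with the person who reported it.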
Once the basic information is gathered, the issue is now sitting in your triage waiting room. You can use a number of tools to track the triage list. My favorite is a large white board; it can quickly and easily be changed and it keeps everyone on the same page. Even if your organization uses a help desk solution, a white board is a great supplemental tool.
Step 2: Gather details
Select an issue on the triage list. If the information obtained in step 1 is not enough, gather more details. The best and fastest way to do that is for your team member to go straight to the source. When the person responsible for fixing the issue talks with the person who reported it, and is walked through the problem step by step, a lot of the details reveal themselves.
Occasionally you’ll find there was a user error and there is nothing actually wrong, other than the user possibly needing some additional training. This kind of one-on-one interaction leaves little room for assumptions and misinterpretations.
Observing directly also allows for your team to learn typical end user behaviors, which may not have been considered during the planning, development or testing phases of the software development lifecycle. This information can be useful for the problem at hand, as well as in future development efforts.
Step 3A: Move non-critical issues to the work backlog
The developer has now seen first-hand the steps the end user performed, the incorrect result and what the result should have been. If the issue isn’t critical, or if it turns out to be a new request, it should be documented and moved out of the triage list and into the work backlog. Once an issue is moved into the work backlog, it’s no longer an emergency and your team returns to step 2 for a different issue.
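As a rough sketch, the routing decision in this step fits in a few lines. The function names, the tuple layout, and the "triage" and "backlog" labels are assumptions for illustration; what counts as critical is something your team has to define for itself.

```python
from typing import Dict, List, Tuple

# (title, is_critical, is_new_request); a hypothetical layout for this example.
Issue = Tuple[str, bool, bool]


def route_issue(is_critical: bool, is_new_request: bool) -> str:
    """Send new requests and non-critical issues to the backlog."""
    if is_new_request or not is_critical:
        return "backlog"
    return "triage"


def split_triage_list(issues: List[Issue]) -> Dict[str, List[str]]:
    """Group issue titles by where they should go next."""
    result: Dict[str, List[str]] = {"triage": [], "backlog": []}
    for title, is_critical, is_new_request in issues:
        result[route_issue(is_critical, is_new_request)].append(title)
    return result
```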
Step 3B: Fix critical issues
While it would be great to investigate and prioritize all issues first, the reality is that issues often keep coming in. An emergency room doesn’t wait for all the patients in a day to arrive before treating the one most in need. In the same way, once an issue is confirmed as critical, start fixing it right away.
Of course there are exceptions to this rule, such as when you suspect multiple issues are related and you think you can solve two or more at once, or when an issue arises that is even more critical than the one you are currently working on. For example, the current critical problem is preventing people from working, thereby costing the company a few hundred dollars per hour. However, the new critical problem is causing a loss of customers, which results in a loss of tens of thousands of dollars.
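One way to picture this reprioritization is a queue of critical issues ordered by estimated business cost. The sketch below is a minimal illustration: the `CriticalQueue` class, the issue titles, and the dollar figures are invented for the example, and real cost estimates will rarely be this precise.

```python
import heapq
from typing import List, Tuple


class CriticalQueue:
    """Critical issues ordered by estimated cost per hour, highest first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker so equal costs keep arrival order

    def add(self, title: str, cost_per_hour: float) -> None:
        # heapq is a min-heap, so store the negated cost to pop the largest first.
        heapq.heappush(self._heap, (-cost_per_hour, self._counter, title))
        self._counter += 1

    def next_issue(self) -> str:
        """Remove and return the title of the most costly open issue."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Hypothetical figures, loosely modeled on the example above.
queue = CriticalQueue()
queue.add("Staff cannot log in", 300.0)
queue.add("Checkout fails for customers", 20000.0)
print(queue.next_issue())  # Checkout fails for customers
```

The later arrival jumps ahead because its estimated cost is higher, which is the same judgment call a triage lead makes at the white board.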
Step 4: Deploy
Test the fix and release it. If your organization follows a structured change management process, don’t forget to request an emergency change. You can now safely work on the next issue in the triage list.
The entire process is straightforward when laid out end-to-end, but that doesn’t mean your triage period won’t be chaotic. Most emergency rooms run like well-oiled machines until multiple patients move to the front of the line. Commotion doesn’t mean the process is broken; it simply means there are too many issues over a short period of time, and not enough staff to address them all at once.
As mentioned above, communication is key when chaos ensues. Not many patients with a small cut complain about their place in line when they know a critical patient must receive treatment first. That said, your triage process will have zero chance for success without the following essential step…
Step 0: Planning
As an IT leader, you need answers to the following questions before going live.
- What tool will your team use to triage issues?
- How is the word critical defined?
- What are all of the roles in the process?
- Who will perform the various roles?
- How will status updates be communicated?
- How frequently should communication occur?
- Who should be included on the communication?
- What are the expectations for overtime during the triage phase following a release?
Conclusion
Once you are out of the triage phase, don’t forget to look back on your process to see if any improvements can be made. There are almost always changes that should be enacted to make the entire process better for everyone involved, including the help desk, developers, IT manager and end users.
To review, no matter how much testing you perform, going live almost always results in unintended and unexpected issues. A release triage plan will give your project sponsors and end users confidence, and help your project team avoid a sense of dread or of being overwhelmed.