Remember, we structured the academy on debugging into the triad: Technology, Processes, and People. We talked about lots of technical tools in the previous section. Now it is time to talk about Processes.
2.2.1 Debugging Process
Let us start with a closer look at the Debugging Process - literally following the life of a bug from the moment we become aware of it until we close it. Below, you can see the process laid out:
Note: Below we list a typical bug process. There might be a need for an expedited bug process to address immediate dangers or security issues. The expedited process should exist in parallel to the normal process, and the requirements for expedited treatment of a bug should be clear. Otherwise, we end up expediting everything all the time.
There are also some nice resources we want to share:
Cornell University Debugging Strategies
SimpleProgrammer Effective Debugging
2.2.1.1 Awareness stage
The very first step in the process of handling a bug is to become aware of it. Awareness means a stakeholder becomes aware of the bug - either passively by suffering from it or actively by finding it through static or dynamic testing.
The point here is: Awareness itself is important but won’t lead to anything if we do not push on in the process. We see too often that users are aware of issues and only discuss them somewhere (always a nice joke at the watercooler). The same goes for other stakeholders, like the operational folks.
That is why Identification needs to be automatic or super-easy; otherwise, users simply collect bad experiences.
2.2.1.2 Identification of a bug
In this step, the bug is reported in an issue tracking system (see 2.1.1 Issue Tracking).
Some aspects to think about:
- Data quality is the most important aspect here. Make sure to check what good issue reports look like, as described in Section 2.1.1. Also, favor automation over humans filling in surveys.
- Based on the information collected here, a heuristic assigns a severity that informs the next step.
- Recording the context is king to enable the reproduction of the bug. But be aware of privacy aspects: A memory dump may contain sensitive data or PII.
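A severity heuristic like the one mentioned above can be as simple as a few fixed rules. The following is a minimal sketch, not a recommended rule set: the field names, thresholds, and severity labels are all assumptions made up for illustration and need to be replaced by whatever your tracker and your team actually use.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical report fields; a real tracker will name these differently."""
    users_affected: int       # rough count of affected users
    data_loss: bool           # does the bug destroy or corrupt data?
    security_related: bool    # does it expose data or bypass access control?
    workaround_exists: bool   # can users keep working some other way?


def assign_severity(report: BugReport) -> str:
    """Map a report to a coarse severity label using fixed, illustrative rules."""
    if report.security_related or report.data_loss:
        return "critical"
    if report.users_affected >= 100 and not report.workaround_exists:
        return "high"
    if report.users_affected >= 10:
        return "medium"
    return "low"
```

Keeping the rules this explicit makes them easy to review and adjust when the triage keeps overriding the assigned severity.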
The most important outcome of this step is to enable a triage of the bug. Typically, we end up with missing information, which then has to be collected by the senior person, delaying the triage and simply exhausting everyone. We have a very radical suggestion to make for this, as you will see in the next section. Our advice: Keep track of how often a report has to be pushed back and try to minimize this metric as hard as possible. It will never be zero, but it is the noise factor resulting from the first two steps. Keep the noise low.
2.2.1.3 Bug triage
Sorted by severity and urgency (again, think about heuristics here), the bugs are worked on. In the triage, one of the following flags is normally assigned to the bug report:
- Accepted/ToDo - This one moves forward into the backlog of the team
- Push back/Delay - We might do this later but not now. Keep it on the backlog but assign a flag. Also, make sure you review these from time to time; otherwise, you build a stack of stale reports. As a rule of thumb: Never push back/delay more than 10% of the current bugs to be triaged. It might sound impossible, but otherwise you have actually made a decision (aka won’t do) and are just not brave enough to state it.
- Won’t do / Not a bug - Sometimes the bug process is misused to request features - then defer to product management and let them decide. Sometimes what seems like a bug is actually a feature (when a security feature blocks a user from accessing information, it might be misinterpreted) - again, make sure product management knows about it.
- Won’t do / Missing Information - This flags a bad bug report. Instead of using development time to play Sherlock Holmes, we push back. Make sure the reporting stakeholder learns about this (in a nice and polite way, with a message that explains what was wrong and how to make it better next time). This is the radical push-back we talked about earlier; practice shows you need it to keep the noise out of the pipeline. To keep this option from becoming an easy drain for work the dev team does not like (or to keep the rest of the company from accusing the dev team of using it that way), let these tickets be checked by a senior person who is responsible for the overall bug process, and see what can be learned for the first two steps of the process.
- Won’t do - Yes, it is a bug but we won’t fix it. Maybe the feature will be deactivated soon, or the impact does not justify the cost. Again, let these tickets be checked by a senior person who is responsible for the overall bug process, and see what can be learned for the first two steps of the process.
The triage should be done by a senior person who can estimate costs and technical complexity and who thinks in business terms.
A bug that is accepted is fed into the team planning process and picked up by a developer. During the planning process, you should reserve sufficient time for bugs. That is easier said than done, we know. Bug work always seems to be squeezed between immediate business needs and technical debt, or the one expert needed is overloaded. All true - but it won’t help. Somehow, bugs need to be addressed. Our trick: Review the outcome from time to time and take the satellite perspective. Do we need to train and enable more people on certain aspects of the solution? Is our planning realistic? If not, what needs to be done to make it realistic?
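The triage flags and the 10% rule of thumb for delayed bugs can be checked with a few lines of code. This is a minimal sketch under assumptions: the enum names are our own shorthand for the flags listed above, and the input is simply a list with one flag per triaged bug, however you export that from your tracker.

```python
from collections import Counter
from enum import Enum


class TriageFlag(Enum):
    """Shorthand names (assumed) for the triage outcomes described above."""
    ACCEPTED = "accepted"
    DELAYED = "delayed"
    NOT_A_BUG = "not_a_bug"
    MISSING_INFO = "missing_info"
    WONT_DO = "wont_do"


def triage_shares(flags):
    """Return the share of each flag in one triage round (empty dict if no bugs)."""
    total = len(flags)
    if total == 0:
        return {}
    counts = Counter(flags)
    return {flag: counts[flag] / total for flag in TriageFlag}


def delay_budget_exceeded(flags, limit=0.10):
    """True if more than `limit` of the triaged bugs were pushed back/delayed."""
    return triage_shares(flags).get(TriageFlag.DELAYED, 0.0) > limit
```

The share of MISSING_INFO in the same result is one way to track the push-back noise that originates in the first two steps of the process.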
2.2.1.4 Solution Strategy
After the triage and the assignment, you as a developer have to research the issue at hand and build a solution strategy. Sometimes this is a matter of minutes; sometimes you need to replay the error scenario several times, wade through logs, work through test input data, research libraries, and so on. This is a very demanding task, and sometimes it needs a good amount of tenacity. Solution strategy best practices:
- If in doubt, ask. Ask the reporter, ask architects, ask the Product Manager. In reality, having to ask is a sign that the quality of the report is not good enough. So, consider whether the reporting or the overall process needs to be adjusted.
- Sleeping on an issue helps you come back with a fresh mind. Keep a little notebook to write down possible ideas, because sometimes inspiration strikes in the craziest places and you don’t want to forget it again. (Ever had this “I know I had another idea but what was it?”… see).
- Take the opposite approach: If you were asked to trigger such behavior, how would you do it? This leads to insights into where to look for the issue.
- Talk to other developers to understand things better. Obviously, the original author is a good go-to person because there might have been reasons for doing things a certain way.
- Try to encircle the bug by reducing the degrees of freedom. Short-circuit branches to force the app to take a specific path, or set variables to constant values.
- Make the bug stand out by exaggerating triggers. Candidates are overuse of resources like CPU or memory, rounding errors, and strings with “exotic” characters.
- Use prototypes. Build a prototype solution and run the tests against it. A well-used strategy when you have super-rich libraries and frameworks (think Windows SDK) where sometimes you can achieve the same thing using various different approaches.
- Discuss your strategy with architects or other teams touched by your solution. Again, sometimes there are reasons why things are the way they are or you have a downstream system that actually uses and relies on the behavior you are about to fix.
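Two of the practices above, reducing the degrees of freedom and exaggerating triggers, can be illustrated with a small sketch. The function accumulate is a made-up stand-in for code under suspicion, and the seed, sizes, and step values are arbitrary assumptions; the point is only the technique.

```python
import random


def accumulate(step, count):
    """Hypothetical code under suspicion: adds `step` to a float `count` times."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


def drift(step, count):
    """Absolute gap between the accumulated sum and the direct product."""
    return abs(accumulate(step, count) - step * count)


def pinned_inputs(seed=1234, size=5):
    """Reduce degrees of freedom: a fixed seed yields the same inputs every run."""
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(size)]


if __name__ == "__main__":
    print(drift(0.1, 10))          # small count: the rounding gap is easy to overlook
    print(drift(0.1, 1_000_000))   # exaggerated trigger: the gap grows and stands out
    print(pinned_inputs() == pinned_inputs())  # True: inputs are reproducible
```

Using a private random.Random instance instead of the global generator keeps the pinned inputs independent of whatever else in the program draws random numbers.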
2.2.1.5 Solving a bug
Now that we have a good understanding of where the bug comes from and how we are going to solve it, it is time to check out the code and make your changes. We separated this from the solution strategy step because we strongly advise against simply reusing a prototype solution you came up with during the Solution Strategy step. Rather, do it afresh and make sure to follow your coding guidelines. Now that you know what you want to do, do it with your best craftsmanship :-)
- Extend the unit tests. Because you separated the two steps, you can use TDD here: first write a test that reproduces the bug, then change the code until it passes.
- Think about refactoring the code to prevent if-else slipways: you “heal” your bug by adding an if that checks whether the argument is in a certain range. If everyone does this, you end up with pages of if statements in functions to handle special cases.
- Did we mention not to simply copy prototypes?
- Obviously, the bug fix needs to go through peer review and your CI/CD pipeline.
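The first two points above can be sketched together: a regression test written from the bug report, and a lookup table that replaces a growing chain of range checks. Everything here is hypothetical and chosen for illustration only, including the shipping_fee function, the weight brackets, the fees, and the reported bug (a parcel of exactly 5 kg being overcharged).

```python
import bisect
import unittest

# Hypothetical weight brackets (kg) and fees. A fix becomes a data change
# here instead of yet another `if` branch inside the function.
WEIGHT_LIMITS = [1, 5, 20]        # inclusive upper bound of each bracket
FEES = [3.0, 6.0, 12.0, 25.0]     # one extra fee for the open-ended top bracket


def shipping_fee(weight_kg):
    """Look up the fee for a parcel; bisect_left keeps upper bounds inclusive."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return FEES[bisect.bisect_left(WEIGHT_LIMITS, weight_kg)]


class ShippingFeeRegressionTest(unittest.TestCase):
    def test_boundary_from_bug_report(self):
        # Written first, straight from the (hypothetical) report.
        self.assertEqual(shipping_fee(5), 6.0)

    def test_rejects_non_positive_weight(self):
        with self.assertRaises(ValueError):
            shipping_fee(0)


if __name__ == "__main__":
    unittest.main()
```

With the table in place, the next boundary bug usually means correcting one number and adding one test case rather than nesting another condition.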
2.2.1.6 Test Solution
Good that you have your solution. Now we need to test it as thoroughly as everything else. Obviously, test whether it really solves the bug you wanted to solve. Next, do integration tests with the elements of the application you touched. As a next step, the code goes through your usual CI/CD pipeline, code reviews included. Even once the patch is being released, the bug process is not done yet.
2.2.1.7 Search similar
Now that you have all of this fresh in your mind, can you find similar occurrences? Maybe fix those too while you are at it?
- Balance the time you invest and the change you force on the system with the outcome. Sometimes in legacy-code-land, leaving things as they are is preferable.
- Use tools like grep or your IDE to search for possible other occurrences.
- Restart the process from the Solving step for each of those.
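If grep is not at hand, or you want the hits as data to work through one by one, the search can be sketched in a few lines of Python. This is a minimal sketch: the function name find_occurrences, the default file suffix, and the example pattern are our own assumptions, and a plain regular expression will miss occurrences that are written differently.

```python
import re
from pathlib import Path


def find_occurrences(root, pattern, suffixes=(".py",)):
    """Yield (path, line_number, line) for every line under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    # Example: list every call to round( in the current directory tree.
    for path, number, line in find_occurrences(".", r"\bround\("):
        print(f"{path}:{number}: {line}")
```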
2.2.1.8 Learn / Document / Close
Maybe the most overlooked aspect is to learn and document things.
- Update the documentation in your wiki.
- If you learned something, share it with your team. Sometimes a simple shout out via Slack will do, sometimes it is a good brownbag session (a short presentation during lunch time), sometimes a tech talk (evening presentation with drinks and popcorn).
- Make sure to close the bug and do the paperwork. While its value is not obvious within the process, it adds data points you can use to estimate team performance or identify gaps in the process.
Key Takeaways
- Be aware of the underlying process: don’t jump straight into solving, and don’t skip the cleanup.
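As one example of such a data point, the time from report to close can be computed from closed tickets. This is a minimal sketch: the ticket structure (a dict with reported and closed dates) is an assumption, and a real tracker export will look different.

```python
from datetime import date
from statistics import median


def median_days_to_close(tickets):
    """Median number of days between report and close, ignoring open tickets."""
    durations = [
        (ticket["closed"] - ticket["reported"]).days
        for ticket in tickets
        if ticket.get("closed") is not None
    ]
    return median(durations) if durations else None
```

The median is used here instead of the mean so that a single long-running ticket does not dominate the figure; tickets that were never closed properly simply drop out, which is exactly why the paperwork matters.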
- Every bug is a learning opportunity - make the most out of it.