Nobody likes finding problems with their IT systems. They can be frustrating, time-consuming, and costly. While we can probably never stop problems arising completely (although that's the dream!), we can make sure we have the best problem management process in place for when they do occur, and set ourselves up for proactive problem searching to reduce nasty surprises in future.
The more prepared we are, the easier our lives will be. In this blog we'll take a look at why problem management is so important, the challenges, and then ways that we can improve our current process (or build a new one) with Jira and Insight to make problem solving quicker and make proactive problem solving more automated.
Why should I care about problem management?
Problem management is the process of identifying and fixing the root causes of the incidents that occur in your systems. This could be part of an incident request (reactive) or part of your continual service improvement strategy (proactive) to fix problems that create regular minor incidents or might cause major issues in future.
It is closely linked with change and incident management. Problems tend to cause incidents, and the fixes for problems often need to go through your change management process to be implemented. All three can benefit greatly from understanding the interdependencies of IT assets through a CMDB.
Thorough problem management can be time-consuming. Also, as with any proactive activity, it's hard to find the time to focus on it when you're constantly faced with many reactive tasks. But there are huge benefits to problem management which make the focus and time well worth the effort. These include:
Stop solving the same incidents over and over
Instead, free up time within the team to focus on more complicated issues or proactive activities such as implementing new services or carrying out root cause analysis. This will also improve customer satisfaction as having to ask again and again for the same fix isn't a great experience.
If you can catch problems before they become incidents, you also remove a source of costly downtime.
Lower time to resolution
Proactive problem management is key to solving incidents. The better prepared and trained staff are at solving problems, the faster they can solve an issue when it does occur.
Create a culture of continuous improvement
Looking for potential problems and finding ways to make things better is an excellent attitude for any team to have. Problems should be seen as learning opportunities (rather than something annoying) which help avoid a demotivating culture of blaming others.
Challenges with problem management
There are a number of challenges associated with problem management, many of which are on the 'softer' side of things. These include insufficient staff training, widespread blame culture rather than teams working together for a solution, and people having different priorities e.g. an incident manager will want to get services up and running asap but a problem manager will want to dig deeper.
We also mentioned earlier the challenge of finding time for proactive problem solving, not only because there are urgent tickets in your backlog but also because it can be hard to know where to start when you do find time.
There are technical challenges too, which mainly center on the lack of information. Incident reports, asset details, and interdependencies between assets are all useful when problem solving. So is a high-level overview and history of all of these, so you can identify repeat issues, troublesome hardware or potential knock-on effects due to dependencies.
We will take a look here at how to improve information and documentation within Jira using Insight, a Jira app for asset management. We'll also discuss what you can do with both tools to make proactive problem finding significantly simpler.
As discussed, having good visibility of your assets and then associating them with general requests, incidents, changes, etc. is key. Ordinarily in Jira, the asset would be typed manually into a field. This is usually fine in the context of one issue but it makes long-term reporting difficult. If people write the same server in two slightly different ways, a search for one spelling will miss the tickets that use the other.
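To make the free-text problem concrete, here is a minimal sketch in plain Python. The ticket keys and server names are made up for illustration, and nothing here calls Jira or Insight; it only shows how two spellings of one server split its history.

```python
from collections import Counter

# Hypothetical tickets where the asset was typed by hand into a text field.
tickets = [
    {"key": "IT-1", "asset": "web-server-01"},
    {"key": "IT-2", "asset": "Web Server 01"},  # same server, different spelling
    {"key": "IT-3", "asset": "web-server-01"},
]

# An exact-match search for one spelling misses the ticket that used the other.
matches = [t["key"] for t in tickets if t["asset"] == "web-server-01"]

# Counting by the raw text reports two "assets" instead of one.
counts = Counter(t["asset"] for t in tickets)

print(matches)
print(dict(counts))
```

Here one server shows up as two separate entries, so any report built on the text field undercounts its tickets.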
You can make drop-down selector lists in Jira but then the challenge is keeping the information accurate as each time an IT asset changes you need to edit that list. It also gives you very little information about the asset itself other than the name.
Instead, a tool that can integrate a CMDB into Jira custom fields can be used. CMDBs store your important assets (the technical term is configuration item, but asset is often used) with various relevant attributes. Not only that, they also include dependencies between the assets, so you could see, for example, what services a server is running and what operating system it runs on, all useful information when digging into a root cause problem.
This is where the Insight app for Jira comes into play. A core element of Insight is the CMDB. In past blogs we've discussed how the Insight CMDB can be used to help with incident management and change management, so now let's look at it for problem management.
You can build a CMDB of whatever assets you need. This includes IT assets but could also include employees, facilities and even customer data. Whatever item you want to know information about and want to understand the dependencies of. For the purposes of this blog we will focus more on IT assets.
Once you have created your CMDB in Insight you can link it to Jira custom fields. This gives your customer drop-down menus from which to pick the asset they are making a ticket for. These lists can be dynamic so they depend on what someone selected further up the Jira ticket. Also, now you only need to keep your CMDB accurate, and there are importers and network scanning tools like Insight Discovery to help you do this in an automated way.
Once the ticket is created, a link is made between the ticket and the asset (known as an object in Insight) in the CMDB. From the Jira ticket, the IT team is able to see the object(s) and their various attributes.
If you implement this way of working for all your Jira tickets you're going to very quickly document a history of what is happening on your IT systems making both reactive and proactive problem solving significantly easier.
Someone investigating the root cause of a particular incident can see what equipment or software was involved, understand more details, and see the history of that asset to look for any underlying problems. For example, perhaps an earlier change request caused some unintended problems.
Another aspect of the documentation side is asset dependencies. When looking for a root cause of, say, a server issue, it's often helpful to understand what that server depends on and what other servers have the same dependencies as potentially there's something going on that affects everything.
Fixing the problem for one server is certainly useful but it's not ideal if you miss the other servers and have to go through the same root cause analysis when the next server has the same issue.
With Insight, you can define dependencies in the CMDB and then get a handy graph that quickly shows you all the interconnections so you can dig deeper.
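The idea of looking for other servers with the same dependencies can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how Insight stores or draws its graph; the server names, dependencies, and the `shared_dependencies` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency data: each server maps to the things it depends on.
depends_on = {
    "web-01": {"Ubuntu 20.04", "rack-A-power"},
    "web-02": {"Ubuntu 20.04", "rack-B-power"},
    "db-01": {"Windows Server 2019", "rack-A-power"},
}


def shared_dependencies(server, depends_on):
    """Return {dependency: [other servers that also rely on it]} for one server."""
    shared = defaultdict(list)
    for other, deps in depends_on.items():
        if other == server:
            continue
        # Any dependency present on both servers is a candidate common cause.
        for dep in depends_on[server] & deps:
            shared[dep].append(other)
    return dict(shared)


result = shared_dependencies("web-01", depends_on)
print(result)
```

If web-01 has an issue, this points you at web-02 (same operating system) and db-01 (same power supply) as the next places to check.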
If you discover that an incident or problem ticket requires quite a significant change to fix, you may want to create a change ticket in Jira. With Insight post functions you can create a Jira workflow which will take your attached objects in your incident or problem ticket and transfer them to a brand new change ticket and even assign them automatically to the asset's owner, reducing admin and speeding up your process.
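As a rough picture of what such a workflow step does, here is a small Python sketch. It does not use the Insight post function API; the `Asset` and `Ticket` classes and the `create_change_from` helper are hypothetical stand-ins, and assigning the change to the first asset's owner is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    name: str
    owner: str


@dataclass
class Ticket:
    key: str
    issue_type: str
    assets: List[Asset] = field(default_factory=list)
    assignee: Optional[str] = None


def create_change_from(source: Ticket, new_key: str) -> Ticket:
    """Copy the linked assets to a new change ticket and assign it to an owner."""
    change = Ticket(key=new_key, issue_type="Change", assets=list(source.assets))
    if change.assets:
        # Assumption: the owner of the first linked asset takes the change.
        change.assignee = change.assets[0].owner
    return change


incident = Ticket("IT-42", "Incident", [Asset("web-01", "alice")])
change = create_change_from(incident, "IT-43")
print(change)
```

The point is that nobody has to retype the asset or look up its owner; both carry over from the original ticket.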
Proactive problem management
What we discussed above is great for when you know you have a problem and want to understand the root cause. But what about when you don't know you have a problem? In addition to the CMDB and post functions, Insight contains reporting and automation tools that can be useful for proactive problem management.
We've already seen that Jira and Insight will build a comprehensive history of your assets but we've made sure that Insight can display this, and other useful information, in an easy-to-consume way. So if you have a hunch there's a deeper problem, or are doing a scheduled review of your critical services, you can quickly scan the asset's history for any red flags.
When you click on an object you get the view below. This view is very customizable; you can choose what information is shown so you get the information relevant to you. Below we can see both the connected tickets and the object's history in separate tabs as an example.
Insight has a wide range of reporting capabilities which can also help you identify problems. The Issue Count report type is particularly useful here. It shows you the number of issues based on some filter criteria such as the type of object or the issue type.
You could create one of these reports that, for example, shows the number of issues for all servers or assets you've labelled as 'critical' over a rolling time period. Then, if you see an unusually high number of tickets, you can do some deeper investigation into those tickets and potential root causes.
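The logic behind a rolling issue count is simple enough to sketch in Python. This is not the Insight report itself; the issue records, the `critical` flag, and the `issue_counts` helper are hypothetical and only illustrate the filtering and counting involved.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical issues, each linked to one asset.
issues = [
    {"asset": "web-01", "critical": True, "created": date(2020, 6, 25)},
    {"asset": "web-01", "critical": True, "created": date(2020, 7, 1)},
    {"asset": "web-01", "critical": True, "created": date(2020, 4, 1)},  # too old
    {"asset": "db-01", "critical": True, "created": date(2020, 6, 20)},
    {"asset": "printer-07", "critical": False, "created": date(2020, 7, 2)},
]


def issue_counts(issues, today, window_days=30):
    """Count issues per critical asset created within the rolling window."""
    cutoff = today - timedelta(days=window_days)
    return Counter(
        i["asset"] for i in issues if i["critical"] and i["created"] >= cutoff
    )


counts = issue_counts(issues, today=date(2020, 7, 7))
print(counts.most_common())
```

Sorting by count puts the assets with the most recent tickets at the top, which is where you would start a deeper investigation.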
Another example would be a report showing all assets that had a status of 'offline' in a particular time period to see if there's something that repeatedly goes down. An asset's status attribute, if you choose to create one, can be set manually or by a Jira post function if the IT team received a ticket and confirmed that the asset is indeed down.
Reporting can be a very useful tool for spotting long-term inconsistencies or variations. What you can report on depends very much on what information you've decided to store in the Insight CMDB.
One final aspect of Insight that we'll discuss here that can help with proactive problem management is the automation rules. These can be set to be triggered by certain changes.
So for example, you could have a rule that says if an object has more than 5 linked incident issues in a set timeframe, send an email to the object's owner to ask them to investigate. Or, if certain criteria are met, such as multiple server incidents occurring with the same dependency (e.g. the same OS), create a Jira Problem to investigate further.
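The first rule boils down to a threshold check, which can be sketched in Python as follows. This is not how you configure an Insight automation rule; the incident data, the `owners` lookup, and the `assets_needing_review` helper are hypothetical, and the sketch returns who to notify rather than sending an email.

```python
from datetime import date, timedelta


def assets_needing_review(incidents, owners, today, window_days=30, threshold=5):
    """Return (asset, owner) pairs with more than `threshold` recent incidents."""
    cutoff = today - timedelta(days=window_days)
    counts = {}
    for inc in incidents:
        if inc["created"] >= cutoff:
            counts[inc["asset"]] = counts.get(inc["asset"], 0) + 1
    # "More than 5" means strictly greater than the threshold.
    return [(a, owners[a]) for a, n in sorted(counts.items()) if n > threshold]


# Hypothetical data: six recent incidents on web-01, five on db-01.
incidents = [{"asset": "web-01", "created": date(2020, 7, 1)}] * 6
incidents += [{"asset": "db-01", "created": date(2020, 7, 2)}] * 5
owners = {"web-01": "alice", "db-01": "bob"}

to_notify = assets_needing_review(incidents, owners, today=date(2020, 7, 7))
print(to_notify)
```

Only web-01 crosses the threshold here, so only its owner would be asked to investigate.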
These automation rules are very customizable, so you can build them around whatever is important to your system and might indicate a greater problem.
So there you have it. That's how your CMDB can help you solve the problems (pun intended) with problem management. With good information, a CMDB is going to help you understand your dependencies better, make digging into those root causes significantly easier, and give you ideas on where to proactively hunt for problems. All of which in turn will ultimately help you save time and money.
If you're interested in learning more about how CMDBs within Jira can help with other aspects of ITSM, take a look at how they can overcome common challenges with your change management process.
Originally published Jul 7, 2020 11:01:39 AM