Let’s start with a question: what kind of service do customers expect? Above all, they need a guarantee that the business will run smoothly. The service is supposed to work and be secure. But what if it doesn’t work as it should? Customer and user satisfaction, as well as the value we represent as a company, largely depend on how quickly we remove the obstacles encountered in the life of the product. So, who is responsible for putting out fires? Incident Management, known from the famous ITIL circle, comes to the rescue.
We can find quite a few publications on Incident Management, for example, at atlassian.com. We read there that Incident Management aims to minimize the negative impact of incidents by fixing the bug and returning to standard operation as quickly as possible.
An incident is an unplanned break in the delivery of a specific service or a reduction in the quality of that service due to a defect. 
Imagine a bank that provides a mobile application. One day, there is a problem – users can’t generate authorization codes in the application or the application closes in the middle of this activity. That’s an example of an incident.
The incident lifecycle begins when we identify an error in the IT infrastructure or system operation. Each incident (as well as a request for change) should be logged in the ticket handling system. They usually go through the following processes (stages): Incident Management, Problem Management, and Change Management.
All incidents first go through the Incident Management process. If the solution is not yet known and we need to identify the problem, the incident is moved to the next stage: Problem Management. Otherwise, Service Desk provides the user with immediate help, most often in the form of a workaround that eliminates the consequences of the incident.
A workaround is usually a temporary solution that helps to maintain business continuity. For example, we can change the way we use an application. Let’s say a user wants to send an email from Outlook but the mobile version isn’t working. The incident is logged, but we can’t fix it right away. To help the customer achieve the goal of sending an email, we suggest using Outlook in the web version until we resolve the matter.
Of course, a more complex incident goes to the appropriate Resolving Team. In that case, it might take longer to deal with the issue. Either way, the Service Desk is usually the main point of contact (by definition, it should solve the vast majority of incidents).
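The routing described above can be sketched in a few lines of Python. This is only an illustration: the `Incident` record, the `KNOWN_WORKAROUNDS` lookup, and the category names are hypothetical and do not come from ITIL or from any real ticketing tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lookup of known workarounds, keyed by incident category.
KNOWN_WORKAROUNDS = {
    "outlook-mobile-down": "Use the web version of Outlook until the fix is released.",
}


@dataclass
class Incident:
    ticket_id: str
    category: str


def route_incident(incident: Incident) -> Tuple[str, Optional[str]]:
    """Return (next_stage, workaround) for a logged incident."""
    workaround = KNOWN_WORKAROUNDS.get(incident.category)
    if workaround is not None:
        # Solution already known: the Service Desk helps the user right away.
        return ("Service Desk", workaround)
    # Solution not yet known: the incident moves on to Problem Management.
    return ("Problem Management", None)


print(route_incident(Incident("INC-1", "outlook-mobile-down")))
print(route_incident(Incident("INC-2", "authorization-codes")))
```

In practice the decision is rarely a single dictionary lookup, but the sketch shows the two paths: immediate help with a workaround, or escalation when the cause still has to be identified.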
The diagram below shows the incident lifecycle.
Pic. 1 – Incident lifecycle
SLA and KPI – what do they mean in the context of incidents?
When we talk about incidents, it is necessary to know terms such as SLA and KPI.
An SLA (Service Level Agreement) is a guarantee of service quality included in the contract between the customer and the provider. For example, the contract may state that password reset incidents will be resolved in less than 30 minutes from the moment they are logged.
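To make the example concrete, here is a minimal sketch of an SLA check in Python. The 30-minute target is the one from the example above; the function name and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# Target from the example above: password resets resolved in less than 30 minutes.
PASSWORD_RESET_SLA = timedelta(minutes=30)


def is_within_sla(logged_at: datetime, resolved_at: datetime,
                  target: timedelta = PASSWORD_RESET_SLA) -> bool:
    """True if the incident was resolved in less than `target` after logging."""
    return resolved_at - logged_at < target


logged = datetime(2024, 1, 15, 9, 0)
print(is_within_sla(logged, datetime(2024, 1, 15, 9, 25)))  # True
print(is_within_sla(logged, datetime(2024, 1, 15, 9, 45)))  # False
```

Real contracts often count only business hours or pause the clock while waiting for the user, so an actual check would need those rules as well.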
KPIs (Key Performance Indicators) are the most important metrics that enable an organization to constantly monitor the progress toward achieving its goals. Even though they are not stated in the contract, KPIs are an important tool for Incident Managers. They use KPIs to check the quality of the service in the team or organization. For example, the indicators can provide information on how many incidents were resolved within a given time frame, let’s say, two months.
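As a rough illustration, the KPI mentioned above (incidents resolved within a given time frame) can be computed from ticket data. The sketch below assumes a hypothetical list of resolution dates exported from a ticketing tool; the dates themselves are invented.

```python
from datetime import date
from typing import List, Optional


def resolved_in_period(resolution_dates: List[Optional[date]],
                       start: date, end: date) -> int:
    """Count incidents resolved on or after `start` and before `end`."""
    return sum(1 for d in resolution_dates if d is not None and start <= d < end)


# Made-up sample data; None marks an incident that is still open.
resolutions = [date(2024, 1, 10), date(2024, 2, 20), date(2024, 3, 5), None]

# Two-month reporting window: January and February 2024.
print(resolved_in_period(resolutions, date(2024, 1, 1), date(2024, 3, 1)))  # 2
```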
ITIL (Information Technology Infrastructure Library), a set of good practices in the field of IT, says that Incident Management activities form two processes:
- incident handling and resolution (from incident identification to its closure),
- regular incident review (checking whether incidents have been handled according to the established procedures and whether the lesson was learned).
I partially explained the first process when describing the incident lifecycle. It includes elements such as:
- Incident identification – by the user or a system for monitoring incidents,
- Incident logging – in the incident database, ticketing tool, and so on,
- Incident classification – assigning an appropriate priority and category, which sets the pace of handling,
- Incident diagnosis – looking for a solution or consulting other teams in case of more complex incidents,
- Incident resolution – the provider reports that the system is working,
- Incident closure – the user confirms the system is working.
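The six steps above can be modeled as a simple ordered state machine. The sketch below is one possible way to express that in Python. The enum and function names are illustrative assumptions, and real tools typically allow more transitions (for example, reopening a ticket).

```python
from enum import IntEnum


class Stage(IntEnum):
    """Incident handling steps, in the order listed above."""
    IDENTIFICATION = 1
    LOGGING = 2
    CLASSIFICATION = 3
    DIAGNOSIS = 4
    RESOLUTION = 5
    CLOSURE = 6


def next_stage(current: Stage) -> Stage:
    """Return the step that follows `current`; closure is the final step."""
    if current is Stage.CLOSURE:
        raise ValueError("A closed incident has no next stage")
    return Stage(current + 1)


# Walk one incident through the whole lifecycle.
stage = Stage.IDENTIFICATION
path = [stage.name]
while stage is not Stage.CLOSURE:
    stage = next_stage(stage)
    path.append(stage.name)
print(" -> ".join(path))
```

Keeping resolution and closure as separate states mirrors the distinction in the list: the provider reports the fix, and the user confirms it.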
The stages of the other process include:
- Review and analysis of recorded incidents – Incident Managers and service owners analyze mainly high-priority incidents and overdue events. The goal is to find recurring patterns in order to establish new incident models. Based on what they find, they optimize and automate incident handling.
- Initiating improvement of the incident model – registering previously identified improvement initiatives.
- Communicating updates of incident models – providing stakeholders with information about the new procedures.
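Finding recurring patterns, as in the first step above, can start with something as simple as counting high-priority incidents per category. The sketch below assumes hypothetical (priority, category) pairs and an arbitrary threshold of two occurrences; both are illustrative choices, not part of ITIL.

```python
from collections import Counter
from typing import List, Tuple


def recurring_categories(incidents: List[Tuple[str, str]],
                         priority: str = "high",
                         min_count: int = 2) -> List[str]:
    """Categories that appear at least `min_count` times at the given priority."""
    counts = Counter(cat for prio, cat in incidents if prio == priority)
    return sorted(cat for cat, n in counts.items() if n >= min_count)


# Made-up sample of (priority, category) pairs.
sample = [
    ("high", "authorization-codes"),
    ("low", "password-reset"),
    ("high", "authorization-codes"),
    ("high", "app-crash"),
]
print(recurring_categories(sample))  # ['authorization-codes']
```

A category that keeps returning is a natural candidate for a new incident model or for automation.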
Incident Managers are responsible for managing all incidents in a given area (customer or organization). They are the primary point of contact for incident handling.
Their most important tasks include:
- coordinating incident handling activities, especially when multiple teams are working on the same case,
- monitoring and reviewing the work of teams that resolve incidents,
- raising awareness about incidents,
- regular review of incidents and implementation of corrective actions,
- participation in meetings related to the service operation,
- reporting the service operation in terms of incident handling,
- responsibility for customer and user escalations.
To sum up, Incident Managers make sure the service works properly and react when sudden events occur. In an emergency, for example when an incident exceeds the SLA or the solution is not simple, an Incident Manager must be very flexible. They need to respond quickly to a dynamic situation, also in the context of a changing service: they should expect the unexpected. What’s more, communication is an extremely important part of this job (and of ITIL-related work in general). The ability to communicate effectively is one of the main tools in the hands of an Incident Manager.
Benefits of Incident Management
The implementation of Incident Management in the company is a guarantee of safety. In my opinion, the most important benefits are:
- improved monitoring and performance related to SLA,
- broader information channels related to the quality of services,
- increased efficiency,
- reduced risk of unplanned events (incidents),
- increased customer satisfaction,
- reduced financial losses for customers,
- a qualified team and further development of employees’ competencies.
Major Incident Management (MIM)
It’s important to mention that Incident Management processes are directly related to Major Incident Management, which focuses on incidents of the highest priority and business importance. This is a broad and interesting subject, so we will cover it in the next article.
Why do I need Incident Management?
You probably no longer have doubts about the importance of Incident Management. It is vital to plan this process properly and to appoint people to take good care of it. This is just one of the elements ITIL includes among the solutions for effective service management.
Incident Management is often underestimated and overlooked. In my opinion, this is a big mistake. After all, we all know that prevention is better than cure. That is why the role of an Incident Manager as the guardian of the security process is so crucial. Detecting incidents early and managing them appropriately can directly lower the maintenance cost of the service – it is a fact. Negligence can often be a lot more costly than running this process (and others, too).
Craftware and our team of excellent specialists guarantee that these processes can be created, optimized, maintained, and constantly improved. We have extensive experience in managing processes as it is a part of our everyday work. Our satisfied customers prove that we know how to do it well.
- Junior Service Manager
He has worked in the IT industry for 3 years, mainly as an Incident Manager. He joined Craftware in February 2022. Currently, he provides services for a market-leading customer from the medical industry. He is interested in management and live music – he likes going to festivals. He highly values effective communication.