Thursday, June 14, 2012

ITSM - Defining the Service

This is the second part of my series about my client's ITSM project. Previously, we discussed the challenges associated with chasing shiny objects and believing that the implementation of a product would solve the client's problems. All too often, ITSM projects fail because the organization fails to differentiate between a Business-Critical Service and an application.

The first step of the project plan is defining the Service. This means identifying all of the components which participate in the service. By the ITIL definition, a Service comprises People, Processes and Technology. Most IT Professionals will look at the Technology components first: storage, SAN, inter-networking, and server hardware. Later, the applications which make up the Business-Critical Service are considered. There can certainly be many applications involved, even indirectly. Consider how important DNS is to most IT applications.

An outstanding deliverable is the information ABOUT the service - its owner, its provider, its consumers and so on. These are key to a later Deliverable, when it's time to start identifying the Service Level Objective/Agreement (SLO/SLA). It is important to start defining these early, even if they may change later. Fortunately, a lot of this information is readily available, or can be discovered during the first phase, which is information gathering.
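To make this concrete, here is a minimal sketch of what one catalogue entry for a Service might look like as a data structure. The field names (owner, provider, consumers, components) and the sample values are purely illustrative assumptions, not taken from any particular CMDB or ITIL tool.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceDefinition:
    """One catalogue entry: the Service plus the information ABOUT it."""
    name: str
    owner: str                                       # accountable for the Service
    provider: str                                    # team that operates it
    consumers: list = field(default_factory=list)    # Business Units that use it
    components: dict = field(default_factory=dict)   # People, Processes, Technology

# Hypothetical example entry
payroll = ServiceDefinition(
    name="Payroll",
    owner="CFO Office",
    provider="IT Operations",
    consumers=["HR", "Finance"],
    components={
        "People": ["DBA on-call rota"],
        "Processes": ["Month-end close"],
        "Technology": ["App servers", "SAN storage", "DNS"],
    },
)
print(payroll.components["Technology"])
```

Note that DNS appears in the Technology list even though it participates only indirectly - capturing those indirect dependencies is exactly the point of this exercise.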

The key difference between an SLO and an SLA is the concept of the contract. An SLO is an objective the Service Provider will TRY to achieve, but there are no penalties for NOT doing so. In the case of an SLA, there are negative consequences for not achieving the agreed-upon objective.
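The distinction can be sketched in a few lines: both compare a measured result against a target, but only the SLA attaches a penalty to a miss. The availability figures and penalty amount below are invented for illustration.

```python
def met_objective(measured_availability: float, target: float) -> bool:
    """True if the Service Provider hit the agreed availability target."""
    return measured_availability >= target

def sla_penalty(measured_availability: float, target: float,
                penalty_per_miss: float) -> float:
    """An SLA miss carries a negative consequence; an SLO miss would not."""
    if met_objective(measured_availability, target):
        return 0.0
    return penalty_per_miss

# 99.5% measured against a 99.9% target: under an SLO the objective is
# simply missed; under an SLA the agreed penalty applies.
print(met_objective(0.995, 0.999))
print(sla_penalty(0.995, 0.999, 1000.0))
```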

Once all of the components of the Business-Critical Service are identified and catalogued, the next step should involve classifying the types of logging available. Typically, IT Infrastructure folks will monitor the servers and the applications under their care. They will select "best of breed" solutions which can collect LOTS of different metrics. These generally fall into two categories: Alerts and Data-Points.

Alerts are used to let the Operational Teams know if something is going wrong. These will get triggered if a specific server application ends abnormally, or if a threshold (such as CPU Cycles used) is exceeded. These situations can have disastrous effects on the Business-Critical Application, so the Operations Team needs to act swiftly to remediate them.
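A threshold alert of the kind described above can be sketched as follows. The 90% CPU threshold is an illustrative assumption, not a recommendation; real monitoring tools let the Operations Team tune this per server.

```python
from typing import Optional

# Illustrative threshold: above this, the Operations Team must act.
CPU_THRESHOLD_PCT = 90.0

def check_cpu(sample_pct: float) -> Optional[str]:
    """Return an alert message if the sample crosses the threshold, else None."""
    if sample_pct > CPU_THRESHOLD_PCT:
        return f"ALERT: CPU at {sample_pct:.1f}% exceeds {CPU_THRESHOLD_PCT:.1f}%"
    return None

print(check_cpu(95.2))  # crosses the threshold: an alert fires
print(check_cpu(40.0))  # below the threshold: no action required
```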

Data-Points are pieces of information, collected over time. During the collection, there is no requirement for intervention or action on the part of the Operations Staff. This information accumulates over time, allowing for trending analysis. It can be used for exercises such as capacity planning or showback/chargeback models.
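By contrast, data-point collection looks like this: samples accumulate with no immediate action required, then feed a trend calculation later. The disk-usage readings are invented for illustration.

```python
from statistics import mean

disk_used_gb = []  # one sample per collection interval; no alert, no intervention

for sample in [410, 425, 433, 450, 462]:  # illustrative daily readings
    disk_used_gb.append(sample)           # just collect the data-point

# Average daily growth is the raw input to a capacity-planning exercise.
daily_growth = mean(b - a for a, b in zip(disk_used_gb, disk_used_gb[1:]))
print(f"average growth: {daily_growth:.1f} GB/day")
```

The same stored samples could just as easily drive a showback/chargeback model, which is why they are worth keeping even though nobody acts on any single reading.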

From a purely technology viewpoint, the tools available today can monitor pretty much anything you can imagine. It is tempting for Management and Executive types to suggest that they want to monitor and alert on everything. The more information, the better, right? Wrong! Remember that all of the information that is collected needs to be stored somewhere. The more points of monitoring, the larger the data-store, and the more information that needs to be correlated. This effectively slows the system down, unnecessarily.

During the interview phase, it is critically important to capture the Stakeholder's objectives. These could be:

  1. I want to shorten the time to resolve trouble tickets
  2. I want to understand how much it costs for a Business Unit to use the Service
  3. I want to increase the visibility of IT Operations to the Business
  4. I want to be able to demonstrate the availability of the Service

Collecting information that doesn't satisfy one or more of these objectives is of little or no value. So, to keep the data-store manageable, the points of monitoring and alerting should be kept within the confines of these objectives. If the objectives change later, you can always add to the points of monitoring.
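Filtering candidate monitoring points against the Stakeholder objectives can be sketched as a simple intersection test. The metric names and objective tags below are invented for illustration; in practice they would come from the interview phase.

```python
# Each candidate metric is tagged with the objectives it serves (if any).
candidate_metrics = {
    "ticket_resolution_time": {"shorten_ticket_resolution"},
    "cost_per_business_unit": {"understand_bu_costs"},
    "service_availability":   {"demonstrate_availability"},
    "fan_speed_rpm":          set(),  # satisfies no stated objective
}

# Objectives captured from the Stakeholder interviews.
objectives = {
    "shorten_ticket_resolution",
    "understand_bu_costs",
    "demonstrate_availability",
}

# Keep only the metrics that serve at least one objective.
monitored = [m for m, tags in candidate_metrics.items() if tags & objectives]
print(monitored)  # fan_speed_rpm is dropped, keeping the data-store lean
```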
