IT Crisis Management – A Modern Model

Written by Warner Schlais, President, Technology Services, Datatrend Technologies, Inc.

While technology itself has advanced by leaps and bounds in the past few decades, the process used for IT crisis management hasn’t changed much for most enterprises. Typically, when a “severity one” problem occurs, every IT discipline is contacted (network, applications, database, systems, etc.), and each begins searching for the root cause of the problem, all the while hoping it is not in its own domain! Ironically, the first to claim “not me” is often the one who needs the call back.

Apart from the move from a physical war room to a virtual one, not much has changed from a process perspective. If anything, the sprawling and fragmented ownership of the distributed IT ecosystem complicates matters further. I have spoken with customers who had to dramatically expand the capabilities of their conferencing service to include everyone needed on the call, sometimes hundreds of participants. A call of that size makes the problem-solving task that much harder and adds significant opportunity cost, because so many people are pulled away from other work.

It doesn’t need to be this hard, and this approach is no longer a required way of doing business. A newer approach that combines Application Modeling and Service Impact Modeling significantly reduces problem discovery time and mean time to recovery (MTTR). The underpinnings of this approach were originally developed by BMC in the 1990s. Back then, however, these models were built and maintained manually, and they often fell into obsolescence. When BMC acquired Tideway Systems in 2009, it gained the application discovery and dependency mapping (ADDM) tool and made it part of the Atrium suite; as a result, the models became affordable to build and, for the most part, self-maintaining.

We’ve found the following set of BMC Software components to be instrumental in enabling this approach and leveraging it to its full potential:

  • BMC Atrium Application Discovery and Dependency Mapping (ADDM)
  • BMC Atrium Configuration Management Database (CMDB)
  • BMC ProactiveNet Performance Management (BPPM), which is now part of BMC TrueSight Infrastructure Management and BMC TrueSight Event and Impact Management

An army of resources (usually the best, most knowledgeable people) no longer needs to be pulled off key assignments and dragged into a virtual war room to begin triage on a critical problem. Because a well-developed service impact model visually pinpoints the problem, only a few people are needed to take that information and actually fix it.

For many enterprises, the reduced problem-solving headcount alone provides enough savings to justify implementing the modeling capability. Add to that the savings of reduced downtime for the impacted business applications, and a clear financial business case emerges. Beyond the value of reduced headcount and downtime, modeling offers several other benefits, addressed below. First, however, let’s discuss two key types of modeling and how they deliver business value.

Application Modeling is the creation of a Visio-like schematic that shows all of the components of an application, including the ecosystem of infrastructure on which the application runs or depends. Server dependencies are particularly important in showing the entire scope of an application, and the modeling process often reveals a wider server footprint than originally expected. The TPL scripting language that is part of ADDM is used to create these models. Application modeling is a fairly quick task: creating an application model takes a day on average. Use cases are many. If you are decommissioning an application, making a major change, or recovering from a disaster, how do you know you have accounted for the entire application? Application Modeling gives you that information.
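
To make the "entire application" question concrete, here is a minimal sketch in Python (not TPL, and not how ADDM itself works) of the underlying idea: treat the application as a dependency graph and collect everything reachable from it. The component names and the DEPENDS_ON map are hypothetical, made up purely for illustration.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it directly depends on.
# These names are illustrative only; they are not discovered data.
DEPENDS_ON = {
    "order-entry-app": ["web-server-01", "app-server-01"],
    "web-server-01": ["load-balancer-01"],
    "app-server-01": ["db-server-01", "message-queue-01"],
    "db-server-01": ["storage-array-01"],
    "message-queue-01": [],
    "load-balancer-01": [],
    "storage-array-01": [],
}


def application_footprint(root, depends_on):
    """Return every component reachable from root, including root itself."""
    seen = {root}
    queue = deque([root])
    while queue:
        component = queue.popleft()
        for dependency in depends_on.get(component, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


footprint = application_footprint("order-entry-app", DEPENDS_ON)
print(sorted(footprint))
```

In this toy map the application directly names only two servers, yet the traversal also surfaces the load balancer, database server, message queue, and storage array. That is the "wider footprint than expected" effect in miniature.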
Service Impact Modeling (SIM) takes this a step further. The old saying “a picture is worth a thousand words” holds true with SIM. Plus, it adds dynamically updated, up-to-the-minute information in the context of a business process. Instead of receiving an event as a message at a console, you can see the problem in a service model, where the problem area lights up and notifications are sent to the appropriate resources. You reduce downtime and use fewer resources. By organizing applications into business service models, IT can determine the business impact of an outage. This is hugely valuable for problem escalation, problem prioritization, and business alignment on issues, investments, improvements, availability services, system performance management, and more. Reducing the downtime of an outage improves service levels for the business. There are real dollars here to support a business case for modeling.
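
The core idea of determining business impact can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the BMC implementation: the SERVICE_MODEL map, the component names, and the impacted_services function are all hypothetical, and the sketch ignores redundancy and severity, which a production impact model would likely need to account for.

```python
# Hypothetical service model: each entry lists the items it directly depends on.
# Business services sit at the top; infrastructure sits at the bottom.
SERVICE_MODEL = {
    "online-ordering": ["order-entry-app"],
    "invoicing": ["billing-app"],
    "order-entry-app": ["app-server-01", "db-server-01"],
    "billing-app": ["db-server-01"],
    "app-server-01": [],
    "db-server-01": [],
}
BUSINESS_SERVICES = {"online-ordering", "invoicing"}


def impacted_services(failed, model, business_services):
    """Return the business services that depend, directly or not, on `failed`."""
    # Invert the edges so we can walk upward from a component to its dependents.
    dependents = {}
    for parent, children in model.items():
        for child in children:
            dependents.setdefault(child, set()).add(parent)

    impacted = set()
    stack = [failed]
    while stack:
        node = stack.pop()
        for parent in dependents.get(node, ()):
            if parent not in impacted:
                impacted.add(parent)
                stack.append(parent)
    # Keep only the nodes that are business services.
    return impacted & business_services


print(sorted(impacted_services("app-server-01", SERVICE_MODEL, BUSINESS_SERVICES)))
print(sorted(impacted_services("db-server-01", SERVICE_MODEL, BUSINESS_SERVICES)))
```

In this toy model, an application server failure touches only online ordering, while a database server failure touches both business services. That difference is the kind of information that drives escalation and prioritization.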

SIM and Application Modeling can take crisis management to the next level for IT organizations and add significant value in other areas as well. Strategic asset management, strategic component decision making, software rationalization, and many other areas can benefit from these tools and processes.

Datatrend is uniquely positioned to help companies looking to head down this path. Large-scale modeling engagements can use the Datatrend modeling factory to get large numbers of models built at a very affordable price. Service Impact Modeling is an emerging value proposition in which Datatrend has deep expertise. Our team’s experience goes back to delivering the manual SIM approach, and the team is highly qualified to deliver BMC’s fully automated approach.

See our two-page overview of the Road to Service Impact Modeling or contact us to schedule a briefing and demo.