CORE-A3 - Availability Management

July 11, 2023

The Availability Management process ensures that services are available for consumption under the conditions established in the service level agreements (SLAs). Availability Management covers the activities that define, plan, evaluate and improve all aspects of IT service availability, establishing and maintaining services in compliance with the agreed availability targets. These aspects include the people, process and technology factors of availability.

The availability of critical systems has a direct impact on the health of the organization, affecting revenue, profit and customer satisfaction. A strong availability management system therefore needs a process that captures customer and business requirements and ensures that the IT systems are capable of meeting them.

The key indicators on which the Availability Management process rests are:


Availability: the time the service functioned correctly, expressed as a percentage of the total time during which it has been agreed that the IT service is to be accessible to users.
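As a worked illustration of this definition, the minimal sketch below computes availability from the agreed service time and the downtime within it. It assumes both values use the same unit and that only outages inside the agreed service hours count as downtime; the function name `availability_pct` is hypothetical.

```python
def availability_pct(agreed_service_time: float, downtime: float) -> float:
    """Return availability as a percentage of the agreed service time.

    Assumes both arguments use the same unit (e.g. minutes) and that
    downtime counts only outages inside the agreed service hours.
    """
    if agreed_service_time <= 0:
        raise ValueError("agreed_service_time must be positive")
    if not 0 <= downtime <= agreed_service_time:
        raise ValueError("downtime must lie between 0 and agreed_service_time")
    return (agreed_service_time - downtime) / agreed_service_time * 100


# Example: a 24x7 service over a 30-day month (43,200 minutes)
# with 90 minutes of unplanned outage.
print(f"{availability_pct(43_200, 90):.3f}%")  # prints 99.792%
```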


Reliability: a measure of how long the service functions correctly without interruption.
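Reliability is often quantified as mean time between failures (MTBF). The article does not name this metric, so treat it as an added assumption. In the minimal sketch below, MTBF is the uptime within the agreed service time divided by the number of failures in the same period; `mtbf` is a hypothetical helper name.

```python
def mtbf(agreed_service_time: float, downtime: float, failures: int) -> float:
    """Mean time between failures: uptime divided by the number of failures.

    Assumes all values cover the same reporting period and unit.
    Returns infinity when no failures occurred in the period.
    """
    if failures < 0:
        raise ValueError("failures cannot be negative")
    uptime = agreed_service_time - downtime
    if failures == 0:
        return float("inf")
    return uptime / failures


# 30-day month in hours (720 h), 1.5 h of downtime spread across 3 failures.
print(mtbf(720, 1.5, 3))  # prints 239.5
```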


Maintainability: the ability to keep the service operational and to restore it in the event of an interruption.
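Maintainability is commonly tracked as mean time to restore service (MTRS), the average duration of an outage. The choice of metric is an assumption rather than something the article prescribes, and `mtrs` is a hypothetical helper name.

```python
def mtrs(outage_durations: list[float]) -> float:
    """Mean time to restore service: the average duration of the outages."""
    if not outage_durations:
        raise ValueError("at least one outage is required")
    return sum(outage_durations) / len(outage_durations)


# Three outages lasting 20, 45 and 25 minutes.
print(mtrs([20, 45, 25]))  # prints 30.0
```

When MTBF is defined as uptime per failure and MTRS as downtime per failure over the same period and the same set of outages, availability for that period equals MTBF / (MTBF + MTRS).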


Service Capacity: the availability of the contracted internal and external services and their adequacy with respect to the Operational Level Agreements (OLAs) and Underpinning Contracts (UCs) in force. When an IT service is subcontracted as a whole, the terms availability and service capacity are equivalent.
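When a service depends on several contracted internal or external components, all of which must be up for the service to work, the end-to-end figure can be estimated by multiplying the component availabilities. The sketch below assumes the components are in series and fail independently; the helper name and the component figures in the example are hypothetical.

```python
from math import prod


def serial_availability(component_availabilities: list[float]) -> float:
    """End-to-end availability of components that must all be up (in series).

    Assumes independent failures; inputs and output are fractions between 0 and 1.
    An empty list returns 1 (no dependencies).
    """
    for a in component_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availabilities must be fractions between 0 and 1")
    return prod(component_availabilities)


# Hypothetical chain: internal application (99.9%), hosting supplier (99.95%),
# network carrier (99.5%).
print(f"{serial_availability([0.999, 0.9995, 0.995]):.4%}")  # prints 99.3508%
```

The result is lower than any single component's figure, which is why the OLAs and UCs underpinning a service need targets tighter than the SLA they support.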


Key activities of the process:

  • Define and maintain the Availability Plan for the team
  • Tailor and maintain the Major Incident Plan for the team
  • Manage the escalation and problem notification process
  • Provide trend analysis and develop action plans to resolve Critical Business Impact outages
  • Coordinate problem resolution activities for Critical Business Impact outages
  • Own and administer the Root Cause Analysis (RCA) process and meetings
  • Participate in Critical Situation Management (business recovery management)
  • Report on changes that have the potential to impact Mission Critical Business functions
  • Participate in Risk Management for Mission Critical Business functions
  • Communicate service interruptions, root cause analysis, trend analysis and action plans for Mission Critical Business functions
  • Lead situation management projects
  • Provide Executive Alerts
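The trend analysis activity in the list above can start as simply as counting critical incidents per month and flagging services that keep recurring. The sketch below is a minimal illustration under stated assumptions: incidents are hypothetical `(opened_date, severity, service)` tuples, and Severity 1 and 2 are treated as Critical Business Impact. Neither the record layout nor that mapping comes from this article.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date opened, severity, affected service).
INCIDENTS = [
    (date(2023, 4, 3), 1, "payments"),
    (date(2023, 4, 17), 2, "payments"),
    (date(2023, 4, 20), 3, "email"),
    (date(2023, 5, 9), 2, "crm"),
    (date(2023, 5, 22), 1, "payments"),
    (date(2023, 6, 14), 3, "email"),
]

# Assumption: Sev 1 and Sev 2 are treated as Critical Business Impact.
CRITICAL = frozenset({1, 2})


def monthly_trend(records, severities=CRITICAL):
    """Count incidents per (year, month) for the given severities."""
    counts = Counter(
        (opened.year, opened.month)
        for opened, severity, _service in records
        if severity in severities
    )
    return dict(sorted(counts.items()))


def recurring_services(records, severities=CRITICAL, min_count=2):
    """Return services with at least `min_count` incidents of the given severities."""
    counts = Counter(
        service for _opened, severity, service in records if severity in severities
    )
    return {service: n for service, n in counts.items() if n >= min_count}


print(monthly_trend(INCIDENTS))       # prints {(2023, 4): 2, (2023, 5): 2}
print(recurring_services(INCIDENTS))  # prints {'payments': 3}
```

A service that appears repeatedly in the second result is a natural candidate for a Root Cause Analysis and an action plan.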


The objective of the process:

The fundamental objective of Availability Management is to ensure that all IT services are available and functioning correctly whenever customers and users want to use them, within the terms of the SLAs in force. This is accomplished by reducing the number of incidents, the number of Severity 1 and Severity 2 (Sev 1 and Sev 2) incidents and the number of failed changes, supported by a robust program to analyze root causes and identify areas of improvement that reduce recurring problems.


Sample list of benefits:


  • Ensuring service availability meets SLAs
  • Determining the cause of availability failures
  • Reviewing business requirements for availability of business systems
  • Cataloguing business requirements
  • Ensuring proper contingency plans are in place and tested
  • Establishing high-availability, redundant systems to support mission-critical applications
  • Execution of the process will result in improved availability
  • Response to events impacting availability will improve
  • Resolution times for events impacting availability will improve
  • All events that impact availability will be tracked and trended to avoid recurring problems
  • Proactive actions can be taken to improve availability as a result of the analysis
  • Root Cause Analysis will improve and yield clearer information
  • A risk management process will be established as a result of the analysis, notifying management of potential risks in the environment


Sample list of observations:

  • Continuous improvement is needed to raise the level of service and the business function availability of critical systems.
  • Accelerate learning from recent outages to avoid recurrence and to keep critical business functions available despite the failure of technical components.
  • Leverage best-practice standards based on industry guidelines.
  • Identify ways to improve efficiency, customer satisfaction, and competitiveness.
  • Improve governance and integration between internal teams and vendors.


Sample list of recommendations:

  • Develop a consolidated list of availability improvements from various sources. The list should include all improvements that can reduce the risk to availability.
  • Correlate related incidents to reduce the redundancy and complexity of the disruptions they cause.
  • Identify and prioritize functional areas critical to the provision of services to customers. Consider application failover techniques that mask technical availability issues.
  • Enhance the current Enterprise Architecture team to include a design authority.
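The failover recommendation can be quantified with the standard parallel-redundancy formula: if any one of n independent nodes can carry the service, the combined availability is 1 minus the product of the individual unavailabilities. The sketch below assumes independent failures and instant, always-successful failover, which real systems rarely achieve, so treat its output as an upper bound; the helper name is hypothetical.

```python
from math import prod


def parallel_availability(node_availabilities: list[float]) -> float:
    """Availability of redundant nodes where any single node can carry the service.

    Assumes independent failures and instant, always-successful failover.
    """
    nodes = list(node_availabilities)
    if not nodes:
        raise ValueError("at least one node is required")
    if any(not 0.0 <= a <= 1.0 for a in nodes):
        raise ValueError("availabilities must be fractions between 0 and 1")
    # Service is down only if every node is down at the same time.
    return 1 - prod(1 - a for a in nodes)


single = parallel_availability([0.99])
pair = parallel_availability([0.99, 0.99])
print(f"{single:.4%} -> {pair:.4%}")  # prints 99.0000% -> 99.9900%
```

Under these assumptions, adding a second 99% node removes the single point of failure and cuts expected unavailability from 1% to 0.01%.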


Assessment Questions:


  • Do you define and maintain the Availability Plan for the team?
  • Do you tailor and maintain the Major Incident Plan for the team?
  • Do you manage the escalation and problem notification process?
  • Do you provide trend analysis and develop action plans to resolve Critical Business Impact outages?
  • Do you coordinate problem resolution activities for Critical Business Impact outages?
  • Do you own and administer the Root Cause Analysis (RCA) process and meetings?
  • Do you participate in Critical Situation Management (business recovery management)?
  • Do you conduct Failure Mode and Effects Analysis (FMEA)?
  • Do you report on changes that have the potential to impact Mission Critical Business functions?
  • Do you participate in Risk Management for Mission Critical Business functions?
  • Do you communicate service interruptions, root cause analysis, trend analysis and action plans for Mission Critical Business functions?
  • Do you provide Executive Alerts?
  • Have you conducted a Business Impact Analysis (BIA) to identify availability requirements?
  • Do you have availability OLAs or SLAs?
  • Do you identify and analyze areas of improvement?
  • Do you produce recommendations regarding the elimination of single points of failure?
  • Do you produce recommendations regarding the elimination or minimization of the impact of planned downtime (maintenance activity, RFC implementation, etc.)?
  • Do you produce recommendations regarding fault-tolerant technology?
  • Do you produce recommendations regarding duplexing, full mirroring and redundancy across all aspects of the IT environment (data center, environmental systems such as power supplies and air conditioning, enabling components, etc.)?
  • Do you produce recommendations regarding improved processes and procedures?
  • Do you produce recommendations regarding business, data and information security requirements related to confidentiality, including services, data and hardware that must only be available to authorized personnel?
