Unplanned Outage & Escalation Process
The ITS Unplanned Outage and Escalation Process governs communication and teamwork during an unplanned outage, including when things go from bad to worse.
- Since unplanned outages require quick attention, all that is initially required is an immediate notification to sc.update reporting the outage. Once the unplanned outage is resolved and service is restored, a more detailed follow-up is required.
- All unplanned outages are considered incidents, and an incident ticket in IT Request will record the impact and status updates of the outage.
- All subsequent communication should occur within the IT Request ticket, including outage status updates and the resolution once service has been restored. Make sure that sc.update is always included in the Watch List.
WHAT TO DO
For Unplanned Outages submit the information via IT Request OR Email.
IT Request: Log in to IT Request and, under Self-service in the left navigation area, click on "Report an Unplanned Outage." Fill out the outage form. Instructions are available at: Unplanned Outage Training
This will create an Incident ticket and include sc.update and any technicians listed as working on the issue in the Watch List. The process will also create a Change Request for calendar purposes. The Change Request can be used to document any changes required to restore the service.
When the outage is over, resolve the incident and close the change by doing the following:
- Select the most appropriate Urgency, Impact, and Scope on the Change Request tab
- Hit the "Work Completed" button
- On the PIR tab, select "Change Withdrawn"
- In Lessons Learned write, "This was an unplanned outage used for the maintenance calendar update. No changes required."
- Hit the "Close Change" button
** OR **
Email: Send a message to sc.update. Please include the following information in your email:
- Who: Technician(s) performing the work.
- What: Summary of outage: Include client facing service name and brief description of outage or change.
- Why: Reason for outage or change. Why is this outage happening?
- When: Date, time and duration of proposed outage or change implementation window.
- Clients: If known, who and how many will be impacted.
- Risk elements: Testing results, training required, time to perform work, back-out/recovery plan, impact if change not performed.
- Communication: What communication steps have you taken if any? Do you need assistance with communication?
Emailing sc.update notifies the ITS Support Center, change manager, communication manager, and other ITS staff.
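As an illustration only, the checklist above could be turned into a plain-text email body with a small helper. This is a minimal sketch: the `OutageReport` class, the `format_outage_email` function, and the default values are hypothetical and not part of IT Request or any ITS tool.

```python
from dataclasses import dataclass


@dataclass
class OutageReport:
    """Hypothetical container; field names mirror the email checklist."""
    who: str
    what: str
    why: str
    when: str
    clients: str = "Unknown"            # assumed default when impact is not yet known
    risk_elements: str = "None noted"   # assumed default
    communication: str = "None yet"     # assumed default


def format_outage_email(report: OutageReport) -> str:
    """Return an email body with one labeled line per checklist item."""
    rows = [
        ("Who", report.who),
        ("What", report.what),
        ("Why", report.why),
        ("When", report.when),
        ("Clients", report.clients),
        ("Risk elements", report.risk_elements),
        ("Communication", report.communication),
    ]
    return "\n".join(f"{label}: {value}" for label, value in rows)
```

Calling `format_outage_email(OutageReport(who=..., what=..., why=..., when=...))` yields seven labeled lines that can be pasted into the message to sc.update.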
Once the outage is reported to sc.update during regular business hours, the ITS Support Center will create an outage ticket to collect all further updates. If the outage occurs after regular business hours (5PM-8AM) or on weekends, the reporter will create the ITR ticket using the above procedure. This is recommended for all outages and mandatory for a Service Disruption.
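The rule for who creates the ticket can be sketched as a small function. This is an illustration under stated assumptions: regular business hours are taken to be Monday through Friday, 8AM up to but not including 5PM (inferred from the 5PM-8AM after-hours window), and the function name `ticket_creator` is hypothetical.

```python
from datetime import datetime, time

# Assumed from the stated after-hours window of 5PM-8AM.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)


def ticket_creator(reported_at: datetime) -> str:
    """Return who creates the outage ticket for a report at the given local time."""
    is_weekday = reported_at.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = BUSINESS_START <= reported_at.time() < BUSINESS_END
    if is_weekday and in_hours:
        return "ITS Support Center"
    return "reporter"
```

How the exact boundary times (8:00AM and 5:00PM sharp) are treated is not specified in the process; the sketch treats 8:00AM as business hours and 5:00PM as after hours.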
If an unplanned outage affects an essential or necessary service (Network, Email, Calendar, Wireless, Cell, FIS, AIS, WWW, Phones, eCommons, Shibboleth, IDM-LDAP, etc.) or a high-impact system (AFS, VM Clusters, etc.), impacts a division or the campus, and leaves the service unavailable or otherwise disrupts campus business processes for more than 2 hours, the outage is considered a Service Disruption and the following needs to occur:
- Designate an Incident Coordinator (frequently the lead technician or their manager).
- Reply to the ITR ticket email or update the ticket directly in IT Request with the outage status, and note that it is now an official Service Disruption.
- Be prepared to be on a Service Resolution Team (aka Mini-ITS DOC).
- Know the outage escalation responsibilities associated with your ITS role.
Between 6:30AM – 8PM (any day, including weekends), contact your director/manager and/or one of the people below for help with ITS-wide communication: Doug Hartline, Janine Roeth, Lisa Bono, Peter McMillan, or Andrea Hesse. DCO Operators (459-2714) have escalation phone numbers if needed.
If an unplanned outage leaves an essential campus service or system down and unavailable for more than 5 days, or if an outage of a system prevents instruction (when classes are in session) for more than 24 hours, the outage is considered an IT Disaster. The ITS Divisional Operations Center (ITS DOC) will be activated.
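The escalation thresholds for a Service Disruption and an IT Disaster can be summarized as a decision function. This is a rough sketch, not an official tool: the function name, parameter names, and return labels are hypothetical, and a single flag stands in for "essential or necessary service or high impact system", which simplifies the wording of the process.

```python
def classify_outage(
    hours_down: float,
    essential_or_high_impact: bool,
    affects_division_or_campus: bool,
    prevents_instruction: bool = False,
    classes_in_session: bool = False,
) -> str:
    """Classify an unplanned outage by the duration thresholds in the process."""
    # IT Disaster: essential service or system down for more than 5 days.
    if essential_or_high_impact and hours_down > 5 * 24:
        return "IT Disaster"
    # IT Disaster: instruction prevented for more than 24 hours while classes are in session.
    if prevents_instruction and classes_in_session and hours_down > 24:
        return "IT Disaster"
    # Service Disruption: division- or campus-wide impact for more than 2 hours.
    if essential_or_high_impact and affects_division_or_campus and hours_down > 2:
        return "Service Disruption"
    return "Unplanned Outage"
```

The comparisons are strict because the process says "more than" each threshold; an outage of exactly 2 hours is still treated as an ordinary unplanned outage in this sketch.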
For a hard copy of this process: ITS Unplanned Outage Process (PDF)