ITS Unplanned Outages (Major Incident Management)

The ITS Unplanned Outage and Escalation Process governs communication and teamwork during an unplanned outage. It is triggered by any degradation or interruption of service and is complete when normal service is restored.

  • Because unplanned outages require quick attention, the only initial requirement is an immediate notification to sc.update@ucsc.edu reporting the outage. Once the outage is resolved and service is restored, a more detailed follow-up is required.
  • All unplanned outages are considered incidents, and an incident ticket in IT Request records the impact and status updates of the outage.
  • All subsequent communication should occur within the IT Request ticket, including outage status updates and the resolution once service has been restored. Make sure that sc.update is always included in the watch list.
  • An outage is considered a Major Incident if it affects an essential or business-critical service (Network, Email, Calendar, Wireless, Cell, FIS, AIS, WWW, Telephone, Voicemail, eCommons, Shibboleth, IDM-LDAP, etc.) or a high-impact system (AFS, VM Clusters, etc.), affects a division or the campus, and leaves the service unavailable or otherwise disrupts campus business processes for more than 2 hours. The ITS DOC may be activated if operational resources or coordination efforts are required.
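The Major Incident criteria in the last bullet can be read as a simple three-part test. The following is a minimal sketch of that test, not part of any ITS tooling: the `Outage` class, the `scope` values, and the function name are hypothetical, and the service sets contain only the examples named above (the source lists end in "etc.", so they are not exhaustive).

```python
from dataclasses import dataclass
from datetime import timedelta

# Examples named in the process text; the real lists are open-ended ("etc.").
ESSENTIAL_SERVICES = {
    "Network", "Email", "Calendar", "Wireless", "Cell", "FIS", "AIS", "WWW",
    "Telephone", "Voicemail", "eCommons", "Shibboleth", "IDM-LDAP",
}
HIGH_IMPACT_SYSTEMS = {"AFS", "VM Clusters"}

# "More than 2 hours" is the threshold stated in the process.
MAJOR_INCIDENT_THRESHOLD = timedelta(hours=2)


@dataclass
class Outage:
    service: str          # affected service or system
    scope: str            # hypothetical values: "individual", "department", "division", "campus"
    duration: timedelta   # how long service has been unavailable or disrupted


def is_major_incident(outage: Outage) -> bool:
    """Apply the three criteria: critical service, wide scope, more than 2 hours."""
    critical = outage.service in ESSENTIAL_SERVICES | HIGH_IMPACT_SYSTEMS
    wide_scope = outage.scope in {"division", "campus"}
    long_running = outage.duration > MAJOR_INCIDENT_THRESHOLD
    return critical and wide_scope and long_running
```

For example, `is_major_incident(Outage("Email", "campus", timedelta(hours=3)))` evaluates to `True`, while the same outage at exactly two hours does not yet qualify because the threshold is "more than 2 hours."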

Preparing a Major Incident Ticket

The Support Center Manager will open a “parent” incident and provide the ticket number to sc.update. All tickets associated with the Major Incident (“child” tickets) should be related to the parent incident.

Relating Incidents:

  1. To relate a child incident ticket, enter the parent incident number (e.g. INCxxxxxxx) into the “Parent” field and click Submit.
  2. All incidents related to the parent are listed under Related Links in the parent incident.

Resolving Incidents:

  1. Once service is restored, change “Incident State” to “Resolved” in the parent ticket.
  2. In the “Comments” field, provide the resolution in detail. Please note: This final comment in the parent incident is customer-facing, and will be communicated to affected clients in child incidents. The “Incident State” of all child tickets will be set to “Resolved.”
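The parent/child relationship and the resolution steps above can be pictured with a small in-memory model. This is an illustrative sketch only: it does not use or represent the IT Request API, the `Incident` class and its method names are hypothetical, the initial state "New" is an assumption, and it assumes that resolving the parent carries the final comment and the “Resolved” state down to every child.

```python
class Incident:
    """Hypothetical stand-in for an incident ticket."""

    def __init__(self, number, state="New"):
        self.number = number      # e.g. a number in the INCxxxxxxx format
        self.state = state        # assumed initial state; not from the source
        self.parent = None
        self.children = []        # plays the role of Related Links on the parent
        self.comments = []

    def relate_to(self, parent):
        """Mirror of filling in the child's "Parent" field and submitting."""
        self.parent = parent
        parent.children.append(self)

    def resolve(self, resolution):
        """Resolve this incident and pass the resolution on to its children."""
        self.state = "Resolved"
        self.comments.append(resolution)
        for child in self.children:
            # The parent's final comment is customer-facing, so each child
            # receives the same text along with the "Resolved" state.
            child.state = "Resolved"
            child.comments.append(resolution)
```

In this model, relating two placeholder child tickets to a parent and then calling `parent.resolve("Service restored.")` leaves all three tickets in the “Resolved” state with the same final comment, which is why the resolution text in the parent should be written for customers.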

Unplanned Outage (Major Incident Management) Roles and Responsibilities

Change Manager:

  • Approves any emergency changes related to the unplanned outage.
  • Updates the incident ticket and sc.update if the outage becomes a Major Incident (longer than 2 hours and impacting a major service).
  • Designates an Incident Coordinator (frequently the lead technician or their manager) if the outage becomes a Major Incident. Shares this responsibility with the Communications Manager.
  • Informs sc.update of the person designated as Incident Coordinator.
  • Should be prepared to serve on a Service Resolution Team (aka Mini-ITS DOC).

Support Center Manager:

  • Verifies the outage, immediately updates sc.update, and places the outage on the Maintenance Calendar.
  • Opens the outage parent incident ticket. This should be an internal ITS ticket only.
  • Communicates the ticket number to sc.update.
  • Relates all incident tickets to the parent outage incident ticket.
  • Reports the number of related incidents to sc.update as needed.
  • Should be prepared to serve on a Service Resolution Team (aka Mini-ITS DOC).

Communications Manager:

  • Opens a Google Hangouts or Zoom meeting with the Change Manager and Support Center Manager.
  • Identifies the main tech lead as the point of contact for updates.
  • Designates an Incident Coordinator (frequently the lead technician or their manager) if the outage becomes a Major Incident. Shares this responsibility with the Change Manager.
  • Serves as the ITS DOC Communication Lead.
  • Handles ITS, customer, and campus communications and updates to the ITS website.
  • Should be prepared to serve on a Service Resolution Team (aka Mini-ITS DOC).

Technical Lead/Incident Coordinator:

  • Communicates updates to sc.update every two hours, or as needed.

For a hard copy of this process: Unplanned Outage Process