
Overview of "Monitor ION Network" Use Case

Respond instantly to network issues affecting system


Tip: Key Points
UC Priority = 4 or 5: Critical, included in R2
Only boldface steps are required
<#> before a step —> lower priority
(optional) —> run-time option


Metadata

Refer to the Product Description and Product Description Release 2 pages for metadata definitions.

Actors: Integrated Observatory Operator; Observatory Operator
References: UC.R2.46 Operate Integrated Observatory Network; UC.R1.29 Monitor System
Uses: UC.R2.55 Manage Help Ticket
Is Used By:
Extends:
Is Extended By:
In Acceptance Scenarios: None
Technical Notes: This use case focuses on the network part of the Integrated Observatory, not the entire system.
Lead Team: OPS
Primary Service: Resource Management Services
Version: 2.3
UC Priority: 4
UC Status: Mapped + Ready
UX Exposure: ONC, MNC

Summary

This information summarizes the Use Case functionality.

Network monitoring software is operated by the Operations team (the Integrated Observatory Operators), and the network is also monitored by network and observatory monitoring software operating within the Integrated or Marine Observatories. A network failure is detected through one or more of those paths. Regardless of the source, any automated detection of a network outage should immediately trigger the following actions: (a) an automated annunciation of the detection to the Integrated Observatory and Marine Operator displays, and possibly the helpdesk ticketing system; (b) automated notification of all Operations personnel on duty; (c) automated update of Observatory web pages and Operations displays; and (d) where feasible and safe, automated steps to recover from the network failure. Resolution of the issues is reported as described in the UC.R2.55 Manage Help Ticket use case.
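
As an illustration of actions (a) through (d), the following is a minimal Python sketch, not the ION implementation. All names in it (OutageEvent, OutageDispatcher, the four action functions, and the SAFE_AUTO_RECOVERY set) are hypothetical; real actions would call the display, notification, web, and recovery systems rather than return strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class OutageEvent:
    source: str     # which monitoring path detected the outage
    component: str  # affected network component
    detail: str     # free-text description from the detector


@dataclass
class OutageDispatcher:
    """Runs every registered action for each detected outage."""
    actions: List[Callable[[OutageEvent], str]] = field(default_factory=list)

    def register(self, action):
        self.actions.append(action)
        return action

    def dispatch(self, event: OutageEvent) -> List[str]:
        results = []
        for action in self.actions:
            try:
                results.append(action(event))
            except Exception as exc:  # one failing action must not block the rest
                results.append(f"{action.__name__} failed: {exc}")
        return results


dispatcher = OutageDispatcher()

# Hypothetical set of components where automated recovery is considered safe.
SAFE_AUTO_RECOVERY = {"dhcp", "dns"}


@dispatcher.register
def annunciate_to_displays(event):      # action (a)
    return f"display: outage on {event.component} (via {event.source})"


@dispatcher.register
def notify_on_duty_staff(event):        # action (b)
    return f"notify: on-duty Operations personnel told about {event.component}"


@dispatcher.register
def update_status_pages(event):         # action (c)
    return f"web: status page marks {event.component} as down"


@dispatcher.register
def attempt_recovery(event):            # action (d), only where feasible and safe
    if event.component not in SAFE_AUTO_RECOVERY:
        return f"recovery skipped: {event.component} needs an operator"
    return f"recovery started: {event.component}"
```

Because every detection goes through the same dispatch call, the actions run regardless of which monitoring path reported the outage, and a failure in one action is recorded without preventing the others.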

Assumptions

  1. The same monitoring tools will be used for San Diego networks and the WAN
  2. The CI team is not assuming responsibility for Marine IO networks, except where the networks connect.
  3. Marine IOs will notify Integrated Observatory Operators if there is an outage on one of the marine observatory segments.
  4. The WAN provider will provide advance notification of any planned outages, and will notify OOI of any unplanned outages as soon as they occur.
  5. CI tools can automatically generate JIRA tickets (possibly through sending email).
  6. CI system operators will need to configure monitoring tools for appropriate frequency of alert notifications.
  7. San Diego will have a switch/router in place to interface to the Integrated Observatory network.
  8. The Integrated Observatory will have multiple systems to monitor the network. All of these systems automate the reporting of problems. (See list in Comments.)
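
Assumptions 5 and 6 can be sketched together in Python. This is only an illustration under stated assumptions: the ticketing system is assumed to be configured to create issues from incoming mail, and the addresses, subject prefix, class name, and function name below are all hypothetical. The sketch builds the message but does not send it.

```python
from email.message import EmailMessage
from typing import Dict


class AlertThrottle:
    """Suppresses repeat notifications for the same key within a minimum interval."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last_sent: Dict[str, float] = {}

    def should_notify(self, key: str, now: float) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon after the previous notification
        self._last_sent[key] = now
        return True


def build_ticket_email(component: str, summary: str,
                       to_addr: str = "tickets@example.org",       # hypothetical
                       from_addr: str = "monitoring@example.org"   # hypothetical
                       ) -> EmailMessage:
    """Builds an email that a mail-enabled ticketing system could turn into a ticket."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"[network-outage] {component}: {summary}"
    msg.set_content(f"Automated alert for {component}.\n{summary}\n")
    return msg
```

The interval passed to AlertThrottle stands in for the alert frequency that system operators would configure; timestamps are passed in explicitly so the behavior is easy to check without waiting on a real clock.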

Initial State

ION Network is operational.

Scenario for "Monitor ION Network" Use Case

  1. When problems occur on the network (layers 1-4), the detecting monitoring system sends a message to the Integrated Observatory and/or Help Desk systems.
    1. Likely to be an email in most cases.
    2. Help desk software can more naturally process email, and possibly can notify the Integrated Observatory.
    3. Operations will determine if duplicate alerts are covering the same issue.
  2. JIRA system notifies the Help Desk operations team of the problem.
    1. Operations will notify the Marine Observatory Operators of any network outages.
    2. Marine Observatory Operators will be defined as watchers on any appropriate JIRA tickets.
  3. <3> Integrated Observatory displays for Integrated and Marine Observatory Operators are automatically updated.
    1. Must decide whether displays are updated directly from initial error detection or from the Help Desk; a likely scenario (because it may be simple to implement) is that a display of network-component-related Jira tickets shows a new open ticket of highest urgency.
    2. For more detailed review, operators will look at dedicated application outputs.
    3. Intermapper should be used by the majority of staff who need visibility into network status, including the Marine Observatory Operators.
    4. Developer access to Solarwinds also makes sense.
  4. <3> Outages are automatically reflected on publicly visible status web pages.
    1. A single page should consolidate the most important status reports.
    2. In the event no Integrated Observatory system can present a status page, an external system should be able to put up a fail page on behalf of ION.
  5. Automated network recovery occurs per the design of the network.
    1. The network recovery can take many forms: channel bonding (at layer 2), route redirection (at layer 3), DNS resiliency (hidden primary with many secondaries), DHCP resiliency (failover configured), A10 resiliency (multiple devices), also RSA, TACACS+, and switch resiliency (cross-connected hypervisors).
    2. When systems auto-recover, updates to Integrated and Marine Observatory Operators, Jira, and public status pages are desirable, but may not be possible in Release 2.
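
Step 1.3 leaves the duplicate decision to Operations. As a possible aid to that review, the following Python sketch groups alerts that name the same component within a short time window. It is a minimal illustration only: the Alert fields, the function name, and the 300-second default window are assumptions, not part of the use case.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Alert:
    tool: str         # monitoring system that raised the alert
    component: str    # network component named in the alert
    timestamp: float  # seconds since some common reference


def group_probable_duplicates(alerts: Iterable[Alert],
                              window_s: float = 300.0) -> List[List[Alert]]:
    """Groups alerts on the same component that arrive within window_s of each other."""
    groups: List[List[Alert]] = []
    latest_group: Dict[str, int] = {}  # component -> index of its most recent group
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        idx = latest_group.get(alert.component)
        if idx is not None and alert.timestamp - groups[idx][-1].timestamp <= window_s:
            groups[idx].append(alert)   # close in time: probably the same issue
        else:
            groups.append([alert])      # first alert, or a later separate incident
            latest_group[alert.component] = len(groups) - 1
    return groups
```

Each resulting group is a candidate for a single ticket; an operator would still confirm that the grouped alerts describe the same underlying issue.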

Scenario Graphic

Final State

Displays and web pages accurately reflect network status during and after any outage.

Comments

These comments provide additional context (usually quite technical) for editors of the use case.

Operations must discuss scenarios, management systems, and network interfaces with CGSN and RSN.

An assumption was that "Both audio and visual signals will alert the Operations team to problems." It is not clear how the Release 2 system will present an audio signal, so this assumption has been removed.

The CI team is using the following automated monitoring systems:

  1. Intermapper (SNMP alerts and high-level holistic view)
  2. Solarwinds (detailed network performance and hypervisor/VM performance tool; application performance monitoring and reporting)
  3. NetMRI (configuration backup / version control, configuration management)
  4. Traffic Sentinel (sFlow collection and flow reporting)
  5. Lancope (NetFlow collection, application performance monitoring, and security)
  6. Statseeker (generates detailed network interface statistics)
  7. DSView3 with power manager (provides power levels/use and cycling)
    1. APC UPS (reports UPS issues)
