
Operate Integrated Observatory Network

Manage the operations of a cyber observatory (the Integrated Observatory Network) and its assets.

Summary

Monitor the operations of a cyber observatory (the Integrated Observatory Network) and its assets. Manage resources, including users, organizations, processes, data, and policies.

Review Status: Ready for OOI Review
AS Priority: 5
AS Version: 3.1.1
Issues Status (Jira): Overview, All, Unresolved
Custom Issues Lists: Marine IO Review, Marine IO Processes, CI IO Verify

The custom issue lists are as follows. They include both open tasks and tasks marked as fixed.

  • Marine IO Review issues are called to the attention of the Marine IOs for their review.
  • Marine IO Processes issues are expected to require further consideration/understanding of the Marine IO processes.
  • CI IO Verify issues are generally resolved, but the resolution needs to be confirmed with appropriate CI experts.


Related Use Cases

Use Cases Mapped to This Scenario

The following Use Cases have been mapped to this Acceptance Test Scenario:

Use Cases Cited by This Scenario

This Acceptance Test Scenario cites the following Use Cases:

Key

This text style = background material
This text style = priority <3> (not required).

( ) Indicates footnoted material targeted for Release 3.
( ) Indicates footnoted material targeted for Release 4.
[MI] , [Ops] Provided by MI or Ops team (has no use case).
[NoUC] indicates material for which no Use Case exists.

Overview Diagram


Roles:

Observatory Manager: Manages a marine or virtual observatory.

Observatory Operator: Operates a marine or virtual observatory.

Platform Operator (Marine Asset Operator): Operates a platform in the OOI.

Integrated Observatory Manager: This person manages the entire Ocean Observatories Initiative Integrated Observatory Network operations team and coordinates activities and responsibilities among the team members. This person is ultimately responsible for support and maintenance of the Integrated Observatory Network.

Integrated Observatory Operator: This person is responsible for operating the Integrated Observatory Network on a day-to-day basis. An Integrated Observatory Operator interacts with the Integrated Observatory system to monitor usage and status and interacts with end-users of the Integrated Observatory system when they encounter issues.

End-to-End Scenario

Relevant Use Cases
Material from relevant source use cases is presented in a box with prefixes like this. Each reference can be expanded.
CG.OMC.nn — OMC use case
CG.WS-SC.n.n.n — CGSN OMC workshop notes (scenario number)
CI.UC.nn — CI Release 2 Use Case
CI.CO.n.n — CI Instrument Life Cycle Operational Concept, Version 1-00, 2115-00001, 10/28/2008

Log in as operator

Relevant Use Cases
UC.R2.46 --- Operate Integrated Observatory Network
UC.R2.57 --- Configure Start Page
An Integrated Observatory Operator, Jan Haven, arrives at work at 8:00 am and sits down at her computer in her office. She and her coworker Dave are responsible for the daily monitoring of the state of the Integrated Observatory Network, troubleshooting and fixing any issues within their area of responsibility, participating in any scheduled maintenance activities, communicating any issues to the rest of the team, making any changes or updates to the system configurations, and interacting with Integrated Observatory users as they address issues that come up with the Integrated Observatory. They have been working on the project for more than a year; they were hired before the system was commissioned and helped bring it online and to its current steady state.

The basic principle by which Jan and Dave determine their area of responsibility is that the marine operations team is the starting point for all marine observatory operations issues. Issues that Dave and Jan address have one or more of the following characteristics: they are common to all the OOI observatories, they require too much knowledge about the Integrated Observatory software for the marine operators to readily handle, or they represent issues with Integrated Observatory-supplied resources and support services. On occasion Jan and Dave answer questions about system operations on behalf of the marine operators, but they are scrupulous about not touching marine operations without close engagement by the relevant marine operators.

Jan accesses the Integrated Observatory system via its web address and logs in using her credentials from her own institution. (If her institution were not a member of the OOI's credential-issuing network, she would instead have obtained a credential through Google or another credential provider.) She is automatically given access to the Integrated Observatory system in the Integrated Observatory Operator role, a role previously assigned to her. This role lets Jan access the appropriate operator dashboards that allow her to manage users, user roles, policies, elastic processing units (EPUs) and their definitions, agents of all types, services and their definitions, message brokers, and controlled vocabularies.

The system identifies Jan's customized dashboard configuration (i.e., her landing page) and restores it. (See UC.R2.57 Configure Start Page.)

Future Release Notes
Release 3

Monitor routine operations

Relevant Use Cases
UC.R2.40 --- Monitor ION Resources
UC.R2.46 --- Operate Integrated Observatory Network

As part of her daily job as an Integrated Observatory Operator, Jan spends part of her time reviewing infrastructure resource utilization records for the OOI networks. The resources she works with are primarily computational: disk space, CPU load, throughput on communication networks (including satellite transmissions), and user levels. For many of these resources, Jan uses external monitoring software to augment the Integrated Observatory's views. (See UC.R2.46 Operate Integrated Observatory Network.) ( 1 )

OOI provides a fixed amount of computational resources and storage resources for permanent archival, depending on providers in the commercial world to augment these resources for dynamic needs. ( 2 )

Jan sees a large number of resource event information messages, a smaller number of resource warnings, and a few resource alarms on a week-to-week basis. ( 3 ) (See UC.R2.40 Monitor ION Resources.)

Today Jan notices that disk space on ooicollab is running short. [Ops] ( 1 ) This is a disk farm where most of the OOI virtual laboratories are located.

Jan notices the shortage while the disk is only 80% full. There is still plenty of room for other activities, and this is one of many storage resources, not the only one available. Only Dr. Chu's team and some other collaborators would have been affected had the shortage been allowed to continue indefinitely, and Dr. Chu would have reached his own storage quota before then anyway.

Jan uses the contact information associated with the resource provider to instant message the members of the resource provider's team, and walks through the problem with them. They agree to adjust their resource allocations to solve the problem with the urgency that Jan emphasizes. ( 4 )

( 5 ) Later in the day, Jan notes that more disk space is now available for science virtual laboratories, and cancels her request for updates.

As Jan appreciates, the disk quota would have been filled later that day, when Dr. Chu's team starts interacting on software, sharing past publications and images, and running software that publishes notifications and summaries to files within the laboratory space. Because she anticipated the problem, Dr. Chu's team has no issue when Dr. Chu begins running the POIM software. In this scenario, Dr. Chu's overuse of resources could not have affected core Integrated Observatory operations, because the resources he used run on other systems. Therefore there is no need to review the algorithms created and run by an end user.
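
The early-warning check in this scenario (flagging a disk while it is only 80% full) can be sketched in a few lines of Python. This is a minimal illustration, not the Integrated Observatory's monitoring code: the function names are hypothetical, the 80% warning level comes from the scenario, and the 95% alarm level is an assumption chosen only to show the info/warning/alarm split.

```python
import shutil

# 0.80 matches the scenario's early warning; 0.95 is an assumed alarm level.
WARNING_FRACTION = 0.80
ALARM_FRACTION = 0.95


def classify_usage(used_bytes: int, total_bytes: int) -> str:
    """Return 'info', 'warning', or 'alarm' for a disk's fill level."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= ALARM_FRACTION:
        return "alarm"
    if fraction >= WARNING_FRACTION:
        return "warning"
    return "info"


def check_path(path: str) -> str:
    """Classify the file system that holds `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_usage(usage.used, usage.total)


if __name__ == "__main__":
    print(check_path("/"))
```

Warning well before the disk is full is what gives the operator time to contact the resource provider before any user is affected.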

( 6 )

Future Release Notes
Release 3
Release 4

Respond to request(s) and problem(s)

Relevant Use Cases
UC.R2.39 --- Manage ION Users
UC.R2.40 --- Monitor ION Resources
UC.R2.46 --- Operate Integrated Observatory Network
UC.R2.49 --- Deploy Distributed Processes
UC.R2.52 --- Manage ION Processes
UC.R2.55 --- Manage Help Ticket

On the other hand, Jan's colleague Dave Hipman — another Integrated Observatory Operator — has noticed a problem. Because the OOI system is highly distributed, it must also be highly networked. The operations team is able to take advantage of this networking by providing support from different physical locations (anywhere with a high-speed connection). This extends the support day for the system, as well as the pool of available applicants for these positions. Wherever they work, all of the Integrated Observatory Operators carry wireless networked smartphones. ( 1 )

This afternoon, Dave noticed a sudden increase in network and mail resource usage. He is responsible for monitoring one of the key publication and notification hubs for the CI, and has monitoring interfaces much like Jan's that constantly update data on system activities relating to publication and notification, displaying basic statistics like number of active subscriptions and updates sent in a recent period. These are part of a set of core Integrated Observatory-specific application metrics (user logins and time online, service calls, and the like) that can be measured and summarized only from within the system. (See UC.R2.46 Operate Integrated Observatory Network.)

Dave opens a Jira trouble ticket for this problem, going directly to the Jira system to log the ticket. This lets him log all his diagnostic actions, as they take place, against an open ticket, for others to track as they need. (See UC.R2.55 Manage Help Ticket.)

As he is checking the status of the management applications system, which provides application services like mail and Confluence, he notices two issues: the mail server queues are nearly redlined, and the network load spikes are much higher in that subnet than is usually the case.

To begin diagnosing the problem, he opens the log monitoring windows for the mail server on his desktop. Although system updates of the mail log are a little slow, he can see that every 5 minutes or so, a burst of messages goes to the user "Chosen Destinations." He recognizes that this is not an actual email destination, but a category of other destinations in the drop-down menu. Fortunately, this is a "dead letter" drop, and the email goes nowhere, but surely this was not the intention. He makes a mental note to ask the designers to prevent this option from being selected in the future. (See UC.R2.40 Monitor ION Resources.)
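
The pattern Dave spots by eye (a burst of messages to one recipient every 5 minutes or so) could also be found by grouping log entries into fixed time windows. The sketch below assumes the mail log has already been parsed into (timestamp, recipient) pairs; the function name, the window size, and the threshold of 50 messages are illustrative assumptions, not part of the OOI system or of any particular mail server.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def busiest_recipients(entries, window=timedelta(minutes=5), threshold=50):
    """Return {(window_start, recipient): count} for every window in which
    a recipient received at least `threshold` messages.

    `entries` is an iterable of (datetime, recipient) pairs.
    """
    counts = Counter()
    for timestamp, recipient in entries:
        bucket = (timestamp - EPOCH) // window  # integer window index
        counts[(bucket, recipient)] += 1
    return {
        (EPOCH + bucket * window, recipient): count
        for (bucket, recipient), count in counts.items()
        if count >= threshold
    }


if __name__ == "__main__":
    start = datetime(2012, 1, 1, 12, 1)
    sample = [(start + timedelta(seconds=i), "Chosen Destinations") for i in range(60)]
    sample.append((start, "alice@example.org"))
    for (window_start, recipient), count in busiest_recipients(sample).items():
        print(window_start, recipient, count)
```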

Dave analyzes the emails to find the originator; as usual, a process running in the system has been generating the emails. He uses the originating process's unique ID in the email to look up the metadata for the process, and identifies its owner, a Dr. Chu. (See UC.R2.52 Manage ION Processes-Deprecated.) He uses the provided contact information to call Dr. Chu. In an emergency he could, if necessary, suspend processing of the output streams, but this can sometimes have unexpected consequences. Fortunately, he reaches Dr. Chu, who agrees to terminate his process and diagnose the situation. Dave offers the useful clue that he's seeing what appears to be the same packet multiple times every 5 minutes or so.

Dave also appends the Process Unique Identifier as a field in the Jira ticket that he opened on this issue.

Dr. Chu quickly realizes that there were two problems with the Matlab program he and his team wrote: it was publishing summaries every cycle instead of every 100th cycle, and it was apparently publishing each summary 100 times. Both problems were fixed by moving a constant in the code. Fortunately, because the team member who registered and enabled the software selected the wrong email recipient (and one that couldn't receive emails, at that), no one was flooded by excess copies of the reports. ( 2 ) Dr. Chu easily restarted the corrected software, and soon was receiving the data he wanted.

Dr. Chu is briefly stymied when he tries to log in, not realizing he has logged in using a different user registration service than usual (Google, rather than his typical university account). Because the system allows him to associate this identity with the one he previously used, using Dr. Chu's known email to validate that the two are one and the same, Dr. Chu can quickly use the system 'as himself' with the new user registration service. (See UC.R2.39 Manage ION Users.)

If Dave had not been able to reach Dr. Chu, there were other contacts on the team he could have tried. If he could not reach the others and had to take action himself, the actions he took would have been communicated to Dr. Chu through email and optionally other paths, and could be reversed to allow the software to run again once the problem was fixed.
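
The scenario does not show Dr. Chu's Matlab code, so the following Python sketch is only one hypothetical way a single misplaced constant could produce both symptoms of the publication problem described above (a summary on every cycle, each sent 100 times). All names here are invented for illustration.

```python
SUMMARY_INTERVAL = 100  # intended meaning: publish one summary every 100th cycle


def run_buggy(cycles, publish):
    """Misplaced constant: 100 is used as a repeat count, so every cycle
    publishes, and each summary is published 100 times."""
    for cycle in range(cycles):
        for _ in range(SUMMARY_INTERVAL):
            publish(cycle)


def run_fixed(cycles, publish):
    """Constant moved into the cycle test: one summary every 100th cycle."""
    for cycle in range(cycles):
        if cycle % SUMMARY_INTERVAL == 0:
            publish(cycle)


if __name__ == "__main__":
    buggy, fixed = [], []
    run_buggy(3, buggy.append)
    run_fixed(300, fixed.append)
    print(len(buggy), "messages from 3 buggy cycles")
    print(len(fixed), "messages from 300 fixed cycles")
```

In this sketch the repair is a one-line move of the constant, which is consistent with how the scenario describes the fix, but the actual defect may have looked different.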

In addition to their monitoring duties, the Integrated Observatory operations team provides help support to the users of the OOI systems. All of the portals and software in OOI provide references to help sites and contact information. ( 3 ) [NoUC]

Dr. Chu used these services to help him adjust his software to be visible only to his own team. He sent his request to the support email address visible in his Integrated Observatory web interface. This was a good choice, as the Operations team is organized to provide the most efficient turnaround to email requests, usually addressing uncomplicated requests within an hour. In this case, the email was routed to the queue that all of the operations team's support staff access, and Dave happened to be on rotation and picked up the message. [Ops]

Seeing that Dr. Chu was still online, Dave had several options for contact, including return email and instant messaging (as well as mobile phone alerts, for members who provide that contact information). He chose to send a very brief email (which included the Jira issue number), copying the Jira ticket support system's email address, and then telephoned, as he has found that this is often the fastest way to make sure a question is answered effectively. Although emails can provide reference information quickly, the requester doesn't always see an email for quite some time after it arrives. In a short conversation, Dave explained the process for making data private to the laboratory, made sure Dr. Chu understood the advantages of making it public as soon as possible, and confirmed that the email had arrived.

In future correspondence on this topic, Dr. Chu could reply to Dave's email (which would go to Dave, with a copy to the Jira issue system's email service) or send a new message to the support team. If he replies, or just includes the previous issue ID (which is naturally just another resource known to OOI) in the email, the history of the communication can be obtained by calling up the issue ID in Jira. ( 4 ) As might be apparent, the email communications and support notes become just another flow of information, with connections to the relevant Integrated Observatory resources and OOI member IDs. By entering Dr. Chu's unique OOI identifier in another field of the Jira ticket, Dave supports the association of this ticket with Dr. Chu's history with the system. (See UC.R2.55 Manage Help Ticket.)
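
Linking a follow-up email to its ticket depends on finding the issue ID in the message. The sketch below is a simple heuristic that assumes the common Jira key shape (an uppercase project key, a hyphen, and a number). The project key OOIHELP is made up for the example, and the pattern will also match unrelated tokens of the same shape, such as "UTF-8", so a real mail handler would need to check candidates against known projects.

```python
import re

# Assumed key shape: uppercase project key, hyphen, issue number.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_issue_keys(text: str) -> list:
    """Return candidate issue keys in order of first appearance, no duplicates."""
    seen = []
    for key in ISSUE_KEY.findall(text):
        if key not in seen:
            seen.append(key)
    return seen


if __name__ == "__main__":
    subject = "Re: [OOIHELP-123] Making laboratory data private"
    print(extract_issue_keys(subject))
```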

Using the Jira system, Dave indicates the problem was resolved, and adds a comment with the final details. Whether the data source is an instrument or a process like Dr. Chu's software, this documentation approach makes it possible to discover and view all the knowledge associated with a data stream, and to correlate actions taken — for example, to shut off an instrument that is spewing out bad data — with support discussions, other system and scientific events, and with reconfigurations and other changes to the data source. (See UC.R2.55 Manage Help Ticket.)

The ticket in this scenario was visible to Dr. Chu, because he was a member of the OOI project, not just a person with an account on the Integrated Observatory. ( 5 ) ( 1 )

Dave has one more request to respond to today. Some of the System Process Developers have created a set of processes that analyze high frequency, high volume data from some new marine assets. They essentially have the processes ready to go, and the only task left is to deploy the processes on appropriate computational resources.

Dave reviews the computational environments available for the deployment. He determines that these processes are resource-intensive and may impact operations on the currently available environments inside the OOI-procured computing assets, so he selects a commercial site and availability zone which are unlikely to impact any operating OOI systems. (Once he has more experience with these processes, he will feel more comfortable co-deploying them with the more critical Integrated Observatory software.) Dave confirms his deployment is properly specified, and deploys the processes operationally. (See UC.R2.49 Deploy Distributed Processes.)

Future Release Notes
Release 3
Release 4

Perform scheduled activities

(The marine-centric use cases have examples of execution of scheduled activities)

Certain Integrated Observatory operations have to be performed regularly, for example nightly processes to perform maintenance on system processes. The Integrated Observatory provides a method to kick off scheduled activities at a given time or times each day. These must be set up by System Process Developers in advance, rather than being scheduled on the fly by Observatory Operators. [NoUC]
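
Starting an activity "at a given time or times each day" comes down to computing the next occurrence from a fixed list of daily times. The following is a minimal sketch of that calculation, with a hypothetical function name; it ignores time zones and daylight-saving changes and is not the Integrated Observatory's scheduler.

```python
from datetime import datetime, time, timedelta


def next_run(now, daily_times):
    """Return the first scheduled datetime strictly after `now`.

    `daily_times` is a non-empty collection of datetime.time values that
    repeat every day.
    """
    if not daily_times:
        raise ValueError("at least one daily time is required")
    for scheduled in sorted(daily_times):
        candidate = datetime.combine(now.date(), scheduled)
        if candidate > now:
            return candidate
    # Every time for today has passed: use the earliest time tomorrow.
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, min(daily_times))


if __name__ == "__main__":
    nightly = [time(2, 0), time(23, 30)]
    print(next_run(datetime(2012, 1, 1, 3, 0), nightly))
    print(next_run(datetime(2012, 1, 1, 23, 45), nightly))
```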

It is also possible to access the system's services through remote interfaces (via a service gateway), allowing services to be accessed using external scripting languages. (Note that this access mode requires VPN access or some similar security mechanism to ensure only qualified users may use it.) [Ops]

( 1 )

Release 3

Log out as operator

Relevant Use Cases
UC.R2.55 --- Manage Help Ticket
UC.R2.60 --- Troubleshoot Deployed Instrument

At the end of her day, Jan logs out of her Integrated Observatory dashboard and heads home. Dave likewise departs some hours later.

The items that are in progress on this particular day are carried over until the team's return. If there are items that are not done and need to be worked on past one person's departure, they will notify the next person about the remaining work, using the Jira task ID as a primary means of conveying the status of work. (See UC.R2.55 Manage Help Ticket.)

If a trouble ticket comes in after Jan and Dave have gone home for the day, it may wait until they return the next day; Jan and Dave work regular hours. It is possible to indicate that the ticket is urgent, or to call the hot line provided for OOI support; either will contact the on-call support person. (This is also the case for any auto-detected system failures reported by the various monitoring applications: an urgent message results and the support person is contacted. See UC.R2.55 Manage Help Ticket.) [Ops]
