The highest-level goal is to provide a complete end-to-end system launch on the San Diego Nimbus hardware. We will work closely with the integration team and provide any fixes and features that are needed. At the same time, we will continue to work towards the complete R2 feature set.
- Support the system launch integration effort with CEI fixes and features as needed.
- Complete integration of the Process Management layer with Pyon. Switch the launch to use the Pyon components.
- Port the HA Agent to Pyon as well, and add simple N-preserving HA Agents to the launch.
- Add preliminary support for multiple Execution Engine domains in the launch
- Enhance the Process Dispatcher scheduling API to support the system deployment and restart needs
- Prototype pulling sensor data into an HA Agent – advance work for autoscale
- Prototype pulling code packages into a running container – advance work for process package repository
- Help evaluate and install central logging solution
- Port base image to EC2 – CEI will advise and assist as needed
- Set up Dynamic DNS or some other solution to facilitate Apache communication with web gateways – CEI will implement whatever is needed in recipes.
- Provide a Certificate Authority and generate a keypair for containers to use
- Provide builder for coi-services that produces prebuilt archives
- Provide r2deploy.yml release file annotated with service dependencies
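As a rough illustration of the "N-preserving" behavior named in the goals above, the sketch below computes what a policy would need to do to keep exactly N healthy processes running. It is not the HA Agent's code: `ProcessInfo` and `n_preserving_plan` are hypothetical names invented for this example, and the assumption that unhealthy processes are replaced rather than repaired is ours.

```python
from dataclasses import dataclass


@dataclass
class ProcessInfo:
    """Hypothetical record of one managed process (not a Pyon type)."""
    process_id: str
    healthy: bool


def n_preserving_plan(target_n, processes):
    """Return (number_to_start, ids_to_terminate) so that exactly
    target_n healthy processes remain after the plan is applied."""
    if target_n < 0:
        raise ValueError("target_n must be non-negative")
    healthy = [p.process_id for p in processes if p.healthy]
    unhealthy = [p.process_id for p in processes if not p.healthy]
    # Assumption: unhealthy processes are always terminated and replaced.
    to_terminate = list(unhealthy)
    if len(healthy) > target_n:
        # Too many healthy instances: trim the surplus from the end.
        to_terminate.extend(healthy[target_n:])
        to_start = 0
    else:
        # Too few (or exactly enough): start the shortfall.
        to_start = target_n - len(healthy)
    return to_start, to_terminate
```

In this sketch the decision is a pure function of the observed state, so it can be re-evaluated on every poll without tracking history.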
Week of July 2:
- Integration tests for Pyon-based process dispatcher
- Integration tests for Pyon-based HA Agent
Week of July 9:
- Pyon PD, EEAgent, and HAAgent added to full system launch
- Launch plan and operator tools fixes delivered throughout the iteration, both as stated in tasks and in response to issues found by the integration team.
- Launch plan integration work is time-consuming and difficult to estimate.
- We've been seeing strange load behaviors on OOI hardware. If not solved, this could slow down the integration process.
- More broadly for R2: unfortunately, we are leaving too much functionality for C3. Ideally we would be largely feature complete after C2, but that will not be the case.
Overall this iteration went well. Most of our development tasks were completed well within the time estimates. We added many tests and fixed many bugs. As the launches were used more and more, we added several small features and fixes requested by Jamie. Highlights of the iteration:
- Nimbus 2.10 is currently in release candidate phase and no major bugs have been found. It should be ready for deployment to OOI hardware early in C3. This should substantially increase launch speed and performance due to the new copy-on-write image propagation.
- The Process Dispatcher now has an improved scheduling API that should be largely sufficient to support R2 needs. It is well tested and integrated into the Pyon container.
- The EPU Management Service now supports EPU definitions
- System launch speed has been dramatically improved through several optimizations. Pyon services are now launched in parallel where appropriate
- HA Agent has been ported to Pyon
- Added many tests and improved logging in EPU layer
- Prototyped process code fetch and start in HA Agent
- Experimented with host sFlow for HA Agent sensor-based scaling
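The parallel launch mentioned in the highlights above relies on knowing which services depend on which. As a minimal sketch of that idea, and not the actual launch plan code, the function below groups services into levels so that everything in one level could be started in parallel once the earlier levels are up. The function name `launch_levels`, the input format, and the service names in the usage example are all assumptions made for illustration; they are not taken from r2deploy.yml.

```python
def launch_levels(dependencies):
    """Group services into launch levels.

    `dependencies` maps a service name to the names it depends on.
    Each returned level depends only on services in earlier levels.
    """
    remaining = {name: set(deps) for name, deps in dependencies.items()}
    # Dependencies that have no entry of their own are treated as
    # services with no dependencies.
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())

    levels = []
    launched = set()
    while remaining:
        # A service is ready when all of its dependencies are launched.
        ready = sorted(n for n, deps in remaining.items() if deps <= launched)
        if not ready:
            raise ValueError("dependency cycle among: %s" % sorted(remaining))
        levels.append(ready)
        launched.update(ready)
        for name in ready:
            del remaining[name]
    return levels


# Hypothetical service names, for illustration only.
example = {
    "resource_registry": ["datastore"],
    "process_dispatcher": ["resource_registry"],
    "ha_agent": ["process_dispatcher"],
    "logging": [],
}
for level in launch_levels(example):
    print(level)
```

Sorting each level keeps the output deterministic, which makes launch logs easier to compare between runs.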
A few things are still lacking for the iteration:
- The system launch with Pyon PD, EE Agent, and HA Agent is still not quite working, though it seems very close.
- The ops team is providing a Dynamic DNS solution for the web service gateway processes. Once this is available, CEI will hook it up with recipes.
- The integration team is evaluating logging platforms. Once a platform is chosen, CEI will configure it in the launches.
- We enabled security credentials in the launch, but this seems unstable and was added to the launch in a very hacky way. We need to consider it further before integrating it into master.
- Mirroring CEI objects to the Resource Registry is not yet done, but this will likely happen in the next week or two.