This page describes the plan for a subsystem iteration including the high level goals for the iteration. It should also contain dependencies on other subsystems, plans for integrating functionality and a list of iteration tasks or a link to the subsystem google doc containing the iteration task details. This page should also contain references to the associated architecture pages, construction plan pages, and use cases.

Iteration Goals and Scope

The primary goal of this iteration is to provide reliable, performant system launch and repair. We will find and fix bugs and improve launch/repair performance as needed. We will allocate much of our time to supporting the overall integration efforts. We will also continue Pyon integration development, focusing on mirroring resource objects to the Resource Registry where possible. We will integrate the HA Agent into the primary launches and add features to support basic autoscaling.

High-Level Iteration Tasks

  • Provide and support functioning full system launch
  • Ensure support for multiple containers per VM
  • Add launch support for reliable system restart
  • Integrate HA Agents into the launch for services and provide preliminary service autoscaling support
  • Improve robustness and reliability of CEI system overall
  • Mirror EPU-layer objects to resource registry

Subsystem Dependencies

  • As usual, CEI will work closely with the integration team on the launch plans and testing
  • We will also work with Jonathan for scale testing the Process Dispatch layer
  • The operations team will provide Dynamic DNS and continue to maintain the base images
  • COI will provide tighter integration for process state detection in the container

Integration Points

Integration Point One (weeks 1, 2, and 3 of iteration development period: Sept. 4 - Sept. 21)

  • (CEI) Integration tests working for leader election pattern within EPU Management and provisioner services

Integration Point Two (weeks 4, 5, and 6 of iteration development period: Sept. 24 - Oct. 12)

  • (Int, CEI) Demonstrate R2 System launch on EC2 with running integration tests
  • (CEI) Demonstrate multiple containers running per execution engine VM
  • (CEI) Demonstrate ability to mirror CEI objects within the Resource Registry to enable viewing of resources within UI
  • (CEI) Demonstrate launch of system utilizing working container to container security
  • (CEI) Demonstrate working communications between Apache/Flask and multiple instances of the CEI-launched Service Gateway in the end-to-end Alpha test system
  • (CEI) Demonstrate elastic scaling policy applied to select services utilizing host SFlow integration
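
As an illustration of what an elastic scaling policy of this kind could look like, here is a minimal sketch of a threshold rule. It is not the CEI policy: the function name, the thresholds, and the idea of feeding it an average per-instance load figure (for example, one derived from host SFlow data) are all assumptions made for the example.

```python
def desired_instances(current, avg_load, scale_up_at=0.8, scale_down_at=0.3,
                      minimum=1, maximum=10):
    """Return the instance count a simple threshold policy would ask for.

    All names and defaults are hypothetical. `avg_load` is assumed to be
    a 0.0-1.0 utilization figure averaged over the service's instances.
    """
    if avg_load > scale_up_at:
        target = current + 1      # overloaded: add one instance
    elif avg_load < scale_down_at:
        target = current - 1      # underused: remove one instance
    else:
        target = current          # inside the dead band: hold steady
    # Clamp so the policy never leaves the configured bounds.
    return max(minimum, min(maximum, target))
```

With these defaults, desired_instances(2, 0.9) asks for 3 instances and desired_instances(1, 0.1) stays at the minimum of 1. The gap between the two thresholds keeps the policy from oscillating when load hovers near a single cutoff.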

Integration Point Three (weeks 7 and 8 of iteration development period: Oct. 15 - Oct. 26)

  • (CEI, COI, Int) Demonstrate general Nimbus system restart, including Process Dispatcher support of system restart, which restarts processes that existed prior to shutdown
  • (CEI, Int) Demonstrate CEI Integration with new centralized graylog2 Logging mechanism
  • (Int, CEI) Demonstrate CEI-Launched and working end-to-end Alpha test system

Risks

  • Historically, launch plan and integration work has been unpredictable and has taken much longer than anticipated
  • We are now more closely integrated with Pyon, so we are more exposed to bugs and refactors as it continues to evolve

References

Iteration Google Doc Task List

CEI googledoc task list: https://docs.google.com/spreadsheet/ccc?key=0AttCeOvLP6XMdGFEd0hHTUIyUGxQd3dFV1g5N3JrTWc&authkey=CN6Xm74K&authkey=CN6Xm74K#gid=8

Jira Burndown Chart

CEI burndown chart: https://jira.oceanobservatories.org/tasks/secure/ChartBoard.jspa?selectedProjectId=10091

Wrapup

We accomplished a lot this iteration but still ended up dropping several tasks. Because integration work was ramping up, we spent a lot of unplanned time chasing down bugs and implementing small features. We were also distracted by R3 planning and by the departure of John Bresnahan, a key developer.

The only critical missing piece is restart support in the launch plan. This work will continue over the next couple of weeks. We also have a small list of improvements to make during IOC prep and transition (in addition to any bugs that come up):

  • More logging to graylog2 – the capability container instances log to graylog2, but there are still other services that should be added (CEI internals, command line tools, RabbitMQ, Couch, VMs, etc.)
  • Better sysname support in CEI
  • Support feeding in Pyon configs to cloudinit.d (currently baked into plan)
  • Better command line tool UX
  • CEI launch files (in progress)
  • Improved PD matchmaker algorithm – the current behavior doesn't account for multiple CCs per VM
  • Better error handling in PD matchmaker dispatch
  • EPUM launch request retries
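
To make the last item concrete, launch request retries could follow a bounded exponential backoff pattern. The sketch below is only an illustration under stated assumptions: `LaunchError`, `launch_with_retries`, and the `launch` callable are hypothetical names, not part of EPUM or any CEI API.

```python
import time


class LaunchError(Exception):
    """Hypothetical error raised when a launch request fails."""


def launch_with_retries(launch, max_attempts=4, base_delay=1.0,
                        max_delay=30.0, sleep=time.sleep):
    """Call `launch()` until it succeeds or attempts run out.

    `launch` is any zero-argument callable that raises LaunchError on
    failure. The wait doubles after each failed attempt, capped at
    `max_delay` seconds. `sleep` is injectable so tests need not wait.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return launch()
        except LaunchError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)
```

A caller would wrap the real request in a function and pass it in, for example launch_with_retries(send_request), where send_request is whatever issues the launch. Only LaunchError is retried, so unexpected exceptions still fail fast.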

  1. Aug 23, 2012

    Michael Meisinger says:

    Meeting notes http://etherpad.oceanobservatories.org/ceir2c3