The following are target tasks for the CEI team during the R1 Transition phase.

Specific CEI Transition Tasks

  • David L.: Provisioner Tear-Down timing issue (CEI-153)
  • troubleshooting the issues in R1C3 experiments
  • more failure situation testing
  • help David Foster with ANF
  • help Jamie with newest launch plans
  • Investigate contextualization errors – is EPU controller handling these correctly? (it's not normal to have contextualization errors)
  • epumgmt tests
  • Bump libcloud dep to 0.4.2, needs ssl package
  • Packaging epumgmt into something easy_install-able
  • Creating scripts per epumgmt action
    • mostly doc and launch-plan changes
  • decision engine persistence for reconfigured policies
  • DE cannot get old sensor values in a recovery scenario
    • (more of a general issue going forward than an R1 issue)
  • roll to new libcloud with bundled nimbus driver when it comes out
  • passwords in logs/buildbot
  • multiple provisioner processes work but complicate the "kill all" teardown process; needs addressing
  • infinite loop of failure with ready program and epu controller failures when responding to older queries
    • a messaging timeout should not cause the epu controller to fail there
  • help operations set up a context broker
  • help operations set up --repair mode and constant status reporting
  • Patch for workspace-control to make propagation very fast on UCSD Nimbus
    • this will benefit the launch as well as response times
  • epumgmt should be able to call epu controller to retrieve errors nicely
    • the service operation is in place but not the link to the epumgmt operator (although the errors can be retrieved via logfetch)
  • architecture updates, diagrams
  • Requirements Verification Tests
  • put EPU List service somewhere in the main launch plan, probably colocated with provisioner
  • test artifacts are "clogging up /tmp" on buildbot
    • epu-util-test
    • proc1-stderr---supervisor
  • when epumgmt queries multiple epu controllers for status, it should return information even if one of them times out (this is especially useful during initial launch)
  • cassandra timeouts, telephus should error out
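
For the item above about epumgmt querying multiple epu controllers, the intended behavior (return whatever status is available even if one controller times out) might look roughly like the following. This is only a sketch: `query_all`, the `query` callable, and the controller names are hypothetical stand-ins, not the actual epumgmt or epu controller API.

```python
import concurrent.futures


def query_all(controllers, query, timeout=5.0):
    """Query every controller in parallel and return (results, errors).

    `controllers` is a list of names and `query` is any callable taking a
    name; both are hypothetical placeholders for the real status call.
    """
    results, errors = {}, {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(controllers)))
    futures = {pool.submit(query, name): name for name in controllers}

    # Wait up to `timeout` seconds overall, then report what finished.
    done, not_done = concurrent.futures.wait(futures, timeout=timeout)

    for fut in done:
        name = futures[fut]
        try:
            results[name] = fut.result()
        except Exception as exc:  # a failed controller should not hide the others
            errors[name] = repr(exc)

    for fut in not_done:
        errors[futures[fut]] = "timed out"

    # Do not block on stragglers; their answers are simply dropped.
    pool.shutdown(wait=False)
    return results, errors
```

With this shape the operator still sees status for the controllers that answered, alongside a per-controller note for those that did not, which is the case that matters most during initial launch.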