
Overview

For background, see R1C1 Wrapup

Topics

  • Continue making the EPU robust (stress testing and fixes)
  • Refine the policy engine: develop basic strategies for policy manipulation and management
  • Design and implement administrator's interface and process
  • Fully implement and polish the bootstrap system
  • Engage with alpha users mid-iteration if possible
  • Continue SQLStream application support

Topic: Continue making the EPU robust (stress testing and fixes)

Task Lead Description Links
Revive experiment code JB Get LCA experiment and stressing code working with newest launch system CEI R1C2 - Subtopic - EPU stressing
Scenario #1 TBD Stand up 1k VMs CEI R1C2 - Subtopic - EPU stressing
Scenario #2 TBD Reaction time CEI R1C2 - Subtopic - EPU stressing
Scenario #3 TBD Failure compensation CEI R1C2 - Subtopic - EPU stressing
EPU Controller Persistence TF Ensure EPU controllers can fail and be restarted (including on another machine) CEI R1C2 - Subtopic - Persistence
Provisioner Persistence DL Ensure provisioner can fail and be restarted (including on another machine) CEI R1C2 - Subtopic - Persistence
IaaS Idempotency DL Add support for the EC2 idempotency tag to Nimbus IaaS and enhance the provisioner to use it CEI R1C2 - Support IaaS Idempotency
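
The idempotency and provisioner persistence tasks above fit together, and a small sketch may help show how. It assumes the IaaS returns the original instance when a launch request repeats a client-supplied token, as the EC2 idempotency feature does. The names FakeIaaS, Provisioner, run_instance, and the dict-based store are hypothetical stand-ins, not the actual Nimbus or provisioner interfaces.

```python
import uuid


class FakeIaaS:
    """In-memory stand-in for an IaaS endpoint that honors a client token."""

    def __init__(self):
        self._by_token = {}
        self.launch_count = 0

    def run_instance(self, image, client_token):
        # A repeated token returns the original instance instead of a new one.
        if client_token in self._by_token:
            return self._by_token[client_token]
        self.launch_count += 1
        instance_id = "i-%04d" % self.launch_count
        self._by_token[client_token] = instance_id
        return instance_id


class Provisioner:
    """Stores the token before calling the IaaS so a restart can retry safely."""

    def __init__(self, iaas, store):
        self.iaas = iaas
        self.store = store  # would be persistent in a real system; a dict here

    def launch(self, launch_id, image):
        # The token is written first, so it survives a crash mid-request.
        record = self.store.setdefault(
            launch_id, {"token": str(uuid.uuid4()), "instance": None}
        )
        if record["instance"] is None:
            record["instance"] = self.iaas.run_instance(image, record["token"])
        return record["instance"]
```

If the provisioner dies after the IaaS call but before recording the result, a replacement provisioner (possibly on another machine) that shares the store can repeat the request with the same token and will not start a second VM.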

Topic: Refine the policy engine: develop basic strategies for policy manipulation and management

Task Lead Description Links
Create better configuration files TF The current policies are defined in JSON and "embedded" in the EPU
configurations. The task is to generalize this and make the policy descriptions
easier for the configurer to work with (covers both syntax/format and documentation work)
CEI R1C2 - Subtopic - Policy Engine
Productize policy reconfiguration prototype TF In the previous iteration we created a policy reconfiguration prototype in
support of SQLStream activities. The task is to smooth its rough edges and provide
better clients and documentation
CEI R1C2 - Subtopic - Policy Engine
Update architecture documents to reflect dynamic policy changes TF   [CIADS 02 Common Execution Infrastructure]
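
As a rough illustration of the two policy tasks above, the sketch below treats a policy as a standalone JSON document that is validated on load and can be reconfigured without touching the rest of the EPU configuration. The schema (the keys "name" and "parameters") and the functions load_policy and reconfigure are assumptions made for this example only; the real policy format is still to be designed.

```python
import json

# Hypothetical minimal schema: every policy names itself and carries parameters.
REQUIRED_KEYS = {"name", "parameters"}


def load_policy(text):
    """Parse a standalone JSON policy document and check its basic shape."""
    policy = json.loads(text)
    missing = REQUIRED_KEYS - set(policy)
    if missing:
        raise ValueError("policy is missing keys: %s" % ", ".join(sorted(missing)))
    if not isinstance(policy["parameters"], dict):
        raise ValueError("policy 'parameters' must be a JSON object")
    return policy


def reconfigure(policy, changes):
    """Return a new policy with updated parameters; the original is untouched."""
    updated = dict(policy)
    updated["parameters"] = dict(policy["parameters"], **changes)
    return updated
```

Returning a new policy object rather than mutating the old one keeps the previous policy available if a reconfiguration has to be rolled back.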

Topic: Fully implement and polish the bootstrap system

Task Lead Description Links
API design
JB python API to control boot levels of images and services in many
different clouds
CEI R1C2 - Subtopic - Bootstrap
config file design and documentation
JB explanation of how to create config files for the bootstrap system
CEI R1C2 - Subtopic - Bootstrap
Create pollable objects
JB These objects will be the leaf nodes of the async boot system.
In general they are async objects that are started and then polled until they
complete.  There are 4 known specific ones:
        1) Popen pollable.  Starts an executable with popen and monitors it for completion
        2) IaaS pollable.  Starts an IaaS job and waits for its hostname to appear
        3) Multi-level pollable.  A container that holds lists of pollable
            objects (made from the above).  When all items from one list
            complete, the next list is polled.  When all lists complete,
            the object is complete
        4) Service (svc) pollable.  Contains an IaaS pollable,
            a popen pollable for the contextualize program, a
            popen pollable for the ready program, and a multi-level
            pollable to manage each of them.  It will similarly have
            a popen pollable for the shutdown program.
CEI R1C2 - Subtopic - Bootstrap
Launch plan loader JB Code to read the launch plan and load into the database and unit tests

Create database definition and objects JB The entire launch plan will need to be backed by a database.  Some of
the definition will remain fluid during implementation, but some must be
designed in advance.
CEI R1C2 - Subtopic - Bootstrap
Create and define needed exceptions JB   CEI R1C2 - Subtopic - Bootstrap
Create service object and unit tests.
JB This object performs the IaaS boot, contextualization, and ready tests.  It is the leaf
of the boot process.
CEI R1C2 - Subtopic - Bootstrap
Create boot config fab scripts JB Script that loads a deployable type before DTRS exists. 
Currently this is all done in fab, but needs to be converted to a script.
CEI R1C2 - Subtopic - Bootstrap
Comprehensive testing JB Tests for a simple plan, restarts on the CloudService objects, failure cases, etc.
CEI R1C2 - Subtopic - Bootstrap
Context Broker deployable type TBD Add DT for Nimbus context broker including recipes for security coordination CEI R1C2 - Context Broker DT
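
The pollable objects described in the table above can be sketched as follows. This is a minimal illustration covering only the popen pollable and the multi-level container; the class names, the start()/poll() method pair, and the wait_for helper are assumptions for this example, not the bootstrap system's actual API.

```python
import subprocess
import sys
import time


class PopenPollable:
    """Starts an executable with Popen and reports when it has exited."""

    def __init__(self, argv):
        self.argv = argv
        self.proc = None

    def start(self):
        self.proc = subprocess.Popen(self.argv)

    def poll(self):
        # True once the process has exited; a nonzero exit raises.
        rc = self.proc.poll()
        if rc is None:
            return False
        if rc != 0:
            raise RuntimeError("%r exited with status %d" % (self.argv, rc))
        return True


class MultiLevelPollable:
    """Holds ordered levels; a level starts only after the previous one completes."""

    def __init__(self, levels):
        self.levels = [list(level) for level in levels]
        self.current = 0
        self.pending = []

    def start(self):
        self.current = 0
        self._start_level()

    def _start_level(self):
        # Start the next non-empty level, skipping empty ones so they cannot stall.
        while self.current < len(self.levels):
            self.pending = list(self.levels[self.current])
            for pollable in self.pending:
                pollable.start()
            if self.pending:
                return
            self.current += 1

    def poll(self):
        if self.current >= len(self.levels):
            return True
        self.pending = [p for p in self.pending if not p.poll()]
        if self.pending:
            return False
        self.current += 1
        self._start_level()
        return self.current >= len(self.levels)


def wait_for(pollable, interval=0.05, timeout=30.0):
    """Start a pollable and poll it until it completes or the timeout passes."""
    pollable.start()
    deadline = time.time() + timeout
    while not pollable.poll():
        if time.time() > deadline:
            raise RuntimeError("timed out waiting for pollable")
        time.sleep(interval)


if __name__ == "__main__":
    # Two trivial processes in the first level, one in the second.
    noop = [sys.executable, "-c", "pass"]
    wait_for(MultiLevelPollable([[PopenPollable(noop), PopenPollable(noop)],
                                 [PopenPollable(noop)]]))
```

Because the container exposes the same start()/poll() pair as its children, a service pollable could in principle be built by nesting IaaS and popen pollables inside one multi-level pollable.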

Topic: Design and implement administrator's interface and process

Task Lead Description Links
Implement cloud boot program using the API.
JB Operator level CLI
CEI R1C2 - Subtopic - Bootstrap
Migrate fetch/status/killall into epumgmt layer
TF   CEI R1C2 - Subtopic - epumgmt
Refine DT registry and create DT loader DL The launch plan will include references to DTs that will be fired up.  Create a loader
to make this happen dynamically, and make the DT registry better handle new DTs,
sites, and template variables that are defined by the end user
CEI R1C2 - DTRS Organization
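
One way to picture the DT registry and loader task above is a registry keyed by deployable type and site, with end-user template variables filled in at load time. The sketch below uses Python's standard string.Template for substitution; the DTRegistry class, the load_plan function, and the launch plan entry keys ("dt", "site", "vars") are hypothetical and do not reflect the actual DTRS design.

```python
from string import Template


class DTRegistry:
    """Maps (deployable type, site) to a template with ${variable} placeholders."""

    def __init__(self):
        self._dts = {}

    def register(self, dt_name, site, template_text):
        self._dts.setdefault(dt_name, {})[site] = Template(template_text)

    def lookup(self, dt_name, site, variables):
        try:
            template = self._dts[dt_name][site]
        except KeyError:
            raise LookupError("no deployable type %r for site %r" % (dt_name, site))
        try:
            # substitute() raises KeyError if a placeholder has no value.
            return template.substitute(variables)
        except KeyError as exc:
            raise ValueError("missing template variable %s" % exc)


def load_plan(registry, plan):
    """Resolve every DT reference in a launch plan into a concrete document."""
    return [
        registry.lookup(entry["dt"], entry["site"], entry.get("vars", {}))
        for entry in plan
    ]
```

Failing loudly on an unknown DT, an unknown site, or a missing variable means a bad launch plan is rejected before anything is started.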

Topic: Engage with others mid-iteration (~January)

Task Lead Description Links
Work with ITV to evaluate bootstrap and testing system JB    
Work with ITV on base images TF Organize a way to make bundling less ad hoc, and refine image
documentation for others
 
Work with DM on base images TF iRODS, Cassandra, and ProtoBuf compiler image needs. Also determine
whether the numpy and pydap dependencies can safely be removed.
 
Work with Operations on Nimbus IaaS installation TF   http://www.nimbusproject.org - Especially admin guide
Package separation TF Work with COI to be the first set of services to "detach" from
lcaarch into a separate package
Code Structure, Packages, Assemblies, and Repositories

Topic: Continue SQLStream application support

Task Lead Description Links
Work with chef on 64 bit AMI EPU worker TF    
Add Java and SQLStream package installation (with credentials?) TF
   


Subsystem dependencies

  • CEI expecting to be able to work with ITV in January on a hands-on evaluation/deep-learning of the bootstrap and testing system
  • CEI expecting to be able to work with ITV throughout the iteration on base image building/coordination
    • including talking with DM+ITV about the complicated manual iRODS installation that was brought up; this sounds like something that will be better "burned in" to an image than automated via Chef (best done by a domain expert)
    • including talking with DM+ITV about slimming the dependencies down, especially numpy + pydap
  • CEI expecting to need to support Operations with their installation of the Nimbus IaaS platform
  • CEI expecting COI to "sign off" in January on "robustness testing" of their messaging and container refactoring work, allowing the CEI stress testing evaluation to continue
  • CEI expecting to be the "guinea pig" for separate package installations on top of a separate ION container package (instead of having all subsystem work in one lcaarch directory/package/repository together)


CEI high level components

JIRA tasks for 1.2.3.13.1.3 Resource Management Services

NOTE: login required to see these, otherwise you will probably see "jiraissues: Error on line 1"



JIRA tasks for 1.2.3.13.1.1 Elastic Computing Services

NOTE: login required to see these, otherwise you will probably see "jiraissues: Error on line 1"



JIRA tasks for 1.2.3.13.1.2 Execution Engine Catalog & Repository

NOTE: login required to see these, otherwise you will probably see "jiraissues: Error on line 1"


