This page provides a high-level "storyboard" describing intent and expected content for an R2E1 prototype. This storyboard is initially provided by the architecture team to the development team to coordinate high-level intent of all prototypes, and is subsequently refined in close collaboration between developers and architects. See Prototyping page.

Prototype Overview

See the Prototyping page for a list of all prototypes.

Subsystem CEI
Type Approach Assessment
Priority High
Iteration R2E1
Dates Sept - Oct 2011
Status not started (realized in product)
Lead Tim Freeman
Description Prototype that executes a pre-registered algorithm within an existing execution engine (e.g. a Matlab script inside Matlab, or a C program controlled by a CC), connected to an input queue and publishing to an output queue
Risks
  • 2318 Resource and Execution Management Services
  • 2317 Cloud Computing Strategy
  • 2313 Core Infrastructure Scalability
Results in production R2E3

Placement in Release 2 Architecture

The figure above illustrates the main goals. An execution engine maintains a number of subscriptions to the Data Distribution Network (DDN). When new messages arrive in the subscription queues (these can also be scheduler event messages), a transform process hosted within the execution engine processes them and publishes output messages back to the DDN. Execution engines are the IaaS elements of the system; processes are the algorithms executed on data messages.

The system is aware of the need to process (transform) data streams and of the resources required to perform the transformation (execution engines, process definitions, process schedules, DDN subscriptions).
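The queue-in, queue-out behaviour described above can be sketched roughly as follows. This is a minimal illustration only: it uses Python's in-process `queue.Queue` as a stand-in for DDN subscription queues, and the names `run_transform` and `stop_marker` are hypothetical, not part of any CEI or DM interface.

```python
import queue


def run_transform(input_queue, output_queue, transform, stop_marker=None):
    """Consume messages until stop_marker arrives; publish transform(message) for each.

    input_queue and output_queue stand in for DDN subscription and publication
    queues (an assumption for this sketch). Returns the number of messages processed.
    """
    processed = 0
    while True:
        message = input_queue.get()  # blocks until a message arrives
        if message is stop_marker:
            break
        output_queue.put(transform(message))
        processed += 1
    return processed


if __name__ == "__main__":
    inbox, outbox = queue.Queue(), queue.Queue()
    for value in (1.0, 2.0, 3.0):
        inbox.put({"value": value})
    inbox.put(None)  # hypothetical end-of-stream marker for the demo
    count = run_transform(inbox, outbox, lambda m: {"value": m["value"] * 2})
    print(count, [outbox.get() for _ in range(count)])
```

A scheduler event message could be delivered through the same input queue; the loop does not need to distinguish the source of a message.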

Architecture: Requirements and constraints

Process Scheduling: Requirements and constraints

  • IaaS (cloudinit) schedules execution engines: EPUs that start empty, ready to run user processes (= algorithms)
    • Execution engines have an EE agent that can receive process scheduling commands
    • Execution engines register themselves in the resource registry
  • Process scheduling models
    • Queue/reactive: The process consumes messages from a queue on arrival
      • Messages arriving on a queue from a DDN subscription
      • Messages arriving on a queue from a scheduler subscription
    • One-off based on schedule: A separate process (e.g. Matlab script) is launched and results are processed
  • Processes may maintain complex data state
    • May retrieve historic information from the resource registry (DM catalogs)
    • May keep a cache (sliding window) of recent messages received
    • May pull messages from other subscription queues passively
  • Process schedule model
    • The process scheduling system needs to know what is required to execute each process
      • Which process definition
      • On which execution engine type (EPU)
      • On which execution engine instances
      • Which scheduling model
      • Which expected resource consumption?
        • CPU seconds expected
        • Return latency
        • Licenses
        • Any local storage?
  • Execution Engines
    • Python
    • Java
    • Python agent controlling an external program (UNIX process, e.g. Matlab, C program, Fortran program)
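One way to picture the process schedule model listed above is as a plain record that a scheduler could inspect. The sketch below is an assumption made for illustration: the class and field names (`ProcessScheduleRequest`, `ExpectedResources`, and so on) are hypothetical and are not taken from the resource registry or any existing interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SchedulingModel(Enum):
    QUEUE_REACTIVE = "queue_reactive"        # consume messages from a queue on arrival
    ONE_OFF_SCHEDULED = "one_off_scheduled"  # launch a separate process on a schedule


@dataclass
class ExpectedResources:
    """Expected resource consumption; every field is optional in this sketch."""
    cpu_seconds: Optional[float] = None
    return_latency_seconds: Optional[float] = None
    licenses: List[str] = field(default_factory=list)
    local_storage_bytes: Optional[int] = None


@dataclass
class ProcessScheduleRequest:
    """What the scheduling system needs to know to execute one process."""
    process_definition_id: str
    engine_type: str                 # execution engine type (EPU)
    engine_instance_ids: List[str]   # candidate execution engine instances
    scheduling_model: SchedulingModel
    expected_resources: ExpectedResources = field(default_factory=ExpectedResources)

    def needs_license(self) -> bool:
        return bool(self.expected_resources.licenses)


if __name__ == "__main__":
    request = ProcessScheduleRequest(
        process_definition_id="sliding-average-v1",  # hypothetical identifier
        engine_type="python",
        engine_instance_ids=["ee-001"],
        scheduling_model=SchedulingModel.QUEUE_REACTIVE,
        expected_resources=ExpectedResources(cpu_seconds=0.5),
    )
    print(request.needs_license())
```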

Architectural goals (prepared but not implemented in this prototype):

  • Execution engine types are registered as resources in resource registry
  • Execution engine instances are registered as resources in resource registry
  • Execution engine OUs (VMs) have an agent controlling the EE/OU
  • Process definitions (source code) are registered as resources in resource registry
  • Processes are scheduled for execution

Interfaces (prepared but not implemented in this prototype)

  • Execution engine - Operational Unit Interface
  • Execution engine registry interface
  • Execution engine - Python CC interface
  • Execution engine - Process management interface
  • Execution engine - Process definition repository interface
  • Execution engine - DM pubsub interface

Prototype Goals

  • Connect R1 data distribution network with a transformation process hosted in an "execution engine" container
    • Keep it "super simple" initially: compute a sliding window average. The average of all data points arriving within a configured time window (e.g. 10 seconds) should be computed and published at the same rate as incoming data messages (which are expected to arrive continuously)
  • Realize an early execution engine for transform processes
  • Python-based algorithm execution (or a shell executable controlled from Python); alternatively a viz package (GraphViz?)
  • Feed transform results back into the DDN
  • Determine CEI and DM interfaces and align expectations and understanding
  • This prototype in this iteration is NOT an integration with DM/SA. Stakeholder only.
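The "super simple" sliding window average named in the goals above could look roughly like the sketch below. It assumes each message carries a numeric timestamp in seconds and a single numeric value, and that timestamps arrive in non-decreasing order; the class name `SlidingWindowAverage` is hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of all points whose timestamp lies in (now - window, now]."""

    def __init__(self, window_seconds=10.0):
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._points = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record one data point and return the current window average.

        One average is returned per incoming point, so output is produced
        at the same rate as input.
        """
        self._points.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        # Drop points that have fallen out of the window.
        while self._points[0][0] <= cutoff:
            self._points.popleft()
        return sum(v for _, v in self._points) / len(self._points)


if __name__ == "__main__":
    average = SlidingWindowAverage(window_seconds=10.0)
    for t, v in [(0, 1.0), (5, 3.0), (10, 5.0), (20, 7.0)]:
        print(t, average.add(t, v))
```

The newest point always stays in the window because the window length is positive, so the division never sees an empty window.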

Prototype Flow

Create a very simple process then extend to more complex cases

  • The transform process subscribes to a single input stream and writes its input (commands and data) into a file, which is read by the engine process (Matlab or Python)
  • The engine process (NOT running in a CC, just a standard Unix process for the prototype) runs the calculation, then writes an output file that can be read by the transform process.
  • The transform process can output to a queue; in subsequent steps the data can be persisted.
  • Attempt to run ~100 time steps over multiple variables
  • Do not bring the engine up/down between runs
  • Use either a science Python script or a Matlab script. For Matlab, attempt to use input data in a Matlab-readable (hdf) format to reduce complexity.
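The file-based hand-off in the flow above can be sketched as follows. Several things here are assumptions made only to keep the example self-contained: JSON is used as the file format instead of hdf, the engine is a plain Python function called in-process (in the prototype it would be a separate, long-lived Unix process), and the variable names `temperature` and `salinity` are invented sample data.

```python
import json
import tempfile
from pathlib import Path


def write_engine_input(workdir, step, command, data):
    """Transform process side: write commands and data for one time step."""
    path = Path(workdir) / f"input_{step:04d}.json"
    path.write_text(json.dumps({"command": command, "data": data}))
    return path


def engine_step(input_path):
    """Stand-in for the engine: read one input file, write one output file."""
    input_path = Path(input_path)
    request = json.loads(input_path.read_text())
    # The only "command" supported in this sketch is a per-variable mean.
    result = {name: sum(series) / len(series)
              for name, series in request["data"].items()}
    output_path = input_path.with_name(input_path.name.replace("input_", "output_"))
    output_path.write_text(json.dumps(result))
    return output_path


def read_engine_output(output_path):
    """Transform process side: read the engine result back."""
    return json.loads(Path(output_path).read_text())


def run_steps(workdir, steps=100):
    """Run several time steps over multiple variables without restarting anything."""
    results = []
    for step in range(steps):
        data = {"temperature": [step, step + 1.0], "salinity": [35.0, 35.0]}
        in_path = write_engine_input(workdir, step, "mean", data)
        results.append(read_engine_output(engine_step(in_path)))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(len(run_steps(workdir)))
```

In a fuller version, the results returned by `run_steps` would be published to the output queue rather than collected in a list.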

Enhancements

  • Create a process resource that contains the script and its metadata. Generalize the process execution/process adaptor code
  • Coordinate with CEI to provide an execution engine agent that manages the Matlab (or science Python) engine and provides status
  • Persist the output when specified in the process resource
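A process resource that bundles script and metadata, together with a generalized adaptor that persists output only when asked to, might be shaped like the sketch below. All names (`ProcessResource`, `persist_output`, `execute`) are hypothetical, and the script runner, publisher, and persistence hooks are passed in as plain callables because their real interfaces are not defined on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ProcessResource:
    """Script plus metadata describing one transform process."""
    name: str
    script: str                                   # source code or a path to it
    metadata: Dict[str, Any] = field(default_factory=dict)
    persist_output: bool = False                  # persist results when True


def execute(resource: ProcessResource,
            message: Any,
            run_script: Callable[[str, Any], Any],
            publish: Callable[[Any], None],
            persist: Callable[[Any], None]) -> Any:
    """Generic adaptor: run the script on a message, publish, optionally persist."""
    result = run_script(resource.script, message)
    publish(result)
    if resource.persist_output:
        persist(result)
    return result


if __name__ == "__main__":
    published, stored = [], []
    resource = ProcessResource(name="double", script="x * 2", persist_output=True)
    # Toy runner for the demo: evaluates the script with x bound to the message.
    runner = lambda script, x: eval(script, {"x": x})
    execute(resource, 21, runner, published.append, stored.append)
    print(published, stored)
```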

Additional Design Elements

Comments, Ideas
