The objective of this work is to design and prototype an on-demand, scalable cloud execution environment. It is based on a reliable asynchronous messaging infrastructure and on standardized deployable units that can operate in multiples to meet the scalability and self-healing requirements of the EPU (Elastic Processing Unit). The emphasis is on building the infrastructure rather than on implementing specific functions; the deliverables include documentation of the lessons learned and specific recommendations for future work. The time period is March-October 2009.
The figure below shows the basic architecture of the CPE prototype (see also the comprehensive Kick-Off poster). It comprises distributed processes that communicate through a publish/subscribe messaging system. Part of the architecture design is to specify the communication styles of message interchange and the message protocols and formats. The prototype CPE system targets a Nimbus cloud on a University of Chicago cluster and the Amazon EC2 cloud.
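As a rough illustration of the publish/subscribe style mentioned above, the following Python sketch uses a toy in-process broker. It is not the prototype's messaging system: the class `InProcessBroker`, the topic name `sensor.cpu`, and the message fields are all hypothetical, and a real deployment would use a networked broker with reliable, asynchronous delivery.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler is any callable that accepts one message (a plain dict here).
Handler = Callable[[dict], None]


class InProcessBroker:
    """Toy stand-in for a publish/subscribe broker (hypothetical, in-process only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to each subscriber of topic; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


# Usage: the publisher does not know who (if anyone) consumes the message.
broker = InProcessBroker()
received: List[dict] = []
broker.subscribe("sensor.cpu", received.append)
delivered = broker.publish("sensor.cpu", {"unit": "unit-1", "cpu_percent": 42.0})
```

The point of the sketch is the decoupling: publishers address a topic rather than a specific process, so subscribers can be added or removed as instances come and go.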
The figure below shows the observe-decide-act pattern of autonomous systems applied to the Scheduler component.
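A minimal Python sketch of one observe-decide-act step may help make the pattern concrete. It assumes the simplest possible goal, keeping a target number of instances running; the `Pool` class and the function names are hypothetical and do not describe the actual Scheduler component.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pool:
    """Hypothetical pool of running process instances."""
    instances: List[str] = field(default_factory=list)
    launched: int = 0  # total launches so far, used only to name instances

    def launch(self) -> None:
        self.launched += 1
        self.instances.append(f"instance-{self.launched}")

    def terminate(self) -> None:
        if self.instances:
            self.instances.pop()


def observe(pool: Pool) -> int:
    """Observe: measure the current state (number of running instances)."""
    return len(pool.instances)


def decide(running: int, target: int) -> int:
    """Decide: positive result means launch that many, negative means terminate."""
    return target - running


def act(pool: Pool, delta: int) -> None:
    """Act: apply the decision to the pool."""
    for _ in range(abs(delta)):
        if delta > 0:
            pool.launch()
        else:
            pool.terminate()


def control_step(pool: Pool, target: int) -> int:
    """Run one observe-decide-act cycle and return the decision that was applied."""
    delta = decide(observe(pool), target)
    act(pool, delta)
    return delta
```

Running `control_step` repeatedly gives a simple form of self-healing under these assumptions: if an instance disappears between cycles, the next cycle observes the shortfall and launches a replacement.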
The figure below shows details of the launching and contextualization process for both the Nimbus and Amazon EC2 components. The actual process instances are started from base images specific to the Nimbus and EC2 environments. After startup, during the contextualization phase, the actual process code is loaded from a repository in the network. The context broker provides instance-specific contextualization.
- Matthew Arrott, UCSD
- Alex Clemesha, UCSD
- Kate Keahey, Univ. of Chicago/ANL
- David LaBissoniere, Univ. of Chicago/ANL
- Michael Meisinger, UCSD
- Dorian Raymer, UCSD
- Design and prototype the first iteration of a scalable cloud execution environment that is accessed through a reliable asynchronous messaging infrastructure and that can deploy units in multiples to meet the scalability and self-healing requirements of the EPU (Ever Present Unit).
- Create a deployable type with a messaging service. Prepare two instantiations of this deployable type: one that can operate within the EC2 environment and one that can operate on Nimbus and/or other Science Clouds. Test and document (or fix) any issues preventing full interoperability, and build bridges wherever appropriate.
- Instrument the deployable type with sensors to turn it into a "monitorable deployable type". The specific sensors will be memory consumption (as measured by top inside the VM), CPU consumption (likewise), and incoming/outgoing network traffic (external I/O).
- Develop an "idealized" EPU that, for every monitorable deployable unit, (a) provisions it, (b) monitors the sensors associated with the unit, and (c) scales the functionality of the unit based on the information returned by the sensors and the policies tied to the sensors and the unit (what about the app router?).
- Integrate this prototype with the following:
  - A global traffic management and load balancing network environment that supports the scalability and localization requirements of EP-DN (Ever Present Domain Names) and EPR-IP (Ever Present Regional IP addresses), which resolve to dynamically allocated cloud resource IP addresses.
  - An authoring pipeline for the publication of service components and their composition into deployable units for specific execution environments, i.e., Amazon's EC2 and Nimbus.
  - Agent services (Facility and Execution Agents) being developed as part of the COI prototyping effort.
  - The Global Messaging Service being developed as part of the COI prototyping effort.
- Documentation, including the architecture, a description of conversation patterns, and XML schemas for contextualization activities.
- Report documenting lessons learned, technology choices, and experience from the prototype.
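
The "idealized" EPU objective listed above ties scaling decisions to sensor readings and policies. The Python sketch below shows one way such a policy could be expressed, under loud assumptions: the `SensorReading` fields mirror the sensors named in the objectives (memory, CPU, network traffic), but the `ThresholdPolicy` class, its threshold values, and the decision labels are hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SensorReading:
    """One sample from a monitorable deployable unit (field names are assumptions)."""
    memory_percent: float
    cpu_percent: float
    net_in_bytes_per_s: float
    net_out_bytes_per_s: float


@dataclass(frozen=True)
class ThresholdPolicy:
    """Hypothetical policy; the default thresholds are illustrative only."""
    cpu_high: float = 80.0
    cpu_low: float = 20.0
    memory_high: float = 90.0

    def evaluate(self, readings: Sequence[SensorReading]) -> str:
        """Return 'scale_up', 'scale_down', or 'hold' for one reading per unit."""
        if not readings:
            return "hold"
        avg_cpu = sum(r.cpu_percent for r in readings) / len(readings)
        max_memory = max(r.memory_percent for r in readings)
        if avg_cpu > self.cpu_high or max_memory > self.memory_high:
            return "scale_up"
        # Never scale below one unit: only shrink when more than one is reporting.
        if avg_cpu < self.cpu_low and len(readings) > 1:
            return "scale_down"
        return "hold"


# Usage: two busy units push the average CPU above the high threshold.
policy = ThresholdPolicy()
busy = [SensorReading(50.0, 95.0, 1e6, 2e6), SensorReading(55.0, 85.0, 1e6, 2e6)]
decision = policy.evaluate(busy)
```

Keeping the policy separate from the sensor data means thresholds can be changed per unit type without touching the monitoring code, which is one plausible reading of "policies tied to the sensors and unit".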
Page: CPE Iteration1
Page: CPE Iteration1 deliverables
Page: CPE Iteration2
Page: CPE Iteration2 deliverables
Page: CPE Iteration3
Page: CPE Iteration3 deliverables
Page: CPE Iteration4
Page: Execution Environment Console
Page: Nanite Interactive Provisioning of RabbitMQ Cluster
Page: Nanite Provisioning of RabbitMQ Cluster
Page: RabbitMQ Cluster
Page: Ruby EC2 lib install & usage
Page: Twisted's AMP and AMPoule
Page: Twisted Component Based Architecture
Page: Twisted Logging and Logging Service Design
Page: Using Nimbus Context Broker