Integration in R1 was painful in large part because it was so difficult to test many components in relative isolation. And testing the full system was extremely slow and difficult to debug. One of our goals for R2 was to alleviate this pain with tools to facilitate testing without the full stack of VMs and software. We haven't achieved as much as we would have liked by this point, but there is still time to build something useful during elaboration.
In E1 we developed a version of the provisioner service that uses Vagrant to start up VMs on a development machine, instead of using IaaS. This has proved a useful tool for developing, testing, and debugging Chef recipes and other "last mile" deployment activities such as installing software and dependencies. However, it is not a good tool for testing a full launch plan: booting is faster than IaaS but still too slow to provide a responsive development cycle, and the resource demands of running so many VMs may be too much for a developer's laptop.
The R2 launch can be broken into three phases:
- Dependency Setup: standing up or configuring overall dependencies (CouchDB, RabbitMQ, etc) -- ignored for the purposes of this document.
- CEI Core Launch: Provisioner, EPU Management Service (EPUM), Process Dispatcher (PD). These services work in concert to boot and manage Execution Engine (EE) resources on VMs as needed.
- Service Process Dispatch: Running "real" processes on EE resources using the PD. These could be services in their own right, or HA Service Agents that themselves manage processes. There will be many levels of these deployments, which have a required order and may have configuration dependencies.
For the purposes of most integration testing, the Dispatch phase is really the important one. This phase does not depend on the entire CEI stack, just the Process Dispatcher and the EEAgent. If these components can be isolated and run locally, it should be possible to test the full Dispatch phase of the launch quickly and without any VMs.
The goal is to build a launch plan with (at least) two plan entry points: a primary plan that does the full launch using VMs and a lightweight plan that sets up a minimum CEI stack on your local machine and uses it to dispatch processes. As much as possible we will reuse the same Dispatch bootlevels and configuration so it is a reasonable test of the system.
The lightweight plan will employ localhost SSH (until cloudinitd supports local command execution natively). Instead of starting VMs and standing up all of the CEI services in order, the lightweight plan will start just a Process Dispatcher, an EEAgent, and perform the necessary configuration to make this minimal setup functional. The code to do this will live in a new CEI repository and package: EPUHarness. This package depends on epu, eeagent, cloudinit.d, and possibly epumgmt packages and also provides some glue code and commands for starting and managing the lightweight environment.
In the full launch plan, the complete stack of CEI services is bootstrapped onto one or more base node VMs. These services in turn boot and manage a pool of Execution Engine VMs, on which processes are started.
The lightweight launch doesn't start or require any VMs. The boot levels that start CEI services are replaced with a single level that calls an EPUHarness script on the local machine. This script starts the Process Dispatcher and one or more EEAgents, and performs any configuration needed to hook these services together. When the boot level completes, the PD is running and ready to accept process requests.
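As a rough sketch, the EPUHarness boot script might work along these lines. The real entry points for the pd and eeagent are not settled, so self-contained placeholder commands stand in for them here; the names pd_0 and eeagent_N are illustrative only.

```python
import subprocess
import sys

def start_lightweight_stack(n_eeagents=1):
    """Start a Process Dispatcher and some EEAgents as local child
    processes and return their handles, keyed by name.

    The real EPUHarness would invoke the actual pd/eeagent programs
    and wire them together; placeholder sleep commands are used here
    so the sketch is self-contained.
    """
    procs = {}
    # Placeholder for the real Process Dispatcher command
    procs["pd_0"] = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    for i in range(n_eeagents):
        # Placeholder for the real EEAgent command, which would be
        # configured to register with the Process Dispatcher above
        procs["eeagent_%d" % i] = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"])
    return procs
```

When the function returns, the boot level would be considered complete and the (real) PD would be ready to accept process requests.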
The Dispatch bootlevels will expect a set of dependency variables from the preceding levels. These will describe how to dispatch processes. Appropriate values will be provided by the preceding CEI level(s) in both the full and lightweight plans.
To use the Lightweight Launchplan, the developer would:

1. Create a new virtualenv, and install EPUHarness into it.
2. Install the code and dependencies for any dispatched processes.
3. Create a configuration file to connect to local RabbitMQ and Pyon.
4. Check out the Launch Plan Directory.
5. Set environment variables to point at the virtualenv.

Then the code, test, debug loop should be as follows:

6. Make changes to code.
7. Run "cloudinitd boot" for the lightweight launch plan.
8. When finished testing, run "cloudinitd terminate". If "cloudinitd terminate" fails, use EPUHarness's kill action to clean up all running processes.
9. Go to step 6.
By default, EPUHarness starts the process dispatcher and eeagent, and hooks them together. If desired, a user can supply a configuration file that describes a deployment scenario: for example, a node with a pd and two eeagents, or two nodes, each with a pd and an eeagent.
Stopping the harness stops the pd and eeagent processes. By default, this returns an error if any EEAgents still have processes running. The force flag stops as many processes as possible, and then exits.
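In outline, the stop behavior described above might look like the following. This is only a sketch of the intended semantics; the real EPUHarness interface and its view of EEAgent state may differ.

```python
class StillRunningError(Exception):
    """Raised when an EEAgent still has managed processes running."""

def stop(eeagent_processes, force=False):
    """Stop harness-managed pd/eeagent processes.

    eeagent_processes maps EEAgent name -> list of process names the
    agent still has running (a stand-in for querying the real agents).
    Without force, refuse to stop while any EEAgent reports running
    processes. With force, stop everything we can and report what was
    stopped.
    """
    still_running = {name: procs
                     for name, procs in eeagent_processes.items() if procs}
    if still_running and not force:
        raise StillRunningError(
            "EEAgents still have running processes: %s"
            % sorted(still_running))
    stopped = []
    for name in sorted(eeagent_processes):
        # The real harness would terminate the daemonized process here
        stopped.append(name)
    return stopped
```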
- The EPUHarness configuration file should default to the guest:guest@localhost RabbitMQ configuration, which is available out of the box in a RabbitMQ installation. Otherwise, users can provide a configuration file on the command line. (Or some default location?)
- EPUHarness will need to start the PD and EEAgent as daemon processes. Maybe this is a good job for pidantic? Otherwise, we could use subprocess to start pd and eeagent on the command line, or start them in python and daemonize them.
- The process IDs of the PD and EEAgent started by EPUHarness will need to be persisted somewhere. If pidantic handles this, that could be used; otherwise, perhaps we could use a location like ~/.epu/pidfile to store them. This could also be configured in the configuration file.
- Each deployment is organized into virtual nodes. These correspond to VMs launched by EPUM in a full launch. The node is simulated by having the EPUHarness send the same messages that EPUM would send to the PD when a node is ready.
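On the pid-persistence note above, a minimal approach (assuming a JSON file at a ~/.epu/pidfile-style location; if pidantic is adopted it would presumably handle this itself) could be:

```python
import json
import os

DEFAULT_PIDFILE = os.path.expanduser("~/.epu/pidfile")

def save_pids(pids, path=DEFAULT_PIDFILE):
    """Persist a mapping of process name -> pid as JSON."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(pids, f)

def load_pids(path=DEFAULT_PIDFILE):
    """Load previously persisted pids; empty if no pidfile exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

The kill action could then load this file and terminate whatever is recorded in it, even after a failed "cloudinitd terminate".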
Users will need to experiment with different deployment schemes, as referenced in the UI sketch above. Users can provide a config file with a description of the deployment scenario they would like. An example to start:
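One hypothetical sketch of such a file, describing two virtual nodes. The schema and every name in it (nodeone, pd_0, and so on) are assumptions for illustration, not a final format:

```yaml
# Hypothetical deployment description -- schema and names assumed
nodeone:
  - process-dispatcher:
      name: pd_0
  - eeagent:
      name: eeagent_one
      process-dispatcher: pd_0
nodetwo:
  - eeagent:
      name: eeagent_two
      process-dispatcher: pd_0
```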
Each top-level item in the description is the name of a new node. Each node is a list of process-dispatcher and eeagent objects. Each of these must have a name so it is addressable by the other nodes; this name is used when connecting to dashi. Services are then connected together by specifying the name of the corresponding pd or eeagent that each should connect to.
As another example, the default configuration for EPUHarness follows:
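A sketch of what that default might look like, under the same assumed schema as the example above (keys and names are illustrative):

```yaml
# Hypothetical default -- single node, one pd, one eeagent
node1:
  - process-dispatcher:
      name: pd_0
  - eeagent:
      name: eeagent_0
      process-dispatcher: pd_0
```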
It is a single node, with a single process dispatcher and a single eeagent. The eeagent and the process-dispatcher are hooked together by referencing each other's names.