These are just initial thoughts/braindroppings that can be filed elsewhere as needed. The intent is to get some ideas down for a 0MQ-based capability container: something that runs on a TS-7370 board and is thus a stripped-down version of the full capability container, geared for low-power boards.
The low-power boards have limited resources and cannot run full AMQP brokers and capability containers. The memory footprints of those processes are too large, and the board itself really only needs a way to collect data, store that data, communicate with the other processes on the board, communicate with resources on shore (via a broker), and respond to issues on the board (i.e., keep itself alive in error situations).
0MQ looks to be a good lightweight technology that supports the flexibility and transient nature of these sorts of processes. It might be used on the board to handle IPC, as well as to flow data over the satellite link as needed. The trouble comes in talking to the rest of the AMQP-based ION. 0MQ has support for Python interactions by way of pyzmq.
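As a concrete sketch of the kind of lightweight messaging involved, the following uses pyzmq REQ/REP sockets over the in-process transport. The endpoint name and message fields are illustrative only; between actual board processes an ipc:// or tcp:// endpoint would be used instead.

```python
import threading
import zmq

ctx = zmq.Context.instance()
url = "inproc://monitor-demo"    # illustrative endpoint name

# REP socket stands in for a monitor/agent command endpoint.
# inproc requires bind() before any connect(), so bind in the main thread.
rep = ctx.socket(zmq.REP)
rep.bind(url)

replies = []

def client():
    # REQ socket plays the role of a process sending a command
    req = ctx.socket(zmq.REQ)
    req.connect(url)
    req.send_json({"op": "status"})
    replies.append(req.recv_json())
    req.close()

t = threading.Thread(target=client)
t.start()

msg = rep.recv_json()             # blocks until the command arrives
rep.send_json({"ack": msg["op"]})
t.join()
rep.close()
```

The strict send/recv lockstep of REQ/REP keeps the example simple; a real monitor would more likely poll several sockets (heartbeat, control, data) at once.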
On the TS-7370, the acquisition processes need to:
- stay alive, restarting automatically upon failure or config change
- respond to commands from shore or other parts of the board
The most efficient and resilient way for acquisition processes to run, survive, crash, and restart is probably to have them be individual UNIX processes. This keeps them somewhat independent in the event of failures and light on resources, while still being controllable and monitorable with standard OS tools. The intent would be to keep driver processes separate from agent processes, so a driver can crash as needed while the agent keeps operating as the driver restarts.
Keeping acquisition as individual processes then requires some sort of entity to spawn, monitor, and manipulate those processes. Something like a 0MQ version of Antelope's rtexec program is probably a reasonable design. The process monitor would:
- Have some sort of configuration (file?) that lists the desired acquisition processes that should be enabled and how to go about running them. Keeping in line with the existing ION configuration files, this could be a set of Python dicts and/or tuples that could be exec'ed.
- Have a way of notifying the system should a problem arise regarding its operation
- Spawn new acquisition processes
- Keep a heartbeat to know if the child acquisition processes are still alive (via 0MQ)
- Respond to controls to kill/restart a child process
- Keep an eye on data file times to make sure they are updating?? (maybe this goes to a separate data monitoring service?)
- Keep a log of events that happen to it? Stats about processing time for events, uptime, etc.??
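The spawn/restart portion of such a monitor can be sketched with plain subprocess management. The process table and commands below are hypothetical; a real monitor would also carry the 0MQ heartbeat and control sockets alongside this loop.

```python
import subprocess
import time

# Hypothetical process table: maps a process name to its command line.
# "sleep" stands in for a real, long-running driver binary.
PROCESSES = {
    "ctd_driver": ["sleep", "0.1"],
}

children = {}

def spawn(name):
    children[name] = subprocess.Popen(PROCESSES[name])

def check_and_restart():
    """Restart any configured child that has exited."""
    restarted = []
    for name, proc in children.items():
        if proc.poll() is not None:   # poll() is None while still running
            spawn(name)
            restarted.append(name)
    return restarted

spawn("ctd_driver")
time.sleep(0.3)                       # let the short-lived child exit
restarted = check_and_restart()
children["ctd_driver"].wait()         # reap the restarted child
```

In practice the 0MQ heartbeat would catch hung-but-alive children that poll() alone cannot, which is why both mechanisms appear in the list above.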
Something like the following in YAML may be used to configure the process manager:
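A hypothetical example (all field names are assumptions sketched for discussion, not an existing schema):

```yaml
# Illustrative only: keys, paths, and values are placeholders
monitor:
  heartbeat_interval: 10          # seconds between child liveness checks
  log: /var/log/procmon.log
processes:
  - name: ctd_driver
    command: /opt/ooici/bin/ctd_driver --port /dev/ttyS1
    enabled: true
    restart: always               # restart on failure or config change
  - name: ctd_agent
    command: /opt/ooici/bin/ctd_agent
    enabled: true
    restart: always
```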
Runtime stats, state, etc. can be kept in memory and reported as needed through an interface to whatever software, user, or log cares about them, in whatever format makes sense (CSV, XML, syslog, etc.).
The other key part of this design is the ability to reuse driver and agent code that is in operation elsewhere in the (more connected) OOICI infrastructure. To do this, the agents and drivers need a different process-management system in place, along with communications adjustments. The drivers and agents themselves have perfectly functional op_*() methods, but execution of them is handled in the Process module code. It seems reasonable (at least initially) to develop a 0mqProcess module that implements functionality similar to the existing Process module but swaps out some of the AMQP-based operations for 0MQ calls; most of this is probably state handling. A 0mqReceiver module will also probably need to be developed to handle the messaging itself with 0MQ calls.
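One way the dispatch side of such a 0mqReceiver might look, assuming messages arrive as JSON dicts with an "op" field naming the handler. The class, message shapes, and endpoint are illustrative sketches, not existing ION APIs.

```python
import zmq

class ZmqReceiver:
    """Hypothetical sketch: route incoming 0MQ messages to op_*() handlers,
    mirroring what the AMQP receiver does in the full container."""

    def __init__(self, process, ctx=None, url="inproc://agent"):
        self.process = process
        self.ctx = ctx or zmq.Context.instance()
        self.sock = self.ctx.socket(zmq.REP)
        self.sock.bind(url)

    def dispatch(self, msg):
        # Look up op_<name> on the wrapped process, as the full container does
        handler = getattr(self.process, "op_" + msg["op"], None)
        if handler is None:
            return {"status": "ERROR", "reason": "unknown op " + msg["op"]}
        return handler(msg.get("content"))

    def serve_one(self):
        # Receive one request, dispatch it, and send the reply back
        reply = self.dispatch(self.sock.recv_json())
        self.sock.send_json(reply)
```

The point of the sketch is that existing op_*() methods stay untouched; only the transport beneath them changes from AMQP to 0MQ.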
The trick then comes in the services that are usually running in the normal capability container: registration, authentication, process trees, messaging via AMQP, etc. The 0mqProcess module will not have those services/interfaces to rely on, so they will need to be stubbed out or externalized into their own processes. This could be spread out into individual processes (managed by the above process management), grouped into a single process that handles many of these core services, or even worked into the above process-management logic so that one process organizes all necessary infrastructure, even if that is more hub-and-spoke than containment.
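For instance, a stubbed-out registry might be nothing more than an in-memory table exposed to the agents. The class and its methods below are hypothetical placeholders, not the real registry interface.

```python
class StubRegistry:
    """Hypothetical stand-in for the container's registry service:
    keeps registrations in memory instead of calling out over AMQP."""

    def __init__(self):
        self._entries = {}

    def register(self, name, info):
        self._entries[name] = info
        return {"status": "OK"}

    def lookup(self, name):
        info = self._entries.get(name)
        if info is None:
            return {"status": "NOT_FOUND"}
        return {"status": "OK", "info": info}
```

A stub like this could later be wrapped behind a 0MQ REP socket and promoted to its own monitored process without changing the agent-facing calls.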
At least initially, the container should probably consist of two distinct parts:
- The process management logic that keeps and maintains the unix data acquisition processes running
- The ION scaffolding that keeps the agents, and the services they depend on, running
These two parts may very well be one actual unit. It may be necessary to have the ION scaffolding exist in a unix process that runs an agent and/or its driver. It may be that the process management logic sits on top of a 0MQ message bus of sorts. More details are definitely needed here.