compared with
Version 21 by Bill French
on May 12, 2014 08:41.

rte_o_dcl parser: 'rte_o_dcl_telemetered', 'rte_o_dcl_recovered'

h3. State

We need to maintain state for every file detected in the directory, plus parser state for each of those files. The parser state exists so that we can recover from a parser failure or a partial ingestion. While the harvester state will likely be wrapped by class objects in the code, it will ultimately be stored as a serialized dictionary in OOIN via the agent persistence mechanism. The state will be stored as a layered dictionary: each file has its own sub-dictionary within the top-level dictionary, keyed by the file name. The top-level dictionary also carries a version field, which can be used for backwards compatibility if the harvester state changes in the future.
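
As a rough illustration of how the version field could support backwards compatibility, the sketch below round-trips a state dictionary and checks its version on load. JSON is used only as a stand-in for the agent persistence mechanism, and the names {{STATE_VERSION}}, {{new_driver_state}}, {{serialize_state}} and {{load_state}} are hypothetical, not part of the existing code.

```python
import json

# Assumed current version; 0.1 matches the value used in the example state.
STATE_VERSION = 0.1


def new_driver_state():
    """Return an empty top-level state dictionary carrying a version field."""
    return {'version': STATE_VERSION}


def serialize_state(state):
    """Serialize the layered state dictionary (JSON is an assumption here)."""
    return json.dumps(state, sort_keys=True)


def load_state(serialized):
    """Deserialize state and use the version field to decide how to treat it."""
    state = json.loads(serialized)
    version = state.get('version')
    if version is None:
        # Hypothetical legacy layout without a version field: stamp it.
        state['version'] = STATE_VERSION
    elif version > STATE_VERSION:
        # State written by newer code than this reader understands.
        raise ValueError('state version %s is newer than supported %s'
                         % (version, STATE_VERSION))
    return state
```

A real implementation would presumably add explicit upgrade steps for each older version rather than only stamping or rejecting the state.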

|| parameter || description ||
| file_name | name of the file (not the full path); this is the key for each file sub-dictionary |
| file_size | size in bytes reported by stat |
| file_mod_date | file modification time as Unix epoch seconds |
| file_checksum | file checksum calculated with MD5 via Python's hashlib |
| ingested | Boolean indicating whether ingestion of this file is complete |
| parser_state | object specific to each parser representing parser state |
| modified_state (optional) | if a file is modified after it has been ingested, the modified state is stored here, with fields file_size, file_mod_date, and file_checksum (as described above) |
| version | a top level version for this state, which can be used if the harvester state is modified in the future to handle backwards compatibility |
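
The per-file fields in the table could be collected roughly as follows. This is a minimal sketch, not the harvester's actual code: the function name {{build_file_state}} and the chunk size are assumptions, and a new file is assumed to start with {{ingested}} set to False and an empty {{parser_state}}.

```python
import hashlib
import os


def build_file_state(directory, file_name):
    """Build the sub-dictionary for one newly detected file (hypothetical helper)."""
    path = os.path.join(directory, file_name)
    stat_result = os.stat(path)

    # Hash the file in chunks so large data files are not read into memory at once.
    md5 = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(65536), b''):
            md5.update(chunk)

    return {
        'file_size': stat_result.st_size,            # bytes reported by stat
        'file_mod_date': int(stat_result.st_mtime),  # Unix epoch seconds
        'file_checksum': md5.hexdigest(),
        'ingested': False,                           # assumed initial value
        'parser_state': {},                          # filled in by the parser
    }
```

The returned dictionary would be stored under the file name as its key, as described in the table.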

driver_state = {
    'version': 0.1,
    'harvester_id_telemetered': {
        'file_name(1)': {
            # Harvester File State
            'file_size': 10,
            'file_mod_date': 121323121,
            'file_checksum': 'somechecksumvalue',
            'ingested': True,
            'parser_state': {},  # parser specific dict
        },
    },
    'harvester_id_recovered': {
        'file_name(2)': {
            # Harvester File State that has been modified after ingestion
            'file_size': 112,
            'file_mod_date': 121323500,
            'file_checksum': 'checksum_A',
            'ingested': True,
            'parser_state': {},  # parser specific dict
            'modified_state': {
                'file_size': 112,
                'file_mod_date': 121323750,
                'file_checksum': 'checksum_B',
            },
        },
    },
}
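
One way the {{modified_state}} entry might be populated is sketched below, under the assumption that the harvester compares a fresh snapshot of size, modification time and checksum against the stored values. The function name {{record_modification}} and its return value are hypothetical.

```python
SNAPSHOT_KEYS = ('file_size', 'file_mod_date', 'file_checksum')


def record_modification(file_state, current):
    """Store a modified_state entry if an ingested file has changed.

    file_state: the stored sub-dictionary for one file.
    current: dict holding freshly measured file_size, file_mod_date, file_checksum.
    Returns True if a modification was recorded, False otherwise.
    """
    changed = any(file_state.get(key) != current[key] for key in SNAPSHOT_KEYS)
    if file_state.get('ingested') and changed:
        # Keep the original values untouched and record the new ones separately.
        file_state['modified_state'] = {key: current[key] for key in SNAPSHOT_KEYS}
        return True
    return False
```

Applied to the second file in the example, a new modification time of 121323750 and checksum of checksum_B would leave the original fields in place and add the {{modified_state}} entry shown.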