
Dataset Parser Development

Developing a dataset parser involves writing one or more Python modules and placing them in the mi/dataset/parser package directory.

In the past, we developed dataset parsers by extending existing classes such as the BufferLoadingParser.  We now encourage developing a dataset parser class that extends the SimpleParser class defined in the mi.dataset.dataset_parser module.

In most cases, it is fine to create a single Python file to include your:

  • parser class that extends the SimpleParser class
  • particle classes that extend the DataParticle class
  • regular expressions for pattern matching
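
The single-file layout listed above might be sketched as follows. This is only an illustration: the DataParticle and SimpleParser classes defined here are minimal stand-ins so the example runs on its own, and their constructors and methods are assumptions, not the real mi APIs. The record format, ExampleDataParticle, ExampleParser and DATA_REGEX are hypothetical names.

```python
import re


# Hypothetical stand-ins so the sketch runs on its own. In a real module these
# would be imported from the mi packages, and their signatures may differ.
class DataParticle(object):
    _data_particle_type = None

    def __init__(self, raw_data):
        self.raw_data = raw_data


class SimpleParser(object):
    def __init__(self, stream_handle):
        self._stream_handle = stream_handle
        self._record_buffer = []

    def parse_file(self):
        raise NotImplementedError


# 1. Regular expression for pattern matching (hypothetical record format:
#    "<integer id>,<decimal temperature>").
DATA_REGEX = re.compile(
    r'^(?P<sample_id>\d+),(?P<temperature>-?\d+\.\d+)\s*$')


# 2. Particle class that extends DataParticle.
class ExampleDataParticle(DataParticle):
    _data_particle_type = 'example_instrument_data'

    def build_values(self):
        match = self.raw_data
        return {
            'sample_id': int(match.group('sample_id')),
            'temperature': float(match.group('temperature')),
        }


# 3. Parser class that extends SimpleParser.
class ExampleParser(SimpleParser):
    def parse_file(self):
        for line in self._stream_handle:
            match = DATA_REGEX.match(line)
            if match is None:
                continue  # skip lines that are not data records
            self._record_buffer.append(ExampleDataParticle(match))
```

Keeping the three pieces in this order lets a reader see the patterns first, then the particles built from them, then the parser that ties them together.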

In some cases, it is okay to create multiple Python files and place data particle class definitions in one file and your parser class in another file.  This may be appropriate in the case of multiple parsers sharing the same particle class definitions.

Two modules that newer parsers have started using are mi.dataset.parser.common_regexes and mi.dataset.parser.utilities.  We recommend defining common regular expressions in mi.dataset.parser.common_regexes and importing them from there.  Likewise, common time-conversion utilities and other shared utility functions should be defined in mi.dataset.parser.utilities and imported from there.
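
As a rough sketch of this split, the example below shows the kind of definitions that could live in a shared regex module and a shared utilities module, and how a parser might compose them. Every name here (FLOAT_REGEX, DATE_YYYY_MM_DD_REGEX, utc_string_to_ntp_seconds, parse_record and so on) is hypothetical and not taken from the actual mi modules. Converting to NTP-epoch seconds is also an assumption, chosen only as an example of a time conversion.

```python
import calendar
import re
import time

# --- Shared regex fragments (hypothetical names and patterns) ---
FLOAT_REGEX = r'[+-]?\d+\.\d+'
DATE_YYYY_MM_DD_REGEX = r'\d{4}-\d{2}-\d{2}'
TIME_HR_MIN_SEC_REGEX = r'\d{2}:\d{2}:\d{2}'

# --- Shared utilities (hypothetical names) ---
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2208988800


def utc_string_to_ntp_seconds(timestamp, fmt='%Y-%m-%d %H:%M:%S'):
    """Convert a UTC timestamp string to seconds since the NTP epoch."""
    unix_seconds = calendar.timegm(time.strptime(timestamp, fmt))
    return unix_seconds + NTP_UNIX_EPOCH_DELTA


# --- A parser module building its record pattern from the shared pieces ---
RECORD_REGEX = re.compile(
    r'(?P<timestamp>' + DATE_YYYY_MM_DD_REGEX + r' ' +
    TIME_HR_MIN_SEC_REGEX + r'),(?P<value>' + FLOAT_REGEX + r')')


def parse_record(line):
    """Return (ntp_seconds, value) for a matching line, else None."""
    match = RECORD_REGEX.match(line)
    if match is None:
        return None
    return (utc_string_to_ntp_seconds(match.group('timestamp')),
            float(match.group('value')))
```

Because the fragments are plain strings, each parser can combine them into its own compiled pattern without redefining what a float or a date looks like.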

Code Complexity Analysis

Once your code is ready for unit testing, or even after unit testing, it is important to keep its complexity low so that it is easy to maintain.

If you have not already done so, install the flake8 package with pip (e.g. pip install flake8==2.2.5).

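Once flake8 is installed, you can run it with the --max-complexity option (for example, flake8 --max-complexity 10 followed by the path to your module) to assess the complexity of your Python module and other Python modules.

Any function whose complexity exceeds the given threshold is reported with a C901 "is too complex" message that includes its complexity score.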

From this point forward, we recommend keeping code complexity levels below 10.
