David and I had a productive conversation yesterday about validating the schema of data that is exported through various CLI tools and APIs. Here is a summary / proposal based on that conversation.

One goal of the insights manager module is to export a consistent set of data to the insights engine (think data like `ceph osd dump`). Currently the insights module exports JSON representations of several data structures (e.g. MonMap), but the schema of that exported data is effectively codified in methods like MonMap::dump, which makes it difficult for consumers of the data to monitor for schema changes (and likewise for developers to notice unintended schema changes). This isn't just an insights issue; there is a _lot_ of Python written that assumes a particular schema.

One way to handle this is to create a centralized location containing schema specifications for the data ceph exports, and to run tests that validate data sources against those specifications as code changes are made. I posted a PR (https://github.com/ceph/ceph/pull/23716) that does this for the generated test instances embedded in `ceph-dencoder` (e.g. MonMap), so that `make check` fails if a change to MonMap::dump is not synchronized with a corresponding change to the schema (https://github.com/ceph/ceph/pull/23716/commits/5fa58d30e03ad67a64feb2046dee32d753db6f20).

Building schemas for the lowest-level structures first is convenient because the schemas for CLI commands or other APIs that embed nested structures can use the JSON Schema reference mechanism (`$ref`) to compose schemas. This works well for staying DRY on common types like `utime_t` (a minimal sketch is included below the sign-off).

Expanding the approach taken in this PR a bit will cover most of what is needed for the insights manager module. Does this seem like a decent approach? Other data sources that are fed by running daemons, or that require more complex scenarios to get output coverage (e.g. David's tests with inconsistent PGs and objects), can be handled in run-standalone.sh or qa tests.

- Noah
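
P.S. To make the composition idea concrete, here is a minimal sketch of validating a small, made-up MonMap-like document against a schema that pulls in a `utime_t` sub-schema by reference, using the python-jsonschema library. The field names, the `utime_t` representation, and the sample values are illustrative assumptions on my part, not the actual schemas from the PR.

```python
# Minimal sketch (illustrative only): compose a low-level utime_t schema
# into a containing schema via a JSON Schema "$ref", then validate a
# sample document with the python-jsonschema library.
#
# The field names and the utime_t representation below are assumptions
# made for illustration; they are not the schemas from PR 23716.

import jsonschema

# Low-level schema, analogous to one written for a dencoder type.
# Assumes utime_t is dumped as a "seconds.fraction"-style string.
UTIME_T = {
    "type": "string",
    "pattern": r"^\d+\.\d+$",
}

# A containing schema (hypothetical) that reuses utime_t by reference,
# the way a MonMap or CLI-command schema could.
MONMAP_LIKE = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["epoch", "created", "modified"],
    "properties": {
        "epoch": {"type": "integer"},
        "created": {"$ref": "#/definitions/utime_t"},
        "modified": {"$ref": "#/definitions/utime_t"},
    },
    "definitions": {
        "utime_t": UTIME_T,
    },
}

# Example document, standing in for dumped MonMap-style JSON output.
sample = {
    "epoch": 3,
    "created": "1534791234.123456",
    "modified": "1534791250.654321",
}

# Raises jsonschema.ValidationError if the dump drifts from the schema,
# which is what would turn an unsynchronized change into a test failure.
jsonschema.validate(instance=sample, schema=MONMAP_LIKE)
print("sample validates against the schema")
```

In the `make check` wiring the instance would of course come from the `ceph-dencoder` test instances rather than a hand-written dict, but the shape of the check is the same: a shared `utime_t` schema referenced from every containing schema, and a validation step that fails when a dump method and its schema get out of sync.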