David and I had a productive conversation yesterday about validating the schema of data that is exported through various CLI tools and APIs. Here is a summary / proposal based on that conversation.

One goal of the insights manager module is to export a consistent set of data to the insights engine (think data like `ceph osd dump`). Currently the insights module exports JSON representations of several data structures (e.g. MonMap), but the schema of that exported data is effectively codified in methods like MonMap::dump, which makes it difficult for consumers of the data to monitor for schema changes (and likewise for developers to notice unintended schema changes). This isn't just an insights issue; there is a _lot_ of Python written that assumes a particular schema.

One way to handle this is to create a centralized location containing schema specifications for the data ceph exports, and to run tests that validate data sources against those specifications as code changes are made. I posted a PR (https://github.com/ceph/ceph/pull/23716) that does this for the generated test instances embedded in `ceph-dencoder` (e.g. MonMap), so that `make check` fails if a change to MonMap::dump is not synchronized with a corresponding change to the schema (https://github.com/ceph/ceph/pull/23716/commits/5fa58d30e03ad67a64feb2046dee32d753db6f20).

Building schemas for the lowest-level structures first is convenient because the schemas for CLI commands or other APIs that embed nested structures can use the JSON Schema reference mechanism (`$ref`) to compose schemas. This works well for staying DRY on common types like `utime_t` (a minimal sketch is included below the sign-off).

Expanding the approach taken in this PR a bit will cover most of what is needed for the insights manager module. Does this seem like a decent approach? Other data sources that are fed by running daemons, or that require more complex scenarios to get output coverage (e.g. David's tests with inconsistent PGs and objects), can be handled in run-standalone.sh or qa tests.

- Noah
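
P.S. To make the composition idea concrete, here is a minimal sketch of validating a small, made-up MonMap-like document against a schema that pulls in a `utime_t` sub-schema by reference, using the python-jsonschema library. The field names, the `utime_t` representation, and the sample values are illustrative assumptions on my part, not the actual schemas from the PR.

```python
# Minimal sketch (illustrative only): compose a low-level utime_t schema
# into a containing schema via a JSON Schema "$ref", then validate a
# sample document with the python-jsonschema library.
#
# The field names and the utime_t representation below are assumptions
# made for illustration; they are not the schemas from PR 23716.

import jsonschema

# Low-level schema, analogous to one written for a dencoder type.
# Assumes utime_t is dumped as a "seconds.fraction"-style string.
UTIME_T = {
    "type": "string",
    "pattern": r"^\d+\.\d+$",
}

# A containing schema (hypothetical) that reuses utime_t by reference,
# the way a MonMap or CLI-command schema could.
MONMAP_LIKE = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["epoch", "created", "modified"],
    "properties": {
        "epoch": {"type": "integer"},
        "created": {"$ref": "#/definitions/utime_t"},
        "modified": {"$ref": "#/definitions/utime_t"},
    },
    "definitions": {
        "utime_t": UTIME_T,
    },
}

# Example document, standing in for dumped MonMap-style JSON output.
sample = {
    "epoch": 3,
    "created": "1534791234.123456",
    "modified": "1534791250.654321",
}

# Raises jsonschema.ValidationError if the dump drifts from the schema,
# which is what would turn an unsynchronized change into a test failure.
jsonschema.validate(instance=sample, schema=MONMAP_LIKE)
print("sample validates against the schema")
```

In the `make check` wiring the instance would of course come from the `ceph-dencoder` test instances rather than a hand-written dict, but the shape of the check is the same: a shared `utime_t` schema referenced from every containing schema, and a validation step that fails when a dump method and its schema get out of sync.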