On Thu, Aug 23, 2018 at 3:11 PM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote:
> David and I had a productive conversation yesterday about validating
> the schema of data that is exported through various cli tools and
> apis. Here is a summary / proposal based on that conversation.
>
> One goal of the insights manager module is to export a consistent set
> of data to the insights engine (think data like `ceph osd dump`).
> Currently the insights module exports several json representations of
> data structure (e.g. MonMap), but this means the schema of the
> exported data is generally codified in methods like MonMap::dump,
> making it difficult for consumers of this data to monitor for schema
> changes (and likewise for developers to notice unintended schema
> changes). This isn't just an insights issue, there is a _lot_ of
> Python written that assumes a particular schema.
>
> One way to handle this is to create a centralized location containing
> schema specifications of data exported by ceph, and running tests to
> validate data sources against these expected specifications as code
> changes are made.
>
> I posted a PR (https://github.com/ceph/ceph/pull/23716) that does this
> for the generated test instances embedded in `ceph-dencoder` (e.g.
> MonMap), so that `make check` fails if a change to MonMap::dump is not
> synchronized with changes to the schema
> (https://github.com/ceph/ceph/pull/23716/commits/5fa58d30e03ad67a64feb2046dee32d753db6f20).
>
> Building schemas first for the lowest level structures is convenient
> because... the schemas for CLI commands or other APIs that embed
> nested structures may take advantage of a feature in the JSON schema
> spec that allows schema references so we can compose schemas. This
> works well to DRY on things like "utime_t".
>
> Expanding the approach taken in this PR a bit will cover most of what
> is needed for the insights manager module. Does this seem like a
> decent approach?

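For readers who have not used schema references, here is a minimal
sketch of the composition idea described in the quoted message: a
shared "utime_t" schema is defined once and referenced from a larger
schema. It is a toy checker written against the Python standard library
only, not the code in the PR and not a full JSON Schema implementation.
The field names, the bare-name `$ref` lookup (real JSON Schema uses URI
references), and the `validate` helper are all assumptions made for
illustration.

```python
# Toy checker for a small JSON-Schema-like subset:
# "type", "properties", "required", "additionalProperties", "items", "$ref".
# All schema contents below are hypothetical, not real Ceph output formats.

SCHEMAS = {
    # Defined once, referenced wherever a timestamp appears.
    "utime_t": {"type": "string"},
    "mon_info": {
        "type": "object",
        "required": ["rank", "name"],
        "properties": {
            "rank": {"type": "integer"},
            "name": {"type": "string"},
        },
    },
    "monmap": {
        "type": "object",
        "required": ["epoch", "created", "modified", "mons"],
        "additionalProperties": False,
        "properties": {
            "epoch": {"type": "integer"},
            "created": {"$ref": "utime_t"},
            "modified": {"$ref": "utime_t"},
            "mons": {"type": "array", "items": {"$ref": "mon_info"}},
        },
    },
}

TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms."""
    if "$ref" in schema:
        # Simplification: a reference is just a key into SCHEMAS.
        return validate(value, SCHEMAS[schema["$ref"]], path)

    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, TYPES[expected])
        # bool is a subclass of int in Python; do not accept it as an integer.
        if expected == "integer" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]

    errors = []
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key '{key}'")
        for key, sub in props.items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
        if schema.get("additionalProperties") is False:
            for key in value:
                if key not in props:
                    errors.append(f"{path}: unexpected key '{key}'")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


GOOD = {
    "epoch": 1,
    "created": "2018-08-23 15:11:00.000000",
    "modified": "2018-08-23 15:11:00.000000",
    "mons": [{"rank": 0, "name": "a"}],
}

# Simulates an unintended rename in a dump method.
RENAMED = dict(GOOD)
RENAMED["mtime"] = RENAMED.pop("modified")

if __name__ == "__main__":
    print(validate(GOOD, SCHEMAS["monmap"]))
    print(validate(RENAMED, SCHEMAS["monmap"]))
```

A test that runs `validate` over each generated instance and fails on a
non-empty error list would flag the rename above, which is the kind of
unsynchronized change the proposal wants to catch.
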
Unless I'm misunderstanding, we already do exactly this for the
human-readable CLI interfaces with the cram tool. (And it can be quite
annoying if you don't update the test standard on removing a CLI
option!) See the contents of ceph.git:src/test/cli and the
ceph.git:src/test/run-cli-tests script.

Generally, it seems like we ought to be able to extend those to use
the json formatting rather than just human-readable output. Or are you
trying to do something more powerful?
-Greg
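
As a rough illustration of what checking json output against a stored
standard could look like, the sketch below compares the structure (keys
and value types) of a command's JSON output with a saved sample, so
that values such as epochs and timestamps may differ while renamed or
retyped fields are caught. This is not how cram or run-cli-tests work;
the `shape` and `same_shape` helpers and the sample documents are
hypothetical.

```python
import json


def shape(value):
    """Reduce a parsed JSON value to its structure: keys and type names only."""
    if isinstance(value, dict):
        return {key: shape(sub) for key, sub in value.items()}
    if isinstance(value, list):
        # Assumption: lists are homogeneous, so the first element describes them.
        return [shape(value[0])] if value else []
    return type(value).__name__


def same_shape(expected_text, actual_text):
    """True when both JSON documents have the same keys and value types."""
    return shape(json.loads(expected_text)) == shape(json.loads(actual_text))


# Stored sample (the "test standard") and two hypothetical command outputs.
EXPECTED = '{"epoch": 1, "modified": "2018-08-23", "mons": [{"rank": 0, "name": "a"}]}'
ACTUAL_OK = (
    '{"epoch": 7, "modified": "2018-08-24", '
    '"mons": [{"rank": 0, "name": "a"}, {"rank": 1, "name": "b"}]}'
)
ACTUAL_RENAMED = '{"epoch": 7, "mtime": "2018-08-24", "mons": [{"rank": 0, "name": "a"}]}'

if __name__ == "__main__":
    print(same_shape(EXPECTED, ACTUAL_OK))
    print(same_shape(EXPECTED, ACTUAL_RENAMED))
```

Compared with an explicit schema, a stored sample is cheaper to write
but cannot express optional fields or share definitions between
commands.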