What I'm shooting for is a way to cover more scenarios than CLI invocations, including scenarios that are cumbersome to handle with CLI tools. And I think the best way to do that is to validate at more than one level. Here's an example: Because the CLI interfaces only export data about the current state of the system, producing output that forms a covering set used for validating schemas requires forcing the cluster into a sufficient number of states / configurations. A simple example of this would be any instance of dumping a json representation of entity_addr_t which can take on many forms. Arranging for a cluster configuration that produces all these forms so they may be validated through an interface like `ceph mon dump -f json-pretty` would be a pretty big headache. In this case it's convenient to perform validation at the MonMap::dump interface directly (what my PR is doing) because test instances can be generated much more easily. This lower level validation covers the cases for manager modules, which don't invoke CLIs but rather dump json representations directly. Of course we still want to validate the schema of CLI output as well. This task is made easier by having schemas for the building blocks of CLI outputs (e.g. MonMap). I'm sure there are also CLI outputs that don't correspond nicely to data structures with dump methods. Using the existing cram tool and cli tests makes sense to the extent to which that tool can generate sufficient coverage. For example, the setup for testing `rados list-inconsistent-pgs` output is non-trivial, but validating the schema could largely be done without any setup since it corresponds to a list of pg_t. - Noah On Thu, Aug 23, 2018 at 3:31 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote: > > On Thu, Aug 23, 2018 at 3:11 PM, Noah Watkins <noahwatkins@xxxxxxxxx> wrote: > > David and I had a productive conversation yesterday about validating > > the schema of data that is exported through various cli tools and > > apis. Here is a summary / proposal based on that conversation. > > > > One goal of the insights manager module is to export a consistent set > > of data to the insights engine (think data like `ceph osd dump`). > > Currently the insights module exports several json representations of > > data structure (e.g. MonMap), but this means the schema of the > > exported data is generally codified in methods like MonMap::dump, > > making it difficult for consumers of this data to monitor for schema > > changes (and likewise for developers to notice unintended schema > > changes). This isn't just an insights issue, there is a _lot_ of > > Python written that assumes a particular schema. > > > > One way to handle this is to create a centralized location containing > > schema specifications of data exported by ceph, and running tests to > > validate data sources against these expected specifications as code > > changes are made. > > > > I posted a PR (https://github.com/ceph/ceph/pull/23716) that does this > > for the generated test instances embedded in `ceph-dencoder` (e.g. > > MonMap), so that `make check` fails if a change to MonMap::dump is not > > synchronized with changes to the schema > > (https://github.com/ceph/ceph/pull/23716/commits/5fa58d30e03ad67a64feb2046dee32d753db6f20). > > > > Building schemas first for the lowest level structures is convenient > > because... the schemas for CLI commands or other APIs that embed > > nested structures may take advantage of a feature in the JSON schema > > spec that allows schema references so we can compose schemas. This > > works well to DRY on things like "utime_t". > > > > Expanding the approach taken in this PR a bit will cover most of what > > is needed for the insights manager module. Does this seem like a > > decent approach? > > Unless I'm misunderstanding, we already do exactly this for the > human-readable CLI interfaces with the cram tool. (And it can be quite > annoying if you don't update the test standard on removing a CLI > option!) See the contents of ceph.git:src/test/cli and the > ceph.git:src/test/run-cli-tests script > > Generally, it seems like we ought to be able to extend those to use > the json formatting rather than just human-readable output. Or are you > trying to do something more powerful? > -Greg