Re: Getting the JSON schema of commands

Noah Watkins <nwatkins@xxxxxxxxxx> · Mon, 3 Dec 2018 15:49:13 -0800

On Mon, Nov 19, 2018 at 7:52 AM Erwan Velu <evelu@xxxxxxxxxx> wrote:
>
>
> On 16/11/2018 at 21:31, Sage Weil wrote:
> > I can't really comment on golang here, but in python this is trivial:
> > either you can tolerate a missing key (e.g., by substituting a default
> > value), and use foo.get('field'[, 'default']), or you can't tolerate it,
> > and use foo['field'] so that an exception is thrown if it's missing.
>
> I understand what you mean here, but it implies that every single
> external tool, since it will not be in a position to validate the JSON
> schema, will have to implement exception-catching accessors for every
> single part of the structure.
>
> That's also a lot of useless work.
>
> > [...]
> > I liked Noah's initial proposal because it was (1) not much work and (2)
> > focused on flagging changes before they are introduced, during make check
> > time, so that the developer can either fix their mistake or make a
> > conscious decision to change the output schema.
>
> This is where I'm not really comfortable. The schema is written by
> hand and, in my experience, until you put the cluster in every
> possible state, you could miss some structures.
>
> Noah, how do you deal with this in your validation scheme?

Circling back to this after vacation...

There are two validations that I see as relevant:

  1) structural
  2) relational

Extending schema validation to cover every variety of relational
difference would be a lot of work. As you point out, it's dependent
on system state--traditional live-system testing would not be
tractable, so we'd _have_ to factor out all internal representations
and then wrap them in APIs that enforce valid relations. My testing
doesn't attempt to do this, except for anything that I notice as
low-hanging fruit.

I do think that structural validation is tractable, and that's what
I'd like to complete if we end up agreeing on the approach.
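
To make the distinction concrete, here is a minimal Python sketch. The
field names (num_osds, num_up_osds) and both helper functions are
hypothetical and chosen only for illustration; they are not taken from
any real command output. The sample passes the structural check but
breaks a relation between two values:

```python
# Hypothetical output; the field names are illustrative only.
sample = {"num_osds": 3, "num_up_osds": 5}

def structurally_valid(doc):
    """Structural check: the expected keys exist and hold integers."""
    return all(isinstance(doc.get(key), int)
               for key in ("num_osds", "num_up_osds"))

def relationally_valid(doc):
    """Relational check: the values are consistent with each other."""
    return doc["num_up_osds"] <= doc["num_osds"]

print(structurally_valid(sample))  # True
print(relationally_valid(sample))  # False
```

A schema can express the first check directly. The second depends on
how the values were produced, which is why it needs the internal
representation to enforce it.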

I've found that there are two ways in which JSON output is produced:

  1) strict: output <-- to_json(my_data_structure)
  2) ad-hoc: free form calls to the JSON builder API

For case (1) life is pretty easy. Make sure the schema covers
everything in the structure, and that "generate_test_cases" doesn't
generate partially filled instances (e.g. "list: []"), since we want
the test cases to actually contain something to validate.
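
As a rough illustration of why empty instances are a problem, here is a
small Python sketch. The validate() function handles only a tiny subset
of JSON Schema (type, properties, items) and is an assumption made for
this example, not the validator used in my testing; has_empty_container()
is likewise a hypothetical helper:

```python
def validate(value, schema):
    """Check value against a tiny subset of JSON Schema."""
    kind = schema["type"]
    if kind == "object":
        props = schema["properties"]
        # Treat every declared property as required; reject undeclared keys.
        return (isinstance(value, dict)
                and set(value) == set(props)
                and all(validate(value[k], props[k]) for k in props))
    if kind == "array":
        return (isinstance(value, list)
                and all(validate(v, schema["items"]) for v in value))
    if kind == "string":
        return isinstance(value, str)
    if kind == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unsupported type: {kind}")

def has_empty_container(value):
    """True if any list or dict nested in value is empty."""
    if isinstance(value, dict):
        return not value or any(has_empty_container(v) for v in value.values())
    if isinstance(value, list):
        return not value or any(has_empty_container(v) for v in value)
    return False

schema = {
    "type": "object",
    "properties": {
        "list": {
            "type": "array",
            "items": {"type": "object",
                      "properties": {"id": {"type": "integer"}}},
        },
    },
}

wrong = {"list": [{"id": "not-an-int"}]}  # violates the item schema
empty = {"list": []}                      # never reaches the item schema

print(validate(wrong, schema))     # False
print(validate(empty, schema))     # True, but only vacuously
print(has_empty_container(empty))  # True: not useful as a test case
```

The empty instance validates no matter what the item schema says, so a
test generator that emits it tells us nothing about that part of the
schema.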

For case (2) there really isn't any shortcut other than to read the
code at this point. What I've found helpful is to (a) generate a
schema from the instance output of some cluster state, then (b) read
the code to fill in the gaps in the schema that correspond to cluster
states not covered.
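
Step (a) can be sketched in Python roughly as follows. This is a
simplified illustration, not the tooling I actually use: infer_schema()
and find_gaps() are hypothetical helpers, and the captured output is
made up.

```python
import json

def infer_schema(value):
    """Derive a minimal schema from a single JSON instance."""
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: infer_schema(v) for k, v in value.items()}}
    if isinstance(value, list):
        # An empty list carries no information about its items.
        return {"type": "array",
                "items": infer_schema(value[0]) if value else None}
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    raise TypeError(f"not a JSON value: {value!r}")

def find_gaps(schema, path="$"):
    """List the paths that the sample could not describe (empty arrays)."""
    gaps = []
    if schema["type"] == "array":
        if schema["items"] is None:
            gaps.append(path + "[]")
        else:
            gaps += find_gaps(schema["items"], path + "[]")
    elif schema["type"] == "object":
        for key, sub in schema["properties"].items():
            gaps += find_gaps(sub, f"{path}.{key}")
    return gaps

# Hypothetical output captured from a cluster with no pools defined.
captured = json.loads('{"epoch": 12, "pools": [], "flags": ["noout"]}')
schema = infer_schema(captured)
print(find_gaps(schema))  # ['$.pools[]']
```

This only reports gaps that are visible in the sample. A key that is
emitted solely in some other cluster state leaves no trace at all, which
is why step (b), reading the code, can't be skipped.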

The goal I believe we should work towards is to migrate existing
instances of case (2) into instances of case (1). This would entail
(a) picking some target output, (b) building an intermediate C++ data
structure that is the true source of the JSON serialization, and (c)
writing test cases that fully enumerate possible instances of the
schema.
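
The shape of that migration, sketched in Python as a stand-in for the
C++ structure. PoolInfo, PoolReport and their fields are hypothetical,
and generate_test_cases() here only borrows the name used above; the
point is that the structure is the single source of the serialized
output:

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List

@dataclass
class PoolInfo:  # hypothetical nested structure
    id: int
    name: str

@dataclass
class PoolReport:  # hypothetical intermediate structure for one command
    epoch: int
    pools: List[PoolInfo] = field(default_factory=list)

    def to_json(self):
        # The only place that decides what the output looks like.
        return json.dumps(asdict(self), sort_keys=True)

def generate_test_cases():
    """Instances that populate every field, including nested list items."""
    return [
        PoolReport(epoch=1, pools=[PoolInfo(id=0, name="rbd")]),
        PoolReport(epoch=2, pools=[PoolInfo(id=0, name="a"),
                                   PoolInfo(id=1, name="b")]),
    ]

for case in generate_test_cases():
    print(case.to_json())
```

With the output derived from one structure, the schema can be checked
against that structure's fields rather than against whatever a
particular cluster state happens to emit.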

- Noah