On Mon, Nov 19, 2018 at 7:52 AM Erwan Velu <evelu@xxxxxxxxxx> wrote:
>
>
> On 16/11/2018 at 21:31, Sage Weil wrote:
> > I can't really comment on golang here, but in Python this is trivial:
> > either you can tolerate a missing key (e.g., by substituting a default
> > value), and use foo.get('field'[, 'default']), or you can't tolerate it,
> > and use foo['field'] so that an exception is thrown if it's missing.
>
> I understand perfectly what you mean here, which implies that every single
> external tool, since it will not be in a position to validate the JSON
> schema, will have to implement exception-catching accessors for every
> single part of the structure.
>
> That's also a lot of useless work.
>
> > [...]
> > I liked Noah's initial proposal because it was (1) not much work and (2)
> > focused on flagging changes before they are introduced, at make check
> > time, so that the developer can either fix their mistake or make a
> > conscious decision to change the output schema.
>
> This is where I'm not really comfortable. The schema is written
> manually and, in my experience, until you put the cluster in every
> possible state, you can miss some structures.
>
> Noah, how do you deal with this in your validation scheme?

Circling back to this after vacation...

There are two validations that I see as relevant:

1) structural
2) relational

Extending the schema validation to handle every variety of relational
constraint would be a lot of work. As you point out, it depends on system
state; traditional live-system testing would not be tractable, so we'd
_have_ to factor out all internal representations and then wrap them in
APIs that enforce valid relations. My testing doesn't attempt this, except
where I notice low-hanging fruit.

I do think that structural validation is tractable, and that's what I'd
like to complete if we end up agreeing on the approach.
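To make the structural/relational distinction concrete, here is a minimal sketch in Python. It assumes a hand-rolled subset of JSON Schema keywords (`type`, `required`, `properties`, `items`, `additionalProperties`) rather than any particular validator library or the schema format used in Noah's tests, and the field names (`num_osds`, `num_up_osds`, `osds`) are hypothetical. The structural check looks only at shape and types; the relational check compares values against each other, which is why it depends on cluster state.

```python
from typing import Any

# Subset of JSON Schema "type" names mapped to Python types (illustrative only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def structural_errors(instance: Any, schema: dict, path: str = "$") -> list:
    """Check shape and types only; an empty list means the instance conforms."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema actually asks for a boolean.
        wrong_bool = isinstance(instance, bool) and expected != "boolean"
        if wrong_bool or not isinstance(instance, _TYPES[expected]):
            errors.append(f"{path}: expected {expected}, got {type(instance).__name__}")
            return errors  # don't descend into a value of the wrong type
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        props = schema.get("properties", {})
        for key, value in instance.items():
            if key in props:
                errors.extend(structural_errors(value, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}: unexpected key '{key}'")
    elif isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(structural_errors(item, schema["items"], f"{path}[{i}]"))
    return errors


# Hypothetical schema for a fragment of cluster status output.
OSD_SUMMARY_SCHEMA = {
    "type": "object",
    "required": ["num_osds", "num_up_osds", "osds"],
    "additionalProperties": False,
    "properties": {
        "num_osds": {"type": "integer"},
        "num_up_osds": {"type": "integer"},
        "osds": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["id", "up"],
                "properties": {"id": {"type": "integer"}, "up": {"type": "boolean"}},
            },
        },
    },
}


def relational_errors(doc: dict) -> list:
    """Check invariants between values; assumes structural_errors(doc) == []."""
    errors = []
    up = sum(1 for osd in doc["osds"] if osd["up"])
    if doc["num_up_osds"] != up:
        errors.append(f"num_up_osds is {doc['num_up_osds']} but {up} osds are up")
    if len(doc["osds"]) != doc["num_osds"]:
        errors.append(f"num_osds is {doc['num_osds']} but {len(doc['osds'])} osds listed")
    return errors
```

In this sketch, once a document passes `structural_errors`, a consumer can index required keys directly (`doc['num_osds']`) without wrapping every access in exception handling. A document such as `{"num_osds": 2, "num_up_osds": 2, "osds": [...one up, one down...]}` passes the structural check yet fails the relational one, and each such relational invariant has to be written by hand.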
I've found that there are two scenarios by which JSON output is produced:

1) strict: output <-- to_json(my_data_structure)
2) ad-hoc: free-form calls to the JSON builder API

For case (1), life is pretty easy. Make sure the schema covers everything
in the structure, and that "generate_test_cases" doesn't generate
partially filled instances (e.g. "list: []"), since we want test cases
that actually have content to validate.

For case (2), there really isn't any shortcut at this point other than
reading the code. What I've found helpful is to (a) generate a schema from
an output instance for some cluster state, then (b) read the code to fill
in the gaps in the schema that correspond to cluster states not covered.

The goal I believe we should work towards is migrating existing instances
of case (2) into instances of case (1). This would entail (a) picking some
target output, (b) building an intermediate C++ data structure that is the
true source of the JSON serialization, and (c) writing test cases that
fully enumerate the possible instances of the schema.

- Noah
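For case (1), the two checks described above (the schema covers everything in the structure, and generated test cases are not partially filled) can be sketched in Python. This is a minimal illustration under stated assumptions: `PoolReport`, `PoolStat`, `empty_lists`, and `coverage_problems` are hypothetical names, `generate_test_cases` is a Python stand-in for the test-instance hook mentioned above rather than its actual C++ signature, and the expected key sets are hard-coded instead of being read from a real schema file.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class PoolStat:
    # Hypothetical element type; stands in for a real C++ struct.
    name: str
    bytes_used: int


@dataclass
class PoolReport:
    # Hypothetical "target output" modeled as a data structure (case 1).
    epoch: int
    pools: list[PoolStat] = field(default_factory=list)

    def to_json(self) -> str:
        # The structure is the single source of the JSON: no free-form
        # builder calls, so the output shape cannot drift from the type.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def generate_test_cases() -> list[PoolReport]:
        # Fully populated instances: every list is non-empty, so the
        # shape of list items is exercised, not just the empty container.
        return [
            PoolReport(epoch=1, pools=[PoolStat("rbd", 0)]),
            PoolReport(epoch=7, pools=[PoolStat("rbd", 4096), PoolStat("cephfs_data", 1)]),
        ]


def empty_lists(value, path="$"):
    """Yield the path of every empty list in a decoded JSON value."""
    if isinstance(value, list):
        if not value:
            yield path
        for i, item in enumerate(value):
            yield from empty_lists(item, f"{path}[{i}]")
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from empty_lists(item, f"{path}.{key}")


def coverage_problems(cases: list[PoolReport]) -> list[str]:
    """Report test cases that are partially filled or have unexpected keys."""
    problems = []
    for n, case in enumerate(cases):
        doc = json.loads(case.to_json())
        problems += [f"case {n}: empty list at {p}" for p in empty_lists(doc)]
        if set(doc) != {"epoch", "pools"}:
            problems.append(f"case {n}: top-level keys {sorted(doc)}")
        for pool in doc.get("pools", []):
            if set(pool) != {"name", "bytes_used"}:
                problems.append(f"case {n}: pool keys {sorted(pool)}")
    return problems
```

In this sketch, `coverage_problems(PoolReport.generate_test_cases())` returns an empty list, while `coverage_problems([PoolReport(epoch=2)])` reports `case 0: empty list at $.pools`: exactly the kind of partially filled instance that would leave the schema for pool entries unvalidated.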