On Thu, 6 Sep 2018, Gregory Farnum wrote:
> On Thu, Sep 6, 2018 at 3:34 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> > On Thu, 6 Sep 2018, Gregory Farnum wrote:
> >> >> It may just be bias in my recent thoughts, but it seems like the most
> >> >> valuable thing to do here is to generate the encoded structures from
> >> >> each rc and make sure the previous version of the code can read them.
> >> >> In other words, generate a ceph-object-corpus for each existing
> >> >> release and proposed rc (which I've been planning to start pushing on,
> >> >> but haven't yet, so I don't even remember how we generate them!), then
> >> >> run the encode and decode with the *old* software.
> >> >
> >> > Yes!! I like this because it also kills a couple birds with one stone: if
> >> > we automate the process of generating corpus objects then we can also make
> >> > sure we're covering all the usual upgrade cases too.
> >> >
> >> > The challenge is that generating corpus objects means a custom
> >> > build and then running a sufficiently broad set of workloads to
> >> > instantiate as many different and interesting object instances as
> >> > possible. The
> >> >
> >> > //#define ENCODE_DUMP_PATH /tmp/something
> >> >
> >> > needs to be defined, everything built, and then the system run with a
> >> > bunch of workloads. This fills /tmp/something with a bazillion object
> >> > instances. There are then some scripts that dedup and try to pick out
> >> > a sample with varying sizes etc.
> >> >
> >> > I don't have any bright ideas on how to do that easily and in an automated
> >> > way, though.. we presumably want to do it right before release to make
> >> > sure everything is kosher (to ensure compat with everything in the
> >> > corpus), and also generate objects on actual releases (or builds of the
> >> > same sha1 + the above #define) to populate the corpus with that release.
> >>
> >> So...why do we actually need to run a cluster to generate these? Is it
> >> in fact infeasible to automatically generate the data? Was it just too
> >> annoying at the time ceph-object-corpus was set up? I haven't examined
> >> it in detail but we already include simple ones in the
> >> generate_test_instances stuff for everything in
> >> src/tools/ceph-dencoder/types.h, though I can certainly imagine that
> >> these might not be good enough since a lot of them are just
> >> default-initialized. (Do we catch examples from anything that *isn't*
> >> specified in that file?)
> >>
> >> Something like running a cluster through the final set of teuthology
> >> suites set up this way might be the best solution, but I wonder if
> >> this was investigated and decided on or just that it worked once upon
> >> a time. ;)
> >
> > The problem is that those generate_test_instances are (1) sparse and
> > minimal, and (2) would require huge developer investment to fill in with
> > "realistic" instances, and (3) even then wouldn't necessarily be
> > representative of what happens in real life. The ENCODE_DUMP_PATH thing
> > collects actual objects from a real cluster with a real workload so that
> > you can get a "real" sampling.
> >
> > The hard part is mostly generating a workload with good coverage (rados
> > API tests, cls tests, etc are a good start for RADOS; for cephfs we need
> > multi-mds to cover all of the subtree migration related types; for rgw
> > we'll want to get coverage for the multisite stuff, etc etc.).
> >
> > That's a bit of work, but it's still much less work than hand-crafting
> > object instances that may or may not be "real".
>
> Yeah, that makes sense. But turning that around, why not just grab and
> sample from the OSD and monitor disk stores after our existing
> teuthology runs happen? Does the ENCODE_DUMP_PATH stuff also include
> wire protocol messages that don't get put on disk?

It includes everything that passes through the ENCODE_{START,FINISH}
macros, which is pretty much every object with an encode/decode method
defined.

One could write a tool to pull data structures out of, say, bluestore, but
that would only cover the dozen or so bluestore-related types.
ceph-dencoder currently recognizes ~425. About 220 are captured by the
most recent version in ceph-object-corpus (kraken :( ).

Unfortunately the encode dump stuff is super inefficient.. I don't think
it's something we can easily build into our real builds. Well... maybe we
could build it into the notcmalloc (debug) builds, actually...

sage
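
For illustration, the dedup-and-sample step mentioned in the quoted text
could look roughly like the sketch below. This is not the actual set of
scripts in the tree. It assumes a hypothetical dump layout of one
subdirectory per type under the dump path, with one file per encoded
instance; the function name and the per_type parameter are made up for the
example.

```python
import hashlib
from pathlib import Path


def dedup_and_sample(dump_dir, per_type=3):
    """Return {type_name: [Path, ...]} for a dump directory.

    Assumed (hypothetical) layout: dump_dir/<type_name>/<instance files>.
    Byte-identical instances are collapsed to one file, then up to
    per_type files are picked, evenly spaced across the size range.
    """
    if per_type < 2:
        raise ValueError("per_type must be at least 2")
    picked = {}
    for type_dir in sorted(p for p in Path(dump_dir).iterdir() if p.is_dir()):
        # Dedup: keep the first file seen for each distinct content hash.
        unique = {}
        for f in sorted(type_dir.iterdir()):
            if f.is_file():
                digest = hashlib.sha1(f.read_bytes()).hexdigest()
                unique.setdefault(digest, f)
        # Sample: order by size, then take evenly spaced entries so the
        # smallest and largest instances are always included.
        by_size = sorted(unique.values(),
                         key=lambda f: (f.stat().st_size, f.name))
        if len(by_size) <= per_type:
            picked[type_dir.name] = by_size
        else:
            step = (len(by_size) - 1) / (per_type - 1)
            idx = sorted({round(i * step) for i in range(per_type)})
            picked[type_dir.name] = [by_size[i] for i in idx]
    return picked
```

Picking by size spread is only one heuristic for "interesting" instances;
anything smarter would need to look inside the decoded objects.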