On Tue, 17 Jan 2012, Gregory Farnum wrote: > On Mon, Jan 16, 2012 at 3:46 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote: > > When we can't handle a change with a compatible encoding change, we can > > introduce a feature bits and conditionally encode old formats for old > > peers. This is just more work and eats into a more limited feature bit > > space. > > I'd like to hear a lot more about this ? it's the real problem. I > think we've already got one example in the codebase of encoding for > backwards-compatibility, and it's a huge mess. > Could we construct reasonable macros/functions that take in compat > bits from the daemon we're targeting and then select the appropriate > encoder based on those, for instance? It certainly shouldn't be hacked > together on a case-by-case basis because we're going to start running > into a lot more of these things as we start testing for > forwards-and-backwards compatibility. The current pattern is void Foo::encode(bufferlist& bl, unsigned features) const { if (features & NEW_THING) { // encode normally } else { // encode old way (wahtever was in this func) } } I'm not sure what to add here to make this less painful or error-prone. ceph-dencoder does have options to print the feature set for the version it was compiled for, and to set the bits passed to encoder functions, so that we can test these paths. > > void pool_snap_info_t::decode(bufferlist::iterator& bl) > > { > > DECODE_START_LEGACY(2, bl, 2); > > > > We can still decode v1 code, but it doesn't have the (new) length and > > incompat version fields, so use the _LEGACY macro. The second 2 means we > > started using the new approach with v2. > > Is there a DECODE_START (non-legacy) for new structs? yeah > > This requires and initial incompat change to add the length and incompat > > fields, but then we can generally add things without breakage. > Hmm...if we come up with something that talks feature bits we could do > this nicely with a compat bit. :) Hmm, yeah. If we redo a bunch of encoders at once we can use a single feature bit for all of them. That's a good idea! > > How to build the per-version corpus? We can write unit tests that > > explicitly generate interesting object instances. That's tedious and time > > consuming, but probably best, since the developer knows what corner cases > > are interesting. > > If we've got a single tool that can handle all the structs we can > presumably encode each struct from a given version? > Obviously constructing actual state would require more work but just > checking the default constructor version of each struct would go a > long, loooong way toward automating encode/decode checks. Yeah. So the new idea here is to create a static class methods like static void generate_test_instances(list<T>& o); and have the implementation fill a list with some interesting object instances. That normally looks something like void foo_t::generate_test_instances(list<foo_t>& o) { foo_t a; o.push_back(a); a.x = 1; a.y = 2; a.z.push_back(3); o.push_back(a); } and the ceph-dencoder lets you select these generated objects to encode/decode/dump/whatever. I cleaned up the tool usage a bit and it currently looks like $ ./ceph-dencoder -h usage: ceph-dencoder [commands ...] version print version string (to stdout) import <encfile> read encoded data from encfile export <outfile> write encoded data to outfile features <num> set feature bits used for encoding get_features print feature bits (int) to stdout list_types list supported types type <classname> select type decode decode into in-core object encode encode in-core object dump_json dump in-core object as json (to stdout) count print number of generated test objects (to stdout) select <n> select generated test object as in-core object There are also some preliminary scripts to populate an archive that looks something like ar ar/0.40 ar/0.40/osd_stat_t ar/0.40/osd_stat_t/25afb7d451b03137f945307a1ed82a33 ar/0.40/osd_stat_t/800a085cc38e531cd41af1905a0f37e8 ... ar/0.40/utime_t ar/0.40/utime_t/8bdbd752410ebe5fb417a23173910b0f ar/0.40/utime_t/e8884a8076dd8e639f5bfa3b896427a8 ar/0.40/utime_t/6bd3e253cb30ad41417f055be3f8a485 ... I'm thinking we also include a built ceph-dencoder in each version dir, so that some scripts can do the cross-version validation. That just requires standarizing the usage for ceph-dencoder, so please poke some holes in the above! thanks sage