On Mon, Jan 16, 2012 at 3:46 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> When we can't handle a change with a compatible encoding change, we
> can introduce feature bits and conditionally encode old formats for
> old peers. This is just more work and eats into a more limited
> feature bit space.

I'd like to hear a lot more about this; it's the real problem. I think
we've already got one example in the codebase of encoding for
backwards compatibility, and it's a huge mess. Could we construct
reasonable macros/functions that take in the compat bits of the daemon
we're targeting and then select the appropriate encoder based on
those, for instance? It certainly shouldn't be hacked together on a
case-by-case basis, because we're going to run into a lot more of
these things as we start testing for forwards-and-backwards
compatibility.

> There are two things going on in this branch. The first part is a
> 'standard' way of constructing the encode/decode functions to
> facilitate backward and forward compatibility and incompatibility
> detection. The basic scheme is:
>
>  1 byte  - version of this encoding
>  1 byte  - incompat version: the oldest code version we expect to
>            be able to decode this
>  4 bytes - length of payload
>  ... data ...
>
> In general, when we decode, we verify that the incompat version is
> <= our (code) version. If not, we throw an exception. Then we decode
> the payload, using the version for any conditionals we need (e.g.,
> to skip newly added fields). We skip any data at the end.
>
> When we revise an encoding, we should add new fields at the end and,
> in general, leave old fields in place, ideally with values that
> won't confuse old code.

This sounds good to me!

> To make this painless, there are a few new macros to do the
> encode/decode boilerplate. If the encode/decode functions were
> originally
>
> void pool_snap_info_t::encode(bufferlist& bl) const
> {
>   __u8 struct_v = 1;
>   ::encode(struct_v, bl);
>   ::encode(snapid, bl);
>   ::encode(stamp, bl);
>   ::encode(name, bl);
> }
>
> void pool_snap_info_t::decode(bufferlist::iterator& bl)
> {
>   __u8 struct_v;
>   ::decode(struct_v, bl);
>   ::decode(snapid, bl);
>   ::decode(stamp, bl);
>   ::decode(name, bl);
> }
>
> then we would revise them to be
>
> void pool_snap_info_t::encode(bufferlist& bl) const
> {
>   ENCODE_START(2, 2, bl);
>
> The new version is 2. v1 code can't decode this, so the second
> argument (incompat) is also 2.
>
>   ::encode(snapid, bl);
>   ::encode(stamp, bl);
>   ::encode(name, bl);
>   ::encode(new_thing, bl);
>   ENCODE_FINISH();
> }
>
> void pool_snap_info_t::decode(bufferlist::iterator& bl)
> {
>   DECODE_START_LEGACY(2, bl, 2);
>
> We can still decode v1 encodings, but they don't have the (new)
> length and incompat version fields, so use the _LEGACY macro. The
> second 2 means we started using the new approach with v2.
>
>   ::decode(snapid, bl);
>   ::decode(stamp, bl);
>   ::decode(name, bl);
>   if (struct_v >= 2)
>     ::decode(new_thing, bl);
>   DECODE_FINISH();
> }

Is there a DECODE_START (non-legacy) for new structs?

> This requires an initial incompat change to add the length and
> incompat fields, but then we can generally add things without
> breakage.

Hmm... if we come up with something that talks feature bits we could
do this nicely with a compat bit. :)
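To check my own understanding of the framing before I dive into the
branch, here's a standalone sketch of the scheme described above. It
uses std::string in place of bufferlist, and all of the names
(start_envelope and friends) are mine, not the branch's:

// Standalone sketch of the version/compat/length framing described
// above.  Hypothetical names; not the actual ENCODE_START macros.
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Write the envelope header: 1-byte version, 1-byte incompat version,
// and a 4-byte length placeholder.  Returns the offset of the length
// field so finish_envelope() can patch it once the payload is known.
size_t start_envelope(std::string& bl, uint8_t version, uint8_t incompat)
{
  bl.push_back(static_cast<char>(version));
  bl.push_back(static_cast<char>(incompat));
  size_t len_off = bl.size();
  bl.append(4, '\0');
  return len_off;
}

void finish_envelope(std::string& bl, size_t len_off)
{
  uint32_t len = static_cast<uint32_t>(bl.size() - (len_off + 4));
  std::memcpy(&bl[len_off], &len, sizeof(len));  // assumes little-endian
}

// Read the header, refuse encodings too new for us, and report where
// the payload ends so the caller can skip trailing fields it doesn't
// understand.
struct envelope_t {
  uint8_t version;
  size_t payload_off;
  size_t payload_end;
};

envelope_t open_envelope(const std::string& bl, size_t off,
                         uint8_t code_version)
{
  uint8_t version = static_cast<uint8_t>(bl.at(off));
  uint8_t incompat = static_cast<uint8_t>(bl.at(off + 1));
  if (incompat > code_version)
    throw std::runtime_error("encoding incompat version exceeds ours");
  uint32_t len;
  std::memcpy(&len, &bl[off + 2], sizeof(len));
  return envelope_t{version, off + 6, off + 6 + len};
}

If that's roughly what the macros expand to, then the skip-trailing-
data behavior falls out of payload_end for free, which is nice.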
> The second question is how to test compatibility between different
> versions of code. There are a few parts to this.
>
> First, a ceph-dencoder tool is compiled for each version of the code
> that is able to encode, decode, and dump (in json) whatever
> structures we support. It works something like this:
>
>  ceph-dencoder object_info_t -i inputfile decode dump_json
>
> to read in encoded data, decode it, and dump it as json. We can do a
> trivial identity check (that decode of encode matches) with
>
>  ceph-dencoder object_info_t -i inputfile decode dump_json > /tmp/a
>  ceph-dencoder object_info_t -i inputfile decode encode decode dump_json > /tmp/b
>  cmp /tmp/a /tmp/b
>
> Obviously that should always pass. For testing cross-version
> encoding, we need a ceph-dencoder and a corpus of encoded objects
> for each version. Assuming you have that, you can (a) make sure we
> can decode anything from other versions without crashing, and (b)
> compare the dumps between versions and whitelist changes (e.g., when
> fields are added or removed). You can also specify feature bits to
> test encoding for older versions: take, say, everything from the
> v0.42 corpus, encode it with the v0.40 feature bits, and verify that
> the v0.40 version of ceph-dencoder can handle it. And
> verify+whitelist diffs.

Awesome!

> How to build the per-version corpus? We can write unit tests that
> explicitly generate interesting object instances. That's tedious and
> time consuming, but probably best, since the developer knows what
> corner cases are interesting.

If we've got a single tool that can handle all the structs, we can
presumably encode each struct from a given version? Obviously
constructing actual state would require more work, but just checking
the default-constructor version of each struct would go a long,
loooong way toward automating encode/decode checks. (A sketch of what
I mean is below my signature.)

-off to review the actual code,
Greg
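P.S. To make the default-constructor idea concrete: something like the
following inside ceph-dencoder would let one command walk every
registered type and emit a default-constructed encoding per type for
the corpus. Everything here (register_type, example_t, the registry
itself) is hypothetical, just to illustrate the shape:

// Hypothetical sketch only -- not code from ceph-dencoder.  Register
// each struct once; then enumerate the registry to generate a
// default-constructed corpus entry per type.
#include <cstdint>
#include <cstring>
#include <functional>
#include <iostream>
#include <map>
#include <string>

using bufferlist = std::string;  // stand-in for ceph::bufferlist

std::map<std::string, std::function<void(bufferlist&)>>& type_registry()
{
  static std::map<std::string, std::function<void(bufferlist&)>> r;
  return r;
}

template <typename T>
void register_type(const std::string& name)
{
  type_registry()[name] = [](bufferlist& bl) {
    T t;           // default-constructed instance
    t.encode(bl);  // assumes the usual encode(bufferlist&) convention
  };
}

// Toy struct standing in for pool_snap_info_t and friends.
struct example_t {
  uint32_t snapid = 0;
  void encode(bufferlist& bl) const {
    bl.append(reinterpret_cast<const char*>(&snapid), sizeof(snapid));
  }
};

int main()
{
  register_type<example_t>("example_t");
  // A 'generate defaults' verb could walk the registry and write one
  // encoded blob per type into the per-version corpus directory.
  for (auto& p : type_registry()) {
    bufferlist bl;
    p.second(bl);
    std::cout << p.first << ": " << bl.size() << " bytes\n";
  }
  return 0;
}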