Re: Docs: possible incorrect diagram of Delta Copy instruction

Bouke Versteegh <info@xxxxxxxxxxxxxxxxx> · Sat, 12 Sep 2020 13:21:57 +0200

Hi Jeff,

Thank you for your elaborate answer. So it turns out I misinterpreted
the diagram to represent the 8 opcode BITS rather than the whole
sequence of BYTES.
In hindsight there were some hints that I've overlooked, like the
first block containing 1xxxxxxx, which shows already its a byte, not a
bit, and the explanation below.

Thanks again for taking the time to explain!

Kind regards, Bouke

On Fri, 11 Sep 2020 at 16:33, Jeff King <peff@xxxxxxxx> wrote:
>
> On Fri, Sep 11, 2020 at 02:30:20PM +0200, Bouke Versteegh wrote:
>
> > While working on an implementation of git in dart, I've noticed a
> > possible error in the documentation. I hope I'm using the correct
> > channel to report this issue.
>
> This is definitely the right place.
>
> > On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object)
> > a diagram is shown, that explains the format of a Copy instruction
> > inside a Deltified pack entry:
> >
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> > | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> >
> > The documentation specifies that diagrams follow the RFC950
> > (https://www.ietf.org/rfc/rfc1950.txt) format.
>
> I think you mean rfc1951 here, but yeah, it is clear in that document
> that bytes are in MSB order. And also that each of those boxes is a
> single byte.
>
> > That means that the left bit is MSB, and the right bit is LSB, so the
> > OpCode is MSB (1xxxxxxx), which is correct and matches other sources.
>
> Right, agreed.
>
> > It also would mean that offset 1-4 should be read from bit 7, 6, 5 and
> > 4 (i.e. 0x40, 0x20, 0x10, 0x08)
>
> I don't think this follows, though. "offset1" is a separate byte. We
> have a sequence of bytes here, some of which may be present. Their
> presence is determined by the bits "xxxxxxx", but the diagram does not
> say anything about the order.
>
> The paragraphs below do say:
>
>   This is the instruction format to copy a byte range from the source
>   object. It encodes the offset to copy from and the number of bytes to
>   copy. Offset and size are in little-endian order.
>
> So "offset1" is the lowest byte of the offset value. We still don't know
> which bit in the instruction byte tells us whether it's there yet,
> though. The next paragraph says:
>
>   All offset and size bytes are optional. This is to reduce the
>   instruction size when encoding small offsets or sizes. The first seven
>   bits in the first octet determines which of the next seven octets is
>   present. If bit zero is set, offset1 is present. If bit one is set
>   offset2 is present and so on.
>
> So bit zero of the "xxxxxxx" (the LSB) is offset1, and we have to check
> that first. Which I think is what happens in:
>
> >   PARSE_CP_PARAM(0x01, cp_off, 0);
>
> That's reading the low bit of the command byte, and then reading the
> _next_ byte from data, and shifting it not at all (because it's the low
> byte of the offset). That macro looks like this:
>
>   #define PARSE_CP_PARAM(bit, var, shift) do { \
>                           if (cmd & (bit)) { \
>                                   if (data >= top) \
>                                           goto bad_length; \
>                                   var |= ((unsigned) *data++ << (shift)); \
>                           } } while (0)
>
> > However, looking at the git source-code and other documentation (see
> > [1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and
> > the SIZE  bits are stored right after MSB (opcode).
>
> I think both the code and diagram are right. It's just that the order of
> the follow-on bytes is not part of that diagram, but rather the
> explanation below it.
>
> > From this source code, I would conclude that the diagram should look
> > like this instead:
> >
> > +----------+---------+---------+---------+---------+-------+-------+-------+
> > | 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 |
> > +----------+---------+---------+---------+---------+-------+-------+-------+
>
> That would definitely be wrong. If there's an offset1 byte, it
> immediately follows the command byte.
>
> -Peff