On Fri, Sep 11, 2020 at 02:30:20PM +0200, Bouke Versteegh wrote: > While working on an implementation of git in dart, I've noticed a > possible error in the documentation. I hope I'm using the correct > channel to report this issue. This is definitely the right place. > On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object) > a diagram is shown, that explains the format of a Copy instruction > inside a Deltified pack entry: > > +----------+---------+---------+---------+---------+-------+-------+-------+ > | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 | > +----------+---------+---------+---------+---------+-------+-------+-------+ > > The documentation specifies that diagrams follow the RFC950 > (https://www.ietf.org/rfc/rfc1950.txt) format. I think you mean rfc1951 here, but yeah, it is clear in that document that bytes are in MSB order. And also that each of those boxes is a single byte. > That means that the left bit is MSB, and the right bit is LSB, so the > OpCode is MSB (1xxxxxxx), which is correct and matches other sources. Right, agreed. > It also would mean that offset 1-4 should be read from bit 7, 6, 5 and > 4 (i.e. 0x40, 0x20, 0x10, 0x08) I don't think this follows, though. "offset1" is a separate byte. We have a sequence of bytes here, some of which may be present. Their presence is determined by the bits "xxxxxxx", but the diagram does not say anything about the order. The paragraphs below do say: This is the instruction format to copy a byte range from the source object. It encodes the offset to copy from and the number of bytes to copy. Offset and size are in little-endian order. So "offset1" is the lowest byte of the offset value. We still don't know which bit in the instruction byte tells us whether it's there yet, though. The next paragraph says: All offset and size bytes are optional. This is to reduce the instruction size when encoding small offsets or sizes. The first seven bits in the first octet determines which of the next seven octets is present. If bit zero is set, offset1 is present. If bit one is set offset2 is present and so on. So bit zero of the "xxxxxxx" (the LSB) is offset1, and we have to check that first. Which I think is what happens in: > PARSE_CP_PARAM(0x01, cp_off, 0); That's reading the low bit of the command byte, and then reading the _next_ byte from data, and shifting it not at all (because it's the low byte of the offset). That macro looks like this: #define PARSE_CP_PARAM(bit, var, shift) do { \ if (cmd & (bit)) { \ if (data >= top) \ goto bad_length; \ var |= ((unsigned) *data++ << (shift)); \ } } while (0) > However, looking at the git source-code and other documentation (see > [1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and > the SIZE bits are stored right after MSB (opcode). I think both the code and diagram are right. It's just that the order of the follow-on bytes is not part of that diagram, but rather the explanation below it. > From this source code, I would conclude that the diagram should look > like this instead: > > +----------+---------+---------+---------+---------+-------+-------+-------+ > | 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 | > +----------+---------+---------+---------+---------+-------+-------+-------+ That would definitely be wrong. If there's an offset1 byte, it immediately follows the command byte. -Peff