Hi Jeff, Thank you for your elaborate answer. So it turns out I misinterpreted the diagram to represent the 8 opcode BITS rather than the whole sequence of BYTES. In hindsight there were some hints that I've overlooked, like the first block containing 1xxxxxxx, which shows already its a byte, not a bit, and the explanation below. Thanks again for taking the time to explain! Kind regards, Bouke On Fri, 11 Sep 2020 at 16:33, Jeff King <peff@xxxxxxxx> wrote: > > On Fri, Sep 11, 2020 at 02:30:20PM +0200, Bouke Versteegh wrote: > > > While working on an implementation of git in dart, I've noticed a > > possible error in the documentation. I hope I'm using the correct > > channel to report this issue. > > This is definitely the right place. > > > On [pack format](https://git-scm.com/docs/pack-format#_instruction_to_copy_from_base_object) > > a diagram is shown, that explains the format of a Copy instruction > > inside a Deltified pack entry: > > > > +----------+---------+---------+---------+---------+-------+-------+-------+ > > | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 | > > +----------+---------+---------+---------+---------+-------+-------+-------+ > > > > The documentation specifies that diagrams follow the RFC950 > > (https://www.ietf.org/rfc/rfc1950.txt) format. > > I think you mean rfc1951 here, but yeah, it is clear in that document > that bytes are in MSB order. And also that each of those boxes is a > single byte. > > > That means that the left bit is MSB, and the right bit is LSB, so the > > OpCode is MSB (1xxxxxxx), which is correct and matches other sources. > > Right, agreed. > > > It also would mean that offset 1-4 should be read from bit 7, 6, 5 and > > 4 (i.e. 0x40, 0x20, 0x10, 0x08) > > I don't think this follows, though. "offset1" is a separate byte. We > have a sequence of bytes here, some of which may be present. Their > presence is determined by the bits "xxxxxxx", but the diagram does not > say anything about the order. > > The paragraphs below do say: > > This is the instruction format to copy a byte range from the source > object. It encodes the offset to copy from and the number of bytes to > copy. Offset and size are in little-endian order. > > So "offset1" is the lowest byte of the offset value. We still don't know > which bit in the instruction byte tells us whether it's there yet, > though. The next paragraph says: > > All offset and size bytes are optional. This is to reduce the > instruction size when encoding small offsets or sizes. The first seven > bits in the first octet determines which of the next seven octets is > present. If bit zero is set, offset1 is present. If bit one is set > offset2 is present and so on. > > So bit zero of the "xxxxxxx" (the LSB) is offset1, and we have to check > that first. Which I think is what happens in: > > > PARSE_CP_PARAM(0x01, cp_off, 0); > > That's reading the low bit of the command byte, and then reading the > _next_ byte from data, and shifting it not at all (because it's the low > byte of the offset). That macro looks like this: > > #define PARSE_CP_PARAM(bit, var, shift) do { \ > if (cmd & (bit)) { \ > if (data >= top) \ > goto bad_length; \ > var |= ((unsigned) *data++ << (shift)); \ > } } while (0) > > > However, looking at the git source-code and other documentation (see > > [1] [2]), I see that offset [1-4] are read from the LOWEST 4 bits, and > > the SIZE bits are stored right after MSB (opcode). > > I think both the code and diagram are right. It's just that the order of > the follow-on bytes is not part of that diagram, but rather the > explanation below it. > > > From this source code, I would conclude that the diagram should look > > like this instead: > > > > +----------+---------+---------+---------+---------+-------+-------+-------+ > > | 1xxxxxxx | size3 | size 2 | size1 | offset4 | offset3 | offset2 | offset1 | > > +----------+---------+---------+---------+---------+-------+-------+-------+ > > That would definitely be wrong. If there's an offset1 byte, it > immediately follows the command byte. > > -Peff