>>> +Deltified representation
>>
>> Does this refer to OFS delta as well as REF deltas?
>
> Yes. Both OFS and REF deltas have the same "body" which is what this
> part is about. The differences between OFS and REF deltas are not
> described (in fact I don't think we describe what OFS and REF deltas
> are at all). Maybe we should?
>
>>> is a sequence of one byte command optionally
>>> +followed by more data for the command. The following commands are
>>> +recognized:
>>
>> So a Deltified representation of an object is a 6 or 7 in the 3 bit type
>> and then the length. Then a command is shown how to construct
>> the object based on other objects. Can there be more commands?
>>
>>> +- If bit 7 is set, the remaining bits in the command byte specifies
>>> +  how to extract copy offset and size to copy. The following must be
>>> +  evaluated in this exact order:
>>
>> So there are 2 modes, and the high bit indicates which mode is used.
>> You start describing the more complicated mode first,
>> maybe give names to both of them? "direct copy" (below) and
>> "compressed copy with offset" ?
>
> I started to update this more because even this text is hard to get
> even to me. So let's get the background first.
>
> We have a source object somewhere (the object name comes from ofs/ref
> delta's header), basically we have the whole content. This delta
> thingy tells us how to use that source object to create a new (target)
> object.
>
> The delta is actually a sequence of instructions (of variable length).

The previous paragraph and this sentence are great for my understanding.
thanks! (Maybe keep it in a similar form around?)

> One is for copying from the source object.

ok that makes sense. I can think of it as a "HTTP range request", just
optimized for packfiles and the source is inside the same pack.
So it would say "Goto object <sha1> and copy bytes 13-168 here"

> The other copies from the
> delta itself

itself means the same object here, that we are describing here?
or does it mean other deltas?

> (e.g. this is new data in the target which is not
> available anywhere in the source object to copy from).
>
> The instruction looks like this
>
>   bit          0        1        2        3       4      5      6
> +----------+--------+--------+--------+--------+------+------+------+
> | 1xxxxxxx | offset | offset | offset | offset | size | size | size |
> +----------+--------+--------+--------+--------+------+------+------+
>
> Here you can see it in its full form, each box represents a byte. The
> first byte has bit 7 set as mentioned. We can see here that offsets
> (where to copy from in the source object) takes 4 bytes and size (how
> many bytes to copy) takes 3. Offset and size are in LSB order.
>
> The "xxxxxxx" part lets us shrink this down.

.. by indicating how much prefix we can skip and assume it to be all zero(?)

> If the offset can fit in
> 16 bits, there's no reason to waste the last two bytes describing
> zero. Each 'x' marks whether the corresponding byte is present.

So for a full instruction (as above), we'd have

  1 1111 111 <4 bytes offset> <3 bytes size>

for smaller instructions we have

  1 1100 100 <2 bytes offset> <1 byte size>

and here the offset is in range 0..64k and the size is 1-255 or 0x10000 ?

Modes to skip bytes in between are not allowed, e.g.

  1 1101 101 < 3 bytes of offsets> <2 bytes of size>

and the missing bytes would be assumed to be 0?

> The
> bit number is in the first row. So if you have offset 255 and size 1,
> the instruction is three bytes 10010001b, 255,

Oh it is the other way round, the size will be just one byte,
indicating we can have a range of 1-255 or 0x10000 and an offset of 0..255.

>
> I think this is a corner case in this format. I think Nico meant to
> specify consecutive bytes: if size is 2 bytes then you have to specify
> _both_ of them even if the first byte could be zero and omitted.

So it is not a mutually exclusive group, but a sequence (similar as in
git-bisect), where we start with 0 and end with exactly one edge in
between (sort of, we can also start with 1, then we have to have all 1s)

> The implementation detail is, if bit 6 is set but bit 4 is not, then
> the size value is pretty much random. It's only when bit 4 is set that
> we first clear out "size" and start adding bits to it.

That sounds similar to what I spelled out above.

Thanks for taking on the documentation here.
The box with numbers really helped me!

Stefan
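
For reference, the copy instruction discussed in this thread can be
sketched as a small decoder. This is only an illustration in Python, not
git's implementation: the name decode_copy is made up, absent offset or
size bytes are assumed to read as zero, and a decoded size of zero is
assumed to mean 0x10000 as mentioned above. Whether non-consecutive
bytes are valid is exactly the open question in the thread, so the
sketch simply accepts them.

```python
def decode_copy(buf, pos=0):
    """Decode one copy instruction starting at buf[pos].

    Returns (offset, size, next_pos). Hypothetical helper for
    illustration; assumes absent bytes are zero.
    """
    cmd = buf[pos]
    if not cmd & 0x80:
        raise ValueError("bit 7 not set: not a copy instruction")
    pos += 1

    # Bits 0..3 say which of the four offset bytes follow (LSB first).
    offset = 0
    for i in range(4):
        if cmd & (1 << i):
            offset |= buf[pos] << (8 * i)
            pos += 1

    # Bits 4..6 say which of the three size bytes follow (LSB first).
    size = 0
    for i in range(3):
        if cmd & (1 << (4 + i)):
            size |= buf[pos] << (8 * i)
            pos += 1

    # Assumption taken from the thread: size 0 stands for 0x10000.
    if size == 0:
        size = 0x10000

    return offset, size, pos


# Offset 255, size 1: command byte 10010001b, then one offset byte
# and one size byte.
print(decode_copy(bytes([0b10010001, 255, 1])))
```

With offset 255 and size 1 only bit 0 (first offset byte) and bit 4
(first size byte) are set next to bit 7, so the whole instruction is
three bytes long.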