Re: [RFC/PATCH] define the way new representation types are encoded in the pack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 27, 2011 at 23:04, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> In addition to four basic types (commit, tree, blob and tag), the pack
> stream can encode a few other "representation" types, such as REF_DELTA
> and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
> we do not have much room to add new representation types in place, but we
> do have one value reserved for future expansion.

We have 2 values reserved, 0 and 5.

> When bit 4-6 encodes type 5, the first byte is used this way:
>
>  - Bit 0-3 denotes the real "extended" representation type. Because types
>   0-7 can already be encoded without using the extended format, we can
>   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
>   type 11 = 3 + 8);
>
>  - Bit 4-6 has the value "5";
>
>  - Bit 7 is used to signal if the _third_ byte needs to be read for larger
>   size that cannot be represented with 8-bit.

This is very complicated. We don't need more complex logic in the pack
encoding. We _especially_ do not need yet another variant of how to
store a variable length integer in the pack file. I'm sorry, but we
already have two different variants and this just adds a third. It is
beyond crazy.

Last time (this is now years ago but whatever) Nico and I discussed
adding a new type to packs it was for the alternate tree encoding in
"pack v4". Trees happen so often that type code 5 is a good value to
use for these. Later you talked about using the extended type to store
a cattree blob thing, which would not appear nearly as often as a
normal directory listing type tree that was encoded using the pack v4
style encoding... I think saving type 5 for a small frequently
occurring type is a good thing.

> As it is unlikely for us to pack things that do not need to record any
> size, the second byte is always used in full to encode the low 8-bit of
> the size.
>
> I haven't started using type=8 and upwards for anything yet, but because
> we have only one "future expansion" value left, I want us to be extremely
> careful in order to avoid painting us into a corner that we cannot get out
> of, so I am sending this out early for a preliminary review.

I would have said something more like:

When bit 4-6 encodes "0", then:

- Bit 0-3 and bit 7 are used normally to encode a variable length
"size" integer. These may be 0 indicating no size information.

- 2nd-nth bytes store remaining "size" information, if bit 7 was set.

- The immediate next byte encodes the extended type. This type is
stored using the OFS_DELTA offset varint encoding, and thus may be
larger than 256 if we ever need it to be.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]