Re: [RFC/PATCH] define the way new representation types are encoded in the pack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Thu, Oct 27, 2011 at 23:04, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > In addition to four basic types (commit, tree, blob and tag), the pack
> > stream can encode a few other "representation" types, such as REF_DELTA
> > and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
> > we do not have much room to add new representation types in place, but we
> > do have one value reserved for future expansion.
> 
> We have 2 values reserved, 0 and 5.
> 
> > When bit 4-6 encodes type 5, the first byte is used this way:
> >
> >  - Bit 0-3 denotes the real "extended" representation type. Because types
> >   0-7 can already be encoded without using the extended format, we can
> >   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
> >   type 11 = 3 + 8);
> >
> >  - Bit 4-6 has the value "5";
> >
> >  - Bit 7 is used to signal if the _third_ byte needs to be read for larger
> >   size that cannot be represented with 8-bit.
> 
> This is very complicated. We don't need more complex logic in the pack
> encoding. We _especially_ do not need yet another variant of how to
> store a variable length integer in the pack file. I'm sorry, but we
> already have two different variants and this just adds a third. It is
> beyond crazy.
> 
> Last time (this is now years ago but whatever) Nico and I discussed
> adding a new type to packs it was for the alternate tree encoding in
> "pack v4". Trees happen so often that type code 5 is a good value to
> use for these. Later you talked about using the extended type to store
> a cattree blob thing, which would not appear nearly as often as a
> normal directory listing type tree that was encoded using the pack v4
> style encoding... I think saving type 5 for a small frequently
> occurring type is a good thing.
> 
> > As it is unlikely for us to pack things that do not need to record any
> > size, the second byte is always used in full to encode the low 8-bit of
> > the size.
> >
> > I haven't started using type=8 and upwards for anything yet, but because
> > we have only one "future expansion" value left, I want us to be extremely
> > careful in order to avoid painting us into a corner that we cannot get out
> > of, so I am sending this out early for a preliminary review.
> 
> I would have said something more like:
> 
> When bit 4-6 encodes "0", then:
> 
> - Bit 0-3 and bit 7 are used normally to encode a variable length
> "size" integer. These may be 0 indicating no size information.
> 
> - 2nd-nth bytes store remaining "size" information, if bit 7 was set.
> 
> - The immediate next byte encodes the extended type. This type is
> stored using the OFS_DELTA offset varint encoding, and thus may be
> larger than 256 if we ever need it to be.

I'd say it is just a byte.  No encoding needed.  Let's not be silly 
about it.  If we really have more than 255 object types one day (and I 
really hope this will never happen) then the value 0 in that byte could 
indicate yet another extended object type encoding.  But I truly hope 
we'll have pack v9 or v10 by then and that we'll have obsoleted the 
current 3-bit encoding completely at that point anyway.

For the record, I spent around 20 hours working on pack v4 while in the 
Caribbeans for a week last winter as I said I would.  Maybe I'll repeat 
the operation this year.


Nicolas

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]