Re: [PATCH] Support 64-bit indexes for pack files.

On Mon, 26 Feb 2007, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@xxxxxxx> wrote:
> > Actually I've been thinking about another format already.
> > 
> > What about keeping the pack offset as 32 bits like it is today, but for 
> > index v2, if the top bit is set, then it becomes an index into another 
> > table containing 64-bit offsets as needed.  This way there is no waste 
> > of space for most projects, where the pack has yet to reach the 2GB limit 
> > for many years to come.
> 
> Actually Troy's patch tries to do this by using the current format
> and only switching to the new one if the packfile exceeds 4 GiB.
> Rather smart.

Yes, I saw the patch.  But what I propose is different.  In fact it 
would require far fewer changes to the existing code.  The idea is to 
continue to store a 32-bit value along with the SHA1 just like we do 
today.  Then, appended to that would be another table containing a list 
of 64-bit offsets.

Now if the offset to be stored in the index is smaller than 2GB, you 
store it as we do today.  If it is >= 2GB, then the 64-bit offset is 
added to the extra offset table, and the 32-bit entry stored along with 
the SHA1 becomes an index into that second table instead, with the top 
bit set to distinguish it from a normal 32-bit offset (actually 31 
bits).  So only offsets that don't fit in 31 bits incur an additional 
level of indirection.
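
To make the scheme above concrete, here is a minimal sketch in Python 
(not git code).  The names encode_offsets, decode_offset and MSB are 
made up for illustration, and the tables are plain lists rather than 
the on-disk layout, which is not settled here.

```python
MSB = 0x80000000  # top bit set: the low 31 bits index the 64-bit table


def encode_offsets(offsets):
    """Split pack offsets into 32-bit entries plus a 64-bit offset table.

    Hypothetical helper: returns (entries32, table64).
    """
    entries32 = []
    table64 = []
    for off in offsets:
        if off < MSB:
            # Fits in 31 bits: store the offset directly, as today.
            entries32.append(off)
        else:
            # Too large: store an index into the extra table, flagged
            # with the top bit, and append the real offset there.
            entries32.append(MSB | len(table64))
            table64.append(off)
    return entries32, table64


def decode_offset(entry, table64):
    """Resolve one 32-bit entry back to the real pack offset."""
    if entry & MSB:
        return table64[entry & (MSB - 1)]  # one extra level of indirection
    return entry


offsets = [12, 2**31 - 1, 2**31, 5 * 2**32]
entries32, table64 = encode_offsets(offsets)
restored = [decode_offset(e, table64) for e in entries32]
```

Only the last two offsets in this example end up in table64; the first 
two are stored exactly as they would be in the current format.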

The code to implement this would be minimal.  And since objects placed 
at the end of a pack (those more likely to incur the indirection 
overhead) are further back in history, they won't get accessed 
very often anyway.

Then nothing prevents us from inserting the next-object-index table in 
between (its size is known, while the size of the 64-bit offset table 
may vary), so the code that doesn't care about it need not look at it.

> One thought I had here was to expand the fan-out table from 1<<8
> entries to 1<<16 entries, then store only the low 18 bytes of
> the SHA-1.  We would have another 2 bytes worth of space to store
> the offset, pushing our total offset up to 48 bits.

That would penalize small packs a lot.  The index would always start 
at 256KB in size.  With a pack of 100 objects (our current threshold 
for keeping a pack) that means a 258KB index file.  Currently the index 
file for a 100-object pack is 3.4KB.
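
The arithmetic behind those two figures can be checked with a small 
Python sketch.  It assumes the current index layout (a fan-out table of 
4-byte entries, one 4-byte offset plus 20-byte SHA1 per object, and a 
40-byte trailer holding two SHA1 checksums) and assumes the wider 
fan-out variant keeps the same trailer; index_size is a made-up helper 
name.

```python
SHA1_LEN = 20
TRAILER = 2 * SHA1_LEN  # assumed: pack checksum + index checksum


def index_size(n_objects, fanout_bits, sha_bytes, offset_bytes):
    """Size in bytes of an index with the given (assumed) layout."""
    fanout = (1 << fanout_bits) * 4  # one 4-byte count per fan-out slot
    return fanout + n_objects * (sha_bytes + offset_bytes) + TRAILER


# Current format: 1<<8 fan-out entries, full 20-byte SHA1, 4-byte offset.
current = index_size(100, 8, 20, 4)    # 3464 bytes, about 3.4KB

# Proposed variant: 1<<16 fan-out entries, low 18 bytes of the SHA1,
# and 6 bytes (48 bits) of offset.
proposed = index_size(100, 16, 18, 6)  # 264584 bytes, about 258KB
```

The per-object cost is the same 24 bytes in both cases; the whole 
difference comes from the fan-out table growing from 1KB to 256KB.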


Nicolas