On Mon, 26 Feb 2007, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@xxxxxxx> wrote:
> > Actually I've been thinking about another format already.
> >
> > What about keeping the pack offset as 32 bits like it is today, but for
> > index v2 if the top bit is set then this becomes an index into another
> > table containing 64-bit offsets as needed.  This way there is no waste
> > of space for most projects where the pack has yet to reach the 2GB limit
> > for many years to come.
>
> Actually Troy's patch tries to do this by using the current format
> and only switching to the new one if the packfile exceeds 4 GiB.
> Rather smart.

Yes I saw the patch.  But what I propose is different.  In fact it'd
require far fewer changes to the existing code.

The idea is to continue to store a 32-bit value along with the SHA1 just
like we do today.  Then, appended to that would be another table
containing a list of 64-bit offsets.

Now if the offset to be stored in the index is smaller than 2GB you store
it as we do today.  If it is >= 2GB then the 64-bit offset would be added
to the extra offset table, and the 32-bit entry stored along with the SHA1
would be an index into that second table instead, with the top bit set to
distinguish it from a normal 32-bit offset (actually 31 bits).

So offsets that need more than 31 bits go through one additional level
of indirection.  The code to implement this would be minimal.  And since
objects placed at the end of a pack (those most likely to incur the
indirection overhead) are further back in history, they won't get
accessed very often anyway.

Nothing then prevents us from inserting the next-object-index table in
between (its size is known, while the size of the 64-bit offset table may
vary), so code that doesn't care about it need not look at it.

> One thought I had here was to expand the fan-out table from 1<<8
> entries to 1<<16 entries, then store only the low 18 bytes of
> the SHA-1.
> We would have another 2 bytes worth of space to store
> the offset, pushing our total offset up to 48 bits.

That would penalize small packs a lot.  The index would always start
from 256KB in size.  With a pack of 100 objects (our current threshold
for keeping a pack) that means a 258KB index file.  Currently the index
file for a 100-object pack is 3.4KB.


Nicolas
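
For illustration, the lookup proposed above (a 32-bit entry whose top bit
selects a slot in a separate table of 64-bit offsets) and the index-size
comparison can be sketched roughly as follows.  This is Python rather than
git's C, and it is not code from git or from any posted patch: every
function and constant name is made up for the example, and the wide
fan-out layout (6 offset bytes plus 18 SHA-1 bytes per entry, trailer
unchanged) is an assumption read off the quoted suggestion.

```python
TOP_BIT = 0x80000000    # set: the entry is an index into the 64-bit table
LOW_MASK = 0x7FFFFFFF   # the remaining 31 bits (offset or table index)


def encode_offsets(offsets):
    """Return (entries, large): one 32-bit entry per object, plus the
    extra table holding only the offsets that do not fit in 31 bits."""
    entries, large = [], []
    for off in offsets:
        if off < TOP_BIT:
            entries.append(off)                    # stored as today
        else:
            entries.append(TOP_BIT | len(large))   # index into 'large'
            large.append(off)
    return entries, large


def resolve_offset(entry, large):
    """Map a 32-bit index entry back to the real pack offset."""
    if entry & TOP_BIT:
        return large[entry & LOW_MASK]             # one extra indirection
    return entry


SHA1_LEN = 20
TRAILER = 2 * SHA1_LEN  # pack checksum + index checksum


def index_size_current(n):
    """Current layout: 256-entry fan-out of 4-byte counts, then a
    4-byte offset and a 20-byte SHA-1 per object, then the trailer."""
    return 256 * 4 + n * (4 + SHA1_LEN) + TRAILER


def index_size_wide_fanout(n):
    """Assumed layout for the 1<<16 fan-out idea: 6-byte offset and
    18 SHA-1 bytes per object, same trailer."""
    return (1 << 16) * 4 + n * (6 + 18) + TRAILER
```

With 100 objects this gives 3464 bytes (about 3.4KB) for the current
layout and 264584 bytes (about 258KB) for the 1<<16 fan-out variant,
which agrees with the figures above.  In the indirection scheme a pack
below 2GB produces an empty extra table, so its index costs nothing more
than it does today.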