Re: [RFC] Packing large repositories

Nicolas Pitre <nico@xxxxxxx> · Sat, 31 Mar 2007 14:51:23 -0400 (EDT)

On Sat, 31 Mar 2007, Geert Bosch wrote:

> I've been working on a design for a new index file format, and I'll
> include my current draft at the end of this message. It is not finished,
> and I've not written much code yet, but I'd like to prevent duplicated
> work and give others the opportunity to borrow from the ideas so far.

I'm glad you posted this as I intended to work on the index today.

> My main focus is to minimize the amount of data accessed in a random
> fashion, allow for multiple packs and virtually unlimited scaling.
> This is achieved through a fan-out table that grows with the number
> of items in the pack, keeping the fan-out table overhead under 2 bits
> per object while encoding a growing prefix of the object ID. This
> frees up bits for a pack number. The pack number is either used to
> find the correct pack, or alternatively for the right chunk of a
> very large pack file.

And how do you determine that pack number?

The index should be tied to the pack it is indexing and nothing else 
like the presence of other packs if possible.  When given a 20-byte SHA1 
you don't know what packnumber that corresponds to anyway.

> The linear search through packs is very inefficient
> with large numbers of packs. Having packs much larger
> than a GB is also problematic, due to this as repacking
> and otherwise modifying packs gets very expensive.

Repacking is not a frequent operation, and it can be done 
asynchronously.  So I don't consider that to be a big issue if it is 
expensive.  It is already extremely expensive compared to other GIT 
operations.  So it is better to have fewer and bigger packs to speed up 
runtime object searching than making repack less expensive with more 
smaller packs.

> Another issue is that binary search requires many
> semi-random accesses spread over the index. Finally,
> most of the information actually read consists of
> SHA1's that are never needed.

Of course the pack v4 design allow for bypassing all this altogether in 
most cases.  But it doesn't eliminate the need for an index in every 
cases.  So let's optimize the index first, especially since the pressing 
need now is to lift the 4G pack size limit. And pack v4 will come after 
that.

Just to say that the index doesn't need to be the ultimate thing as pack 
v4 will scale much better than any index format could ever do due to its 
direct object reference design.

However, a big detail that would greatly help pack v4 is to have the 
SHA1 table together in one block with no other fields inserted between 
entries.  Then the only difference between current packs and pack v4 
would be that the SHA1 table will be located in the pack itself instead 
of the index (pack v4 will carry the sorted SHA1 table already so the 
index won't have to store it).

> This proposed pack index format does not focus on reducing
> used disk space, but instead aims to reduce the number
> of blocks that needs to be read to perform lookups.
> This is done using three techniques:
>   1) scale the number of fan-out bins with the number
>      of objects in the index, keeping the expected
>      number of objects in each bin constant
>   2) take advantage of 1) by only storing a few bits
>      following the common prefix in the main lookup table
>      as a discriminant. Store the rest of the SHA1 and
>      the pack offest in a separate, parallel, object table.
>   3) Instead of repeating the variable-length common prefix
>      and the discriminant, use the space for the prefix
>      for a pack identifier and omit the discriminant altogether.

I think (2) conflicts with my goal of having the SHA1 table independent. 
It is really important for future compatibility with pack v4 that the 
SHA1 table remains a sorted list of contiguous 20-byte records.

I really like your adaptative fanout idea though.  So it could be 
possible to have something based on the largest common SHA1 prefix 
within a pack for example.  Or maybe it could be made multi level for an 
arbitrary portion of the largest common prefix.

Nicolas
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html