Re: [PATCH 09/16] documentation: add documentation for the bitmap format

Colby Ranger <cranger@xxxxxxxxxx> · Wed, 26 Jun 2013 15:33:00 -0700



>> Pinning the bitmap index on the reverse index adds complexity (lookups
>> are two-step: first find the entry in the reverse index, and then find
>> the SHA1 in the index) and is measurably slower, in both loading and
>> lookup times. Since Git doesn't have a memory problem, it's very hard
>> to make an argument for a design that is more complex and runs slower to
>> save memory.
>
> Sorting by SHA1 will generate a random distribution. This will require
> you to inflate the entire bitmap on every fetch request, in order to
> do the "contains" operation.  Sorting by pack offset allows us to
> inflate only the bits we need as we are walking the graph, since they
> are usually at the start of the bitmap.
>
> What is the general size in bytes of the SHA1 sorted bitmaps?  If they
> are much larger, the size of the bitmap has an impact on how fast you
> can perform bitwise operations on them, which is important for fetch
> when doing wants AND NOT haves.

Furthermore, JGit primarily operates on the bitmap representation and
rarely converts bitmap id -> SHA1 during clone. When the bitmap of
objects to include in the output pack contains every object in the
bitmap'd pack, we only translate the bitmap ids of new objects, that
is, objects not in the bitmap index, and that is just a lookup in an
array. Those objects are put at the front of the stream. The rest of
the objects are streamed directly from the pack, with some header
munging, since it is guaranteed to be a fully connected pack. Most of
the time this works because JGit creates 2 packs during GC: a heads
pack, which is bitmap'd, and an everything-else pack.
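
To make the flow above concrete, here is a rough sketch. It is not
JGit's code: it uses plain Python integers as uncompressed bitsets
instead of compressed bitmaps, and the names to_bitmap, plan_pack,
pack_object_count and extra_ids are made up for illustration. It
assumes bit positions 0..pack_object_count-1 are the objects of the
bitmap'd pack and higher positions are new objects held in an array.

```python
def to_bitmap(positions):
    """Build a bitset (a Python int) with the given bit positions set."""
    bits = 0
    for p in positions:
        bits |= 1 << p
    return bits


def plan_pack(wants, haves, pack_object_count, extra_ids):
    """Decide how to build the output pack from two bitmaps.

    Assumption (illustrative only): bits below pack_object_count refer
    to objects in the bitmap'd pack; bit pack_object_count + i refers
    to extra_ids[i], a new object that is not in the bitmap index.
    """
    # Objects the client needs: wants AND NOT haves.
    to_send = wants & ~haves

    # The whole bitmap'd pack can be streamed as-is only if every one
    # of its objects is in the set to send.
    pack_mask = (1 << pack_object_count) - 1
    reuse_whole_pack = (to_send & pack_mask) == pack_mask

    # Only the new objects need a bitmap id -> object id translation,
    # and that is an array lookup.
    new_bits = to_send >> pack_object_count
    new_objects = [extra_ids[i]
                   for i in range(len(extra_ids))
                   if (new_bits >> i) & 1]
    return reuse_whole_pack, new_objects
```

On a clone, haves is empty, so if wants covers the whole pack the
first return value is true and only the ids in the second value need
to be looked up and written at the front of the stream.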