>> Pinning the bitmap index on the reverse index adds complexity (lookups
>> are two-step: first find the entry in the reverse index, and then find
>> the SHA1 in the index) and is measurably slower, in both loading and
>> lookup times. Since Git doesn't have a memory problem, it's very hard
>> to make an argument for design that is more complex and runs slower to
>> save memory.
>
> Sorting by SHA1 will generate a random distribution. This will require
> you to inflate the entire bitmap on every fetch request, in order to
> do the "contains" operation. Sorting by pack offset allows us to
> inflate only the bits we need as we are walking the graph, since they
> are usually at the start of the bitmap.
>
> What is the general size in bytes of the SHA1 sorted bitmaps? If they
> are much larger, the size of the bitmap has an impact on how fast you
> can perform bitwise operations on them, which is important for fetch
> when doing wants AND NOT haves.

Furthermore, JGit primarily operates on the bitmap representation,
rarely converting bitmap id -> SHA1 during clone. When the bitmap of
objects to include in the output pack contains all of the objects in
the bitmap'd pack, we only do the translation of the bitmap ids of new
objects, not in the bitmap index, and it is just a lookup in an array.
Those objects are put at the front of the stream. The rest of the
objects are streamed directly from the pack, with some header munging,
since it is guaranteed to be a fully connected pack.

Most of the time this works because JGit creates 2 packs during GC: a
heads pack, which is bitmap'd, and an everything else pack.
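
To make the "wants AND NOT haves" step and the array lookup concrete,
here is a rough sketch. It is not JGit code: it assumes plain Python
integers as uncompressed bitmaps (JGit's real bitmaps are compressed),
and the names objects_by_position, bitmap_of and positions_of, as well
as the object names in the array, are made up for illustration.

```python
# Hypothetical sketch: bit i stands for the object at position i in
# pack-offset order. Plain ints are used as uncompressed bitmaps.

def bitmap_of(positions):
    """Build a bitmap with the given bit positions set."""
    bits = 0
    for p in positions:
        bits |= 1 << p
    return bits

def positions_of(bits):
    """Return the set bit positions, lowest first."""
    out = []
    pos = 0
    while bits:
        if bits & 1:
            out.append(pos)
        bits >>= 1
        pos += 1
    return out

# Assumed array mapping bitmap id -> object name (illustrative values).
objects_by_position = ["c0ffee", "deadbe", "abc123", "f00d42", "9e1b77"]

wants = bitmap_of([0, 1, 2, 3, 4])  # everything reachable from the wants
haves = bitmap_of([2, 3, 4])        # everything the client already has

# The fetch computation stays entirely in bitmap space.
to_send = wants & ~haves

# Only at the end are bitmap ids translated, via a plain array lookup.
names = [objects_by_position[i] for i in positions_of(to_send)]
```

The point of the sketch is that the translation from bitmap id to
object name happens once, at the end, and only for the bits that
survive the AND NOT.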