On Wed, Feb 10, 2021 at 09:48:20PM -0500, Derrick Stolee wrote: > nit: use multi-pack-index or MIDX, not lower-case 'midx'. Thanks. > > +Crucially, the objects' positions within this pseudo-pack are the same > > +as their bit positions in a multi-pack reachability bitmap. > > + > > +As a motivating example, consider the multi-pack reachability bitmap > > +(which does not yet exist, but is what we are building towards here). We > > +need each bit to correspond to an object covered by the midx, and we > > +need to be able to convert bit positions back to index positions (from > > +which we can get the oid, etc). > > These paragraphs are awkward. Instead of operating in the hypothetical > world of reachability bitmaps, focus on the fact that bitmaps need > a bidirectional mapping between "bit position" and an object ID. Hmm. I could buy that these paragraphs are awkward, but I'm not sure that what you proposed makes it less so. I may be a bad person to judge what you wrote, since I am familiar with the details of what it's describing. But my thoughts on that second and third paragraph are basically: - define the valid orderings we might consider objects in a MIDX by, indicating which of those orderings we're going to use for multi-pack bitmaps - motivate the need for a mapping between lexicographic order and pseudo-pack order > Here is an attempt to reword some of the context you are using here. > Feel free to take as much or as little as you want. > > The multi-pack-index stores the object IDs in lexicographical order > (lex-order) to allow binary search. To allow compressible reachability > bitmaps to pair with a multi-pack-index, a different ordering is > required. When paired with a single packfile, the order used is the > object order within the packfile (called the pack-order). Construct > a "pseudo-pack" by concatenating all tracked packfiles in the > multi-pack-index. We now need a mapping between the lex-order and the > pseudo-pack-order. I struggled with what you wrote because I couldn't seem to neatly place/replace that paragraph in with the existing text without referring to yet-undefined concepts. Maybe the confusion lies in the fact that we stray too far from the point in the second and third paragraphs. What if we reordered the second, third, and fourth paragraph like this: Instead of mapping between offset, pack-, and index position, this reverse index maps between an object's position within the MIDX, and that object's position within a pseudo-pack that the MIDX describes. To clarify these three orderings, consider a multi-pack reachability bitmap (which does not yet exist, but is what we are building towards here). Each bit needs to correspond to an object in the MIDX, and so we need an efficient mapping from bit position to MIDX position. One solution is to let bits occupy the same position in the oid-sorted index stored by the MIDX. But because oids are effectively random, there resulting reachability bitmaps would have no locality, and thus compress poorly. (This is the reason that single-pack bitmaps use the pack ordering, and not the .idx ordering, for the same purpose.) So we'd like to define an ordering for the whole MIDX based around pack ordering, which has far better locality (and thus compresses more efficiently). We can think of a pseudo-pack created by the concatenation of all of the packs in the MIDX. E.g., if we had a MIDX with three packs (a, b, c), with 10, 15, and 20 objects respectively, we can imagine an ordering of the objects like: > [snip] > > The rest of these details make sense and sufficiently motivate the > ordering, once the concept is clear. > > Thanks, > -Stolee Thanks, Taylor