Re: [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 21, 2021 at 06:18:46AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:10PM -0400, Taylor Blau wrote:
>
> > +An object is uniquely described by its bit position within a bitmap:
> > +
> > +	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
> > +	the __n__th object in pack order. For a function `offset` which maps
> > +	objects to their byte offset within a pack, pack order is defined as
> > +	follows:
> > +
> > +		o1 <= o2 <==> offset(o1) <= offset(o2)
> > +
> > +	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
> > +	__n__th object in MIDX order. With an additional function `pack` which
> > +	maps objects to the pack they were selected from by the MIDX, MIDX order
> > +	is defined as follows:
> > +
> > +		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
> > +
> > +	The ordering between packs is done lexicographically by the pack name,
> > +	with the exception of the preferred pack, which sorts ahead of all other
> > +	packs.
>
> This doesn't render well as asciidoc (the final paragraph is taken as
> more of the code block). But that is a problem through the whole file. I
> think we should ignore it for now, and worry about asciidoc-ifying the
> whole thing later, if we choose to.

Agreed; let's ignore it for now.

> > +	The ordering between packs is done lexicographically by the pack name,
> > +	with the exception of the preferred pack, which sorts ahead of all other
> > +	packs.
>
> Hmm, I'm not sure if this "lexicographically" part is true. Really we're
> building on the midx .rev format here. And that says "defined by the
> MIDX's pack list" (though I can't offhand remember if that is
> lexicographic, or if it is in the reverse-mtime order).
>
> At any rate, should we just be referencing the rev documentation?

The packs are listed in lex order in the MIDX, but that is so we can
binary search that list to determine whether a pack is included in the
MIDX or not.

I had to check, but we do use the lex order to resolve duplicate
objects, too. See (at the tip of this branch):

    QSORT(ctx.info, ctx.nr, pack_info_compare);

from within midx.c:write_midx_internal(). Here, ctx.info contains the
list of packs, and pack_info_compare is a thin wrapper around
strcmp()-ing the pack_name values of two packed_git structures.

Arguably, you'd get better EWAH compression of the bits between packs
if we sorted packs in reverse order according to their mtime. But I
suspect that it doesn't matter much in practice, since the number of
objects vastly outpaces the number of packs (but I haven't measured to
be certain, so take that with a grain of salt).

In any case, I think that you're right that adding too much detail hurts
us here, so we should really be mentioning the MIDX's .rev-file
documentation (unfortunately, we can't linkgit it, so mentioning it by
name will have to suffice). I plan to reroll with something like this on
top:

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 25221c7ec8..04b3ec2178 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -26,9 +26,8 @@ An object is uniquely described by its bit position within a bitmap:

 		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)

-	The ordering between packs is done lexicographically by the pack name,
-	with the exception of the preferred pack, which sorts ahead of all other
-	packs.
+	The ordering between packs is done according to the MIDX's .rev file.
+	Notably, the preferred pack sorts ahead of all other packs.

 The on-disk representation (described below) of a bitmap is the same regardless
 of whether or not that bitmap belongs to a packfile or a MIDX. The only

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux