Re: [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 21, 2021 at 01:53:40PM -0400, Taylor Blau wrote:

> > > +	The ordering between packs is done lexicographically by the pack name,
> > > +	with the exception of the preferred pack, which sorts ahead of all other
> > > +	packs.
> >
> > Hmm, I'm not sure if this "lexicographically" part is true. Really we're
> > building on the midx .rev format here. And that says "defined by the
> > MIDX's pack list" (though I can't offhand remember if that is
> > lexicographic, or if it is in the reverse-mtime order).
> >
> > At any rate, should we just be referencing the rev documentation?
> 
> The packs are listed in lex order in the MIDX, but that is so we can
> binary search that list to determine whether a pack is included in the
> MIDX or not.
> 
> I had to check, but we do use the lex order to resolve duplicate
> objects, too. See (at the tip of this branch):
> 
>     QSORT(ctx.info, ctx.nr, pack_info_compare);
> 
> from within midx.c:write_midx_internal(). Here, ctx.info contains the
> list of packs, and pack_info_compare is a thin wrapper around
> strcmp()-ing the pack_name values of two packed_git structures.

Ah, OK, thanks for checking.

> Arguably, you'd get better EWAH compression of the bits between packs
> if we sorted packs in reverse order according to their mtime. But I
> suspect that it doesn't matter much in practice, since the number of
> objects vastly outpaces the number of packs (but I haven't measured to
> be certain, so take that with a grain of salt).

Agreed, especially when the intended use is with geometric repacking to
keep reasonable-sized packs.

Either way, I think heuristics to optimize the pack ordering can easily
come on top later. Let's keep this series focused on the fundamentals of
having midx bitmaps at all.

> In any case, I think that you're right that adding too much detail hurts
> us here, so we should really be mentioning the MIDX's .rev-file
> documentation (unfortunately, we can't linkgit it, so mentioning it by
> name will have to suffice). I plan to reroll with something like this on
> top:
> 
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index 25221c7ec8..04b3ec2178 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -26,9 +26,8 @@ An object is uniquely described by its bit position within a bitmap:
> 
>  		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
> 
> -	The ordering between packs is done lexicographically by the pack name,
> -	with the exception of the preferred pack, which sorts ahead of all other
> -	packs.
> +	The ordering between packs is done according to the MIDX's .rev file.
> +	Notably, the preferred pack sorts ahead of all other packs.
> 
>  The on-disk representation (described below) of a bitmap is the same regardless
>  of whether or not that bitmap belongs to a packfile or a MIDX. The only

Thanks, that looks much better. We can't linkgit, but we only build HTML
for these. So just a link to pack-format.html would work, as they'd
generally be found side-by-side in the filesystem. But since this
doesn't even really render as asciidoc, I'm not sure I care either way.
(Obviously we could also mention pack-format.txt by name, but it's
probably already obvious-ish to a human that this is where you'd find
information on the pack .rev format).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux