Re: [PATCH v2 2/8] builtin/multi-pack-index.c: support `--stdin-packs` mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 22, 2021 at 03:29:53PM -0700, Josh Steadmon wrote:
> Thanks for the series! I have a couple of questions:
>
>
> On 2021.09.15 14:24, Taylor Blau wrote:
> > To power a new `--write-midx` mode, `git repack` will want to write a
> > multi-pack index containing a certain set of packs in the repository.
> >
> > This new option will be used by `git repack` to write a MIDX which
> > contains only the packs which will survive after the repack (that is, it
> > will exclude any packs which are about to be deleted).
> >
> > This patch effectively exposes the function implemented in the previous
> > commit via the `git multi-pack-index` builtin. An alternative approach
> > would have been to call that function from the `git repack` builtin
> > directly, but this introduces awkward problems around closing and
> > reopening the object store, so the MIDX will be written out-of-process.
>
> Could you elaborate a bit on the "awkward problems" here? I'm afraid I'm
> missing the context here.

A variety of things can go wrong when the object store is closed and
re-opened in the same process. Many of the symptoms are described
beginning at this message:

  https://lore.kernel.org/git/YPf1m01mcdJ3HNBt@xxxxxxxxxxxxxxxxxxxxxxx/

and further down in the sub-thread. Many of those problems have been
resolved, but I'm not convinced that there aren't others lurking.

> > +static void read_packs_from_stdin(struct string_list *to)
> > +{
> > +	struct strbuf buf = STRBUF_INIT;
> > +	while (strbuf_getline(&buf, stdin) != EOF)
> > +		string_list_append(to, buf.buf);
> > +	string_list_sort(to);
> > +
> > +	strbuf_release(&buf);
> > +}
> > +
>
> I'm presuming that the packfile list is going to be generated
> automatically, but what happens if that becomes corrupt somehow, and we
> skip a packfile that should have been included? Will that cause
> incorrect behavior, or will we just miss out on some of the bitmap
> performance benefits?

A multi-pack bitmap can only refer to objects that are in a pack which
the repository's MIDX includes. So if we left off a pack from this list,
we'd be unable to cover that pack's objects in the resulting bitmap.
We'd also be unable to cover any objects which are reachable from the
missing pack's objects, since the set of objects in a bitmap must be
closed under reachability.

If, on the other hand, we read a line which does not correspond to any
pack, we'll simply ignore it. That's because we loop over the results of
get_all_packs() and try to find a match in this list instead of the
other way around.

We could mark the packs we found by abusing the string_list_item's util
pointer, but it's probably not worth it since this is mostly an internal
interface.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux