Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 26, 2021 at 02:59:02PM -0400, Taylor Blau wrote:
> On Fri, Jul 23, 2021 at 04:29:37AM -0400, Jeff King wrote:
> > On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:
> >
> > > > > This avoids a problem that would arise in subsequent patches due to the
> > > > > combination of 'git repack' reopening the object store in-process and
> > > > > the multi-pack index code not checking whether a pack already exists in
> > > > > the object store when calling add_pack_to_midx().
> > > > >
> > > > > This would ultimately lead to a cycle being created along the
> > > > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > > > loaded for a pack because one already exists (for the same pack).
> > > >
> > > > I'm not sure I completely understand the bug that this causes.
> > >
> > > Off-hand, I can't quite remember either. But it is important; I do have
> > > a distinct memory of dropping this patch and then watching a 'git repack
> > > --write-midx' (that option will be introduced in a later series) fail
> > > horribly.
> > >
> > > If I remember correctly, the bug has to do with loading a MIDX twice in
> > > the same process. When we call add_packed_git() from within
> > > prepare_midx_pack(), we load the pack without caring whether or not it's
> > > already loaded. So loading a MIDX twice in the same process will fail.
> > >
> > > So really I think that this is papering over that bug: we're just
> > > removing one of the times that we happened to load a MIDX from during
> > > the writing phase.
> >
> > Hmm, after staring at this for a bit, I've unconfused and re-confused
> > myself several times.
> >
> > Here are some interesting bits:
> >
> >   - calling load_multi_pack_index() directly creates a new midx object.
> >     None of its m->packs[] array will be filled in. Nor is it reachable
> >     as r->objects->multi_pack_index.
> >
> >   - in using that midx, we end up calling prepare_midx_pack() for
> >     various packs, which creates a new packed_git struct and adds it to
> >     r->objects->packed_git (via install_packed_git()).
> >
> > So that's a bit weird already, because we have packed_git structs in
> > r->objects that came from a midx that isn't r->objects->multi_pack_index.
> > And then if we later call prepare_multi_pack_index(), for example as
> > part of a pack reprepare, then we'd end up with duplicates.
>
> Ah, this jogged my memory: this is a relic from when we generated MIDX
> bitmaps in-process with the rest of the `repack` code. And when we did
> that, we did have to call `reprepare_packed_git()` after writing the new
> packs but before moving them into place.

Actually, I take that back. You were right from the start: the way the
code is written we *can* end up calling both:

  - load_multi_pack_index, from write_midx_internal(), which sets up a
    MIDX, but does not update r->objects->multi_pack_index to point at
    it.

  - ...and prepare_multi_pack_index_one (via prepare_bitmap_git ->
    open_bitmap -> open_midx_bitmap -> get_multi_pack_index ->
    prepare_packed_git) which *also* creates a new MIDX, *and*
    updates the_repository->objects->multi_pack_index to point at it.

(The latter codepath is from the check in write_midx_internal() to see
if we already have a MIDX bitmap when the MIDX we are trying to write
already exists on disk.)

So in this scenario, we have two copies of the same MIDX open, and the
repository's single pack is opened in one of the MIDXs, but not both.
One copy of the pack is pointed at via r->objects->packed_git. Then when
we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
prepare_midx_pack(), which installs the second MIDX's copy of the same
pack into the r->objects->packed_git, and we have a cycle.

I think there are a few ways to fix this bug. The most obvious is to
make install_packed_git() check for the existence of the pack in the
hashmap of installed packs before (re-)installing it. But that could be
quadratic if the hashmap has too many collisions (and the lookup tends
towards being linear in the number of keys rather than constant).

But I think that a more straightforward way would be to open the MIDX we
use when generating the MIDX with prepare_multi_pack_index_one() instead
of load_multi_pack_index() so that the resulting MIDX is pointed at by
r->objects->multi_pack_index. That would prevent the latter call from
deep within the callstack of prepare_bitmap_git() from opening another
copy and then (mistakenly) re-installing the same pack twice.

Thoughts?

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux