Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing

Taylor Blau <me@xxxxxxxxxxxx> · Tue, 27 Jul 2021 16:05:39 -0400

On Tue, Jul 27, 2021 at 01:55:52PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 01:47:40PM -0400, Taylor Blau wrote:
>
> > > BTW, yet another weirdness: close_object_store() will call close_midx()
> > > on the outermost midx struct, ignoring o->multi_pack_index->next
> > > entirely. So that's a leak, but also means we may not be closing the
> > > midx we're interested in (since write_midx_internal() takes an
> > > object-dir parameter, and we could be pointing to some other
> > > object-dir's midx).
> >
> > Yuck, this is a mess. I'm tempted to say that we should be closing the
> > MIDX that we're operating on inside of write_midx_internal() so we can
> > write, but then declaring the whole object store to be bunk and calling
> > close_object_store() before leaving the function. Of course, one of
> > those steps should be closing the inner-most MIDX before closing the
> > next one and so on.
>
> That gets even weirder when you look at other callers of
> write_midx_internal(). For instance, expire_midx_packs() is calling
> load_multi_pack_index() directly, and then passing it to
> write_midx_internal().
>
> So closing the whole object store there is likewise weird.
>
> I actually think having write_midx_internal() open up a new midx is
> reasonable-ish. It's just that:
>
>   - it's weird when it stuffs duplicate packs into the
>     r->objects->packed_git list. But AFAICT that's not actually hurting
>     anything?

It is hurting us when we try to write a MIDX bitmap, because we try to
see if one already exists. And to do that, we call prepare_bitmap_git(),
which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
list. Critically, prepare_bitmap_git() errors out if it is called with a
bitmap_git that has a non-NULL `->pack` pointer.

So they aren't a cycle in the sense that `p->next == p`, but it does
cause problems for us nonetheless.

I stepped away from my computer for an hour or so and thought about
this, and I think that the solution is two-fold:

  - We should be more careful about freeing up the ->next pointers of a
    MIDX, and releasing the memory we allocated to hold each MIDX struct
    in the first place.

  - We should always be operating on the repository's
    r->objects->multi_pack_index, or any other MIDX that can be reached
    via walking the `->next` pointers. If we do that consistently, then
    we'll only have at most one instance of a MIDX struct corresponding
    to each MIDX file on disk.

In the reroll that I'll send shortly, those are:

  - https://github.com/ttaylorr/git/commit/61a617715f3827401522c7b08b50bb6866f2a4e9
  - https://github.com/ttaylorr/git/commit/fd15ecf47c57ce4ff0d31621c2c9f61ff7a74939

respectively. It resolves my issue locally which was that I was
previously unable to run:

    GIT_TEST_MULTI_PACK_INDEX=1 \
    GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 \
      t7700-repack.sh

without t7700.13 failing on me (it previously complained about a
"duplicate" .bitmap file, which is a side-effect of placing duplicates
in the packed_git list, not a true duplicate .bitmap on disk).

I'm waiting on a CI run [1], but I'm relatively happy with the result. I
think we could do something similar to the MIDX code like we did to the
commit-graph code in [2], but I'm reasonably happy with where we are
now.

Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/actions/runs/1072513087
[2]: https://lore.kernel.org/git/cover.1580424766.git.me@xxxxxxxxxxxx/