Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 28, 2021 at 01:46:21PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 04:05:39PM -0400, Taylor Blau wrote:
>
> > > I actually think having write_midx_internal() open up a new midx is
> > > reasonable-ish. It's just that:
> > >
> > >   - it's weird when it stuffs duplicate packs into the
> > >     r->objects->packed_git list. But AFAICT that's not actually hurting
> > >     anything?
> >
> > It is hurting us when we try to write a MIDX bitmap, because we try to
> > see if one already exists. And to do that, we call prepare_bitmap_git(),
> > which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
> > list. Critically, prepare_bitmap_git() errors out if it is called with a
> > bitmap_git that has a non-NULL `->pack` pointer.
>
> It doesn't error out. It does produce a warning(), though, if it ignores
> a bitmap (and that warning is doubly confusing because it is ignoring
> bitmap X because it has already loaded and will use that exact same X!).
>
> This causes t7700.13 to fail because it is being picky about stderr
> being empty.

Right, sorry for suggesting that the error was more severe than it
actually is.

> So the overall behavior is correct, but I agree it's sufficiently ugly
> that we should make sure it doesn't happen.

100% agreed. I think the most unfortunate thing is the state of
r->objects->packed_git, since it's utterly bizarre to have the same pack
opened twice and have both of those copies in the list. That is
definitely worth preventing.

>   Side note: IMHO the "check all packs to see if there are any other
>   bitmaps to warn about" behavior is kind of pointless, and we should
>   consider just returning as soon as we have one. This is already
>   somewhat the case after your midx-bitmap patches, as we will not even
>   bother to look for a pack bitmap after finding a midx bitmap. That is
>   a good thing, because it means you can keep pack bitmaps around for
>   flexibility. But let's leave any changes to the pack-only behavior out
>   of this series for simplicity.

I agree. I'd be in favor of something like

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f599646e19..5450ffb04c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -378,13 +378,6 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}

-	if (bitmap_git->pack || bitmap_git->midx) {
-		/* ignore extra bitmap file; we can only handle one */
-		warning("ignoring extra bitmap file: %s", packfile->pack_name);
-		close(fd);
-		return -1;
-	}
-
 	bitmap_git->pack = packfile;
 	bitmap_git->map_size = xsize_t(st.st_size);
 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -465,16 +458,15 @@ static int open_pack_bitmap(struct repository *r,
 			    struct bitmap_index *bitmap_git)
 {
 	struct packed_git *p;
-	int ret = -1;

 	assert(!bitmap_git->map);

 	for (p = get_all_packs(r); p; p = p->next) {
-		if (open_pack_bitmap_1(bitmap_git, p) == 0)
-			ret = 0;
+		if (!open_pack_bitmap_1(bitmap_git, p))
+			return 0;
 	}

-	return ret;
+	return -1;
 }

 static int open_midx_bitmap(struct repository *r,

...but agree that we should wait until after the dust has settled on
this already-complex series.

> >   - We should always be operating on the repository's
> >     r->objects->multi_pack_index, or any other MIDX that can be reached
> >     via walking the `->next` pointers. If we do that consistently, then
> >     we'll only have at most one instance of a MIDX struct corresponding
> >     to each MIDX file on disk.
>
> Certainly that makes sense to me in terms of the Windows "must close the
> current midx before writing" behavior. We have to realize that we're
> operating in the current repo.
>
> But we do allow an "--object-dir" option to "multi-pack-index write",
> and I don't see any other code explicitly requiring that it be part of
> the current repository. What I'm wondering is whether this would be
> breaking:
>
>   cd $REPO/..
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> or:
>
>   cd /some/other/repo
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> The latter does seem to work, but the former segfaults (usually -- if
> there's already a midx it is OK).

The former should work, but doesn't, because (as you pointed out to me
in our regular weekly discussion off-list) that the "multi-pack-index"
entry in git.c's commands array has the RUN_SETUP_GENTLY option, and
probably should have RUN_SETUP so that we complain with die() instead of
BUG.

And the latter will continue to work, but only if in your scenario that
$REPO is an alternate of /some/other/repo.

I wrote a little bit more in [1] about this behavior, but the upshot is
that we used to technically support passing *any* directory to
`--object-dir`, including directories that didn't belong to an
alternated repository.

And that will cease to work after the patch that [1] is in response to
is applied. But for the reasons that I explain there, I think that is a
sufficient outcome, because the behavior is kind of bizarre to begin
with.

[1]: https://lore.kernel.org/git/YQMB32fvSiH9julg@nand.local/

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux