Re: [PATCH] repack: fix geometric repacking with gitalternates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 10, 2023 at 11:06:49AM -0400, Derrick Stolee wrote:
> Your original message includes error messages like "could not find pack"
> and "unknown preferred pack" which makes me think the _real_ problem is
> that we are not respecting the full path name of the pack-file and are
> somehow localizing packs to the local object dir.

"could not find pack" comes from pack-objects's --stdin-packs mode when
the caller specifies a pack that doesn't exist in the repository.

I'm not sure what's going on here, since read_packs_list_from_stdin()
searches over the result of calling get_all_packs(), which does include
packs in alternate object stores. And indeed, pack-objects can recognize
such packs:

--- 8< ---
#!/bin/sh

rm -fr shared member

git init shared
git -C shared commit --allow-empty -m "base" && git -C shared repack -d

git clone --shared shared member

basename "$(find shared/.git/objects/pack -type f -name '*.pack')" >input

git -C member pack-objects .git/objects/pack/pack --stdin-packs <input

for repo in shared member
do
	echo "==> $repo"
	find $repo/.git/objects/pack -type f
done
--- >8 ---

^^ the above results in generating the identical pack in the "member"
repository as expected:

==> shared
shared/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.idx
shared/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.pack
==> member
member/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.idx
member/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.pack

> The basic reason is that write_midx_included_packs() takes all of the
> pack-files from the geometry, but does not strip out the pack-files
> that are not in the same object directory.

I agree here; the pack-geometry code should be stripping out non-local
packs when writing a MIDX regardless of whether the caller passed
--local or not.

> Perhaps this method could include a step to create a new, "local"
> geometry containing only the packs within the local object dir. We can
> then skip the --preferred-pack option and bitmap if this is different
> than the original geometry (perhaps with a warning message to help
> users who did this accidentally).

It would be nice to not have to juggle multiple pack geometry structs,
since the logic of what gets repacked, what gets thrown away, and what
gets kept is already fairly complicated (at least to me) and pretty
fragile.

I think if I were sketching this out, I'd start out by doing something
like:

--- 8< ---
diff --git a/builtin/repack.c b/builtin/repack.c
index df4d8e0f0b..eab5f58444 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -558,6 +558,10 @@ static void midx_included_packs(struct string_list *include,
 		for (i = geometry->split; i < geometry->pack_nr; i++) {
 			struct packed_git *p = geometry->pack[i];

+			/* MIDXs cannot refer to non-local packs */
+			if (!p->pack_local)
+				continue;
+
 			strbuf_addstr(&buf, pack_basename(p));
 			strbuf_strip_suffix(&buf, ".pack");
 			strbuf_addstr(&buf, ".idx");
--- >8 ---

...and I actually think that might do it, since:

	- existing_nonkept_packs is populated by calling readdir() on the local
		repository's $GIT_DIR/objects/pack directory, so it will never contain
		any non-local packs.

	- existing_kept_packs is also OK for the same reason

	- names (which tracks the packs that we just wrote) will never contain a
		non-local pack, since we never write packs outside of our local pack
		directory

So that would cause you to write a MIDX containing only local packs (as
desired) regardless of whether or not the caller passed --[no]-local or
not.

> > I'd personally be fine to start honoring the `po_args.local` flag so
> > that we skip over any non-local packfiles there while ignoring the
> > larger problem of non-local geometric repacks breaking in certain
> > scenarios. It would at least improve the status quo as users now have a
> > way out in case they ever happen to hit that error. And it allows us to
> > use geometric repacks in alternated repositories. But are we okay with
> > punting on the larger issue for the time being?
>
> I think the real bug is isolated in write_midx_included_packs() in how
> it may specify packs that the multi-pack-index cannot track. It should
> be worth the time exploring if there is an easy fix there, and then the
> po_args.local version can be used as a backup/performance tweak on top
> of that.

Yeah, seeing "unknown preferred pack" makes me suspicious that we tried
to write a MIDX that contains a pack outside of our repository. I tried
to reproduce that case with the following script but couldn't do it (the
pack that the MIDX does track is the local one, as expected):

--- 8< ---
#!/bin/sh

rm -fr shared member

git init shared
git -C shared commit --allow-empty -m "base" && git -C shared repack -d

git clone --shared shared member

git -C member commit --allow-empty -m "base" && git -C member repack -d

{
	basename "$(find shared/.git/objects/pack -type f -name '*.idx')"
	basename "$(find member/.git/objects/pack -type f -name '*.idx')"
} |
git.compile -C member multi-pack-index write --stdin-packs

for repo in shared member
do
	echo "==> $repo"
	find $repo/.git/objects/pack -type f
done

( cd member && ~/src/git/t/helper/test-tool read-midx --show-objects .git/objects )
--- >8 ---

But the MIDX code that is responsible for collecting the packs specified
over --stdin-packs enumerates the possible packs by calling
`for_each_file_in_pack_dir()` with our local `object_dir`, so I don't
think it's possible to have a MIDX that contains a non-local pack.

IOW, I agree with you that the bug is in write_midx_included_packs(),
and I think the fix that I outlined would do the trick.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux