Re: [PATCH 2/2] builtin/repack: fix `--keep-unreachable` when there are no packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 03, 2025 at 02:06:55PM +0100, Patrick Steinhardt wrote:

> The "--keep-unreachable" flag is supposed to append any unreachable
> objects to the newly written pack. This flag is explicitly documented as
> appending both packed and loose unreachable objects to the new packfile.
> And while this works alright when repacking with preexisting packfiles,
> it stops working when the repository does not have any packfiles at all.

OK. I thought from the subject / cover letter this was going to be about
the fact that "git repack -adk" may sometimes say "Nothing new to pack".
And the issue there is that if there are no reachable objects, we don't
actually pack at al.

But this is a separate issue, where we actually do repack, but don't
correctly pass the options. Let's read on...

> The root cause are the conditions used to decide whether or not we want
> to append "--pack-loose-unreachable" to git-pack-objects(1). There are
> a couple of conditions here:
> 
>   - `has_existing_non_kept_packs()` checks whether there are existing
>     packfiles. This condition makes sense to guard "--keep-pack=",
>     "--unpack-unreachable" and "--keep-unreachable", because all of
>     these flags only make sense in combination with existing packfiles.
>     But it does not make sense to disable `--pack-loose-unreachable`
>     when there aren't any preexisting packfiles, as loose objects can be
>     packed into the new packfile regardless of that.

Yeah, this analysis makes sense, and is the root of the problem.

>   - `delete_redundant` checks whether we want to delete any objects or
>     packs that are about to become redundant. The documentation of
>     `--keep-unreachable` explicitly says that `git repack -ad` needs to
>     be executed for the flag to have an effect.
> 
>     It is not immediately obvious why such redundant objects need to be
>     deleted in order for "--pack-unreachable-objects" to be effective.
>     But as things are working as documented this is nothing we'll change
>     for now.

I don't think it's strictly necessary to require "-d" here. The original
concept of "-k" was to modify "-d" to keep objects instead of loosening
(so an alternative to --unpack-unreachable/-A). And then it expanded to
collecting the loose objects, too, for the reasons given in e26a8c4721
(repack: extend --keep-unreachable to loose objects, 2016-06-13).

I think you could conceive of "-k" as an alternative to "-d", rather
than an alternative to "-A". I.e., so that "repack -ak" did the same as
"repack -adk" does now.

And it would probably not even be a big code change, but it's possible
there would be some unexpected fallout (the logic in repack is quite
tortured and intricate from my recollection).

So I don't know that it's really worth it to change now. Especially
because I think "-k" has mostly outlived its usefulness. Cruft packs
solve the same problem but keep the extra objects in their own pack,
where they're less likely to interfere with normal operations. I'd
recommend anybody considering "-k" now to look into cruft packs instead.

>   - `pack_everything & PACK_CRUFT` checks that we're not creating a
>     cruft pack. This condition makes sense in the context of
>     "--pack-loose-unreachable", as unreachable objects would end up in
>     the cruft pack anyway.
> 
> So while the second and third condition are sensible, it does not make
> any sense to condition `--pack-loose-unreachable` on the existence of
> packfiles.

Yup, agreed.

> Fix the bug by splitting out the "--pack-loose-unreachable" and only
> making it depend on the second and third condition. Like this, loose
> unreachable objects will be packed regardless of any preexisting
> packfiles.

Makes sense. My only question would be whether there are any gotchas
inside pack-objects about using --pack-loose-unreachable without
--keep-unreachable (since the two were up until now always used
together).

It was added by e26a8c4721. And looking over that patch, I don't see
anything that would let the options be used independently. So this seems
like a good solution.

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 81d13630ea..8194344b04 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -1370,9 +1370,12 @@ int cmd_repack(int argc,
>  					    "--unpack-unreachable");
>  			} else if (keep_unreachable) {
>  				strvec_push(&cmd.args, "--keep-unreachable");
> -				strvec_push(&cmd.args, "--pack-loose-unreachable");
>  			}
>  		}
> +
> +		if (keep_unreachable && delete_redundant &&
> +		    !(pack_everything & PACK_CRUFT))
> +			strvec_push(&cmd.args, "--pack-loose-unreachable");

One funny thing here is that previously unpack_unreachable took
precedence over keep_unreachable in the if-else chain. I wondered if we
could end up invoking pack-objects with both --unpack-unreachable and
--pack-loose-unreachable, which is nonsense.

But I think the answer is no, because we forbid --unpack-unreachable/-A
and --keep-unreachable from both being passed up front.

> -test_expect_failure '--keep-unreachable packs unreachable loose object without existing packs' '
> +test_expect_success '--keep-unreachable packs unreachable loose object without existing packs' '
>  	test_when_finished "rm -rf repo" &&
>  	git init repo &&
>  	(

Your test from patch 1 looked reasonable to me. If you fold patch 1 into
the existing tests in t7701, you might want to adjust it to match the
techniques those tests use for checking the object (rather than the new
helpers you added).

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux