Re: [PATCH] repack: only repack .packs that exist

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 20, 2023 at 07:03:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@xxxxxxxxxx>
>
> In 73320e49add (builtin/repack.c: only collect fully-formed packs,
> 2023-06-07), we switched the check for which packs to collect by
> starting at the .idx files and looking for matching .pack files. This
> avoids trying to repack pack-files that have not had their pack-indexes
> installed yet.

Right; the motivation behind this change is based on the principle that
we consider a packfile (and any corresponding metadata it might have,
like .keep, .bitmap, .rev, etc.) to be accessible exactly when there is
an .idx file in place.

So if pack-P.idx exists, we expect to be able to read the contents of
pack-P.pack (and any other pack-P.* files that happen to be laying
around). The idea with 73320e49add is to make `collect_pack_filenames()`
consistent with the same pack existence check that we use everywhere
else.

> However, it does cause maintenance to halt if we find the (problematic,
> but not insurmountable) case of a .idx file without a corresponding
> .pack file. In an environment where packfile maintenance is a critical
> function, such a hard stop is costly and requires human intervention to
> resolve (by deleting the .idx file).

Do all cases fall into the "problematic, but not insurmountable"
category? I'm not entirely sure. One instance that is definitely OK is
having a partially unstaged pack whose objects were repacked elsewhere,
but the 'repack' process doing so died between unlink()-ing the ".pack"
file and the ".idx". This matches the patch that you and I discussed
yesterday to switch the order and restore git-repack.sh's original
behavior, which is correct.

But what about someone accidentally deleting a ".pack" file and leaving
around its index? Or copying files from one repository to another and
only partially copying the contents of $GIT_DIR/objects/pack?

What makes me leery about this kind of change is that there are
potentially many more cases like the above. And even if the interesting
cases are somehow limited to the above, continuing on in the face of a
malformed pack group doesn't seem like the right thing for repack to be
doing.

> This was not the case before. We successfully repacked through this
> scenario until the recent change to scan for .idx files.

Right, my feeling is that the new behavior post-73320e49add is arguably
more correct.

> In other cases, Git prepares its list of packfiles by scanning .idx
> files and then only adds it to the packfile list if the corresponding
> .pack file exists. It even does so without a warning! (See
> add_packed_git() in packfile.c for details.)

This is by design: we only try to open the pack itself when we call
`open_packed_git()`. All packs start out initialized with just their
".idx" as you note.

This feels similar to what you are trying to do here, but I do think
that it's different. While we're performing object reads, there may be
concurrent writers adding (e.g. from pushes) or removing packs (e.g.
from repacking) in the repository. So there we have to be resilient
to thinking that we might have a copy of some object in a given pack,
but still being able to handle the case where that pack disappears and
the object is actually found in some other pack.

Repacking feels different to me. There we are modifying the packs
themselves, so having a incorrectly staged pack group feels like a
condition that we should stop on, because we can no longer completely
trust the data that we are operating on.

> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 08b5ba0c150..e780efe5e0c 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -239,6 +239,10 @@ test_expect_success 'repack --keep-pack' '
>  			mv "$from" "$to" || return 1
>  		done &&
>
> +		# A .idx file without a .pack should not stop us from
> +		# repacking what we can.
> +		touch .git/objects/pack/pack-does-not-exist.idx &&
> +
>  		git repack --cruft -d --keep-pack $P1 --keep-pack $P4 &&
>
>  		ls .git/objects/pack/*.pack >newer-counts &&

If we do end up pursuing this patch, it may be worthwhile to ensure that
the dangling .idx file remains in place after the repack is complete.
You test for this implicitly below, but doing so here explicitly feels
worthwhile to me.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux