Re: [PATCH 2/8] builtin/repack.c: extract marking packs for deletion

Jeff King <peff@xxxxxxxx> · Thu, 7 Sep 2023 03:59:31 -0400

On Tue, Sep 05, 2023 at 04:36:43PM -0400, Taylor Blau wrote:

> At the end of a repack (when given `-d`), Git attempts to remove any
> packs which have been made "redundant" as a result of the repacking
> operation. For example, an all-into-one (`-A` or `-a`) repack makes
> every pre-existing pack which is not marked as kept redundant. Geometric
> repacks (with `--geometric=<n>`) make any packs which were rolled up
> redundant, and so on.
> 
> But before deleting the set of packs we think are redundant, we first
> check to see whether or not we just wrote a pack which is identical to
> any one of the packs we were going to delete. When this is the case, Git
> must avoid deleting that pack, since it matches a pack we just wrote
> (so deleting it may cause the repository to become corrupt).
> 
> Right now we only process the list of non-kept packs in a single pass.
> But a future change will split the existing non-kept packs further into
> two lists: one for cruft packs, and another for non-cruft packs.
> 
> Factor out this routine to prepare for calling it twice on two separate
> lists in a future patch.

There are really two "factor outs" here: we pull the code from
cmd_repack() into a helper, and then the helper is also just a thin
wrapper around its "_1" variant. That latter part isn't needed yet, but
I can guess from your description that we'll eventually have the main
function dispatch to the "_1" helper for lists.

The main caller in cmd_repack() could do that double-dispatch, but then
this code:

> +	if (delete_redundant && pack_everything & ALL_INTO_ONE)
> +		mark_packs_for_deletion(&existing, &names);

would know more about how "existing_packs" work than it needs to. So
this seems like a good split (and the two-liner above is making
cmd_repack() much more readable than the big loop it had in the
pre-image).

-Peff