Re: [PATCH] repack: respect --keep-pack with geometric repack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau wrote:
> Hi Victoria,
> 
> On Fri, May 20, 2022 at 04:36:12PM +0000, Victoria Dye via GitGitGadget wrote:
>> From: Victoria Dye <vdye@xxxxxxxxxx>
>>
>> Update 'repack' to ignore packs named on the command line with the
>> '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat
>> command line-kept packs the same way it treats packs with an on-disk '.keep'
>> file (that is, skip the pack and do not include it in the 'geometry'
>> structure).
>>
>> Without this handling, a '--keep-pack' pack would be included in the
>> 'geometry' structure. If the pack is *before* the geometry split line (with
>> at least one other pack and/or loose objects present), 'repack' assumes the
>> pack's contents are "rolled up" into another pack via 'pack-objects'.
>> However, because the internally-invoked 'pack-objects' properly excludes
>> '--keep-pack' objects, any new pack it creates will not contain the kept
>> objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since
>> it assumes 'pack-objects' created a new pack with its contents), resulting
>> in possible object loss and repository corruption.
> 
> Nicely found and explained. Having discussed this fix with you already
> off-list, this approach (to treat kept packs as excluded from the list
> of packs in the `geometry` structure regardless of whether they are kept
> on disk or in-core) makes sense to me.
> 
> I left a couple of small notes on the patch below, but since I have some
> patches that deal with a separate issue in the `git repack --geometric`
> code coming, do you want to combine forces (and I can send a
> lightly-reworked version of this patch as a part of my series)?
> 

Works for me! I'm happy with all the suggested changes you noted below
(moving the 'string_list_sort' and cleaning up the test), so feel free to
include them in your series.

Thanks!

>> @@ -332,17 +332,34 @@ static int geometry_cmp(const void *va, const void *vb)
>>  	return 0;
>>  }
>>
>> -static void init_pack_geometry(struct pack_geometry **geometry_p)
>> +static void init_pack_geometry(struct pack_geometry **geometry_p,
>> +			       struct string_list *existing_kept_packs)
>>  {
>>  	struct packed_git *p;
>>  	struct pack_geometry *geometry;
>> +	struct strbuf buf = STRBUF_INIT;
>>
>>  	*geometry_p = xcalloc(1, sizeof(struct pack_geometry));
>>  	geometry = *geometry_p;
>>
>> +	string_list_sort(existing_kept_packs);
> 
> Would it be worth sorting this as early as in collect_pack_filenames()?
> For our purposes in this patch, this works as-is, but it may be
> defensive to try and minimize the time that list has unsorted contents.
> 

I went back and forth on this, eventually settling on this to keep the
'string_list_sort' as close as possible to where the sorted list is needed.
I'm still pretty indifferent, though, so moving it to the end of
'collect_pack_filenames()' is fine with me.

>>  	for (p = get_all_packs(the_repository); p; p = p->next) {
>> -		if (!pack_kept_objects && p->pack_keep)
>> -			continue;
>> +		if (!pack_kept_objects) {
>> +			if (p->pack_keep)
>> +				continue;
> 
> (You mentioned this to me off-list, but I'll repeat it here since it
> wasn't obvious to me on first read): this check for `p->pack_keep` isn't
> strictly necessary, since any packs that have their `pack_keep` bit set
> will appear in the `existing_kept_packs` list.
> 
> But it does give us a fast path to avoid having to check that list, so
> it's worth checking that bit to avoid a slightly more expensive check
> where possible.
> 
>> +			/*
>> +			 * The pack may be kept via the --keep-pack option;
>> +			 * check 'existing_kept_packs' to determine whether to
>> +			 * ignore it.
>> +			 */
>> +			strbuf_reset(&buf);
>> +			strbuf_addstr(&buf, pack_basename(p));
>> +			strbuf_strip_suffix(&buf, ".pack");
>> +
>> +			if (string_list_has_string(existing_kept_packs, buf.buf))
>> +				continue;
> 
> It's too bad that we have to do this check at all, and can't rely on the
> `pack_keep_in_core` in the same way as we check `p->pack_keep`. But
> lifting that restriction is a more invasive change, so I'm happy to
> rely on the contents of existing_kept_packs here in the meantime.
> 
>> +		}
>>
>>  		ALLOC_GROW(geometry->pack,
>>  			   geometry->pack_nr + 1,
>> @@ -353,6 +370,7 @@ static void init_pack_geometry(struct pack_geometry **geometry_p)
>>  	}
>>
>>  	QSORT(geometry->pack, geometry->pack_nr, geometry_cmp);
>> +	strbuf_release(&buf);
>>  }
>>
>>  static void split_pack_geometry(struct pack_geometry *geometry, int factor)
>> @@ -714,17 +732,20 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>  		strbuf_release(&path);
>>  	}
>>
>> +	packdir = mkpathdup("%s/pack", get_object_directory());
>> +	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
>> +	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
>> +
>> +	collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
>> +			       &keep_pack_list);
>> +
> 
> Makes sense; we have to initialize existing_kept_packs before arranging
> the list of packs for the `--geometric` split. And presumably
> `collect_pack_filenames()` relies on `packdir`, `packtmp_name`, and
> `packtmp` being setup ahead of time, too.
> 
>>  	if (geometric_factor) {
>>  		if (pack_everything)
>>  			die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
>> -		init_pack_geometry(&geometry);
>> +		init_pack_geometry(&geometry, &existing_kept_packs);
>>  		split_pack_geometry(geometry, geometric_factor);
>>  	}
>>
>> -	packdir = mkpathdup("%s/pack", get_object_directory());
>> -	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
>> -	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
>> -
>>  	sigchain_push_common(remove_pack_on_signal);
>>
>>  	prepare_pack_objects(&cmd, &po_args);
>> @@ -764,9 +785,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>  	if (use_delta_islands)
>>  		strvec_push(&cmd.args, "--delta-islands");
>>
>> -	collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
>> -			       &keep_pack_list);
>> -
>>  	if (pack_everything & ALL_INTO_ONE) {
>>  		repack_promisor_objects(&po_args, &names);
>>
>> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
>> index bdbbcbf1eca..f5ac23413d5 100755
>> --- a/t/t7703-repack-geometric.sh
>> +++ b/t/t7703-repack-geometric.sh
>> @@ -180,6 +180,40 @@ test_expect_success '--geometric ignores kept packs' '
>>  	)
>>  '
>>
>> +test_expect_success '--geometric ignores --keep-pack packs' '
>> +	git init geometric &&
>> +	test_when_finished "rm -fr geometric" &&
>> +	(
>> +		cd geometric &&
>> +
>> +		# Create two equal-sized packs
>> +		test_commit kept && # 3 objects
>> +		test_commit pack && # 3 objects
>> +
>> +		KEPT=$(git pack-objects --revs $objdir/pack/pack <<-EOF
>> +		refs/tags/kept
>> +		EOF
>> +		) &&
>> +		PACK=$(git pack-objects --revs $objdir/pack/pack <<-EOF
>> +		refs/tags/pack
>> +		^refs/tags/kept
>> +		EOF
>> +		) &&
> 
> Nit; we don't care about the name of $PACK, so it would probably be fine
> to avoid storing the `PACK` variable. We could write these packs with
> just `git repack -d` after each `test_commit` (which would avoid us
> having to call `prune-packed`).
> 

Makes sense.

> Does it matter which one is kept? I don't think so, since AFAICT the
> critical bit is that we mark one of the packs being rolled up as a
> `--keep-pack`.
> 

Correct, the two packs in this test are just two same-sized (or, more
generally, non-geometrically progressing) packs with non-overlapping
content.

>> +		# Prune loose objects that are now packed into PACK and KEEP
>> +		git prune-packed &&
>> +
>> +		git repack --geometric 2 -dm --keep-pack=pack-$KEPT.pack >out &&
>> +
>> +		# Packs should not have changed (only one non-kept pack, no
>> +		# loose objects), but midx should now exist.
>> +		test_i18ngrep "Nothing new to pack" out &&
> 
> Nit; test_i18ngrep here should just be "grep".
> 

Thanks for pointing this out - I've been a bit unsure of the difference for
a while, but this pushed me to figure out the difference and I found the
note in 'test-lib-functions.sh' clarifying that 'test_i18ngrep' is
deprecated.

>> +		test_path_is_file $midx &&
>> +		test_path_is_file $objdir/pack/pack-$KEPT.pack &&
>> +		git fsck
>> +	)
> 
> 
> Thanks,
> Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux