Re: [PATCH 08/17] builtin/pack-objects.c: --cruft without expiration

On 11/29/2021 5:25 PM, Taylor Blau wrote:
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> +static int add_cruft_object_entry(const struct object_id *oid, enum object_type type,
> +				  struct packed_git *pack, off_t offset,
> +				  const char *name, uint32_t mtime)
> +{
> +	struct object_entry *entry;
> +
> +	display_progress(progress_state, ++nr_seen);

I don't love the global nr_seen here, but it is pervasive throughout
the file. OK.

> +	entry = packlist_find(&to_pack, oid);
> +	if (entry) {
> +		if (name) {
> +			entry->hash = pack_name_hash(name);
> +			entry->no_try_delta = name && no_try_delta(name);

This is already in an "if (name)" block, so "name &&" isn't needed.

> +		}
> +	} else {
> +		if (!want_object_in_pack(oid, 0, &pack, &offset))
> +			return 0;
> +		if (!pack && type == OBJ_BLOB && !has_loose_object(oid)) {
> +			/*
> +			 * If a traversed tree has a missing blob then we want
> +			 * to avoid adding that missing object to our pack.
> +			 *
> +			 * This only applies to missing blobs, not trees,
> +			 * because the traversal needs to parse sub-trees but
> +			 * not blobs.
> +			 *
> +			 * Note we only perform this check when we couldn't
> +			 * already find the object in a pack, so we're really
> +			 * limited to "ensure non-tip blobs which don't exist in
> +			 * packs do exist via loose objects". Confused?
> +			 */
> +			return 0;
> +		}
> +
> +		entry = create_object_entry(oid, type, pack_name_hash(name),
> +					    0, name && no_try_delta(name),
> +					    pack, offset);
> +	}
> +
> +	if (mtime > oe_cruft_mtime(&to_pack, entry))
> +		oe_set_cruft_mtime(&to_pack, entry, mtime);
> +	return 1;

I was confused by this "return 1", since the other paths return 0.

It turns out that functions in this file use different return
conventions: add_loose_object() and add_object_entry_from_pack() are
both called from iterators where "return 1" means "stop iterating", so
they always return 0. add_object_entry_from_bitmap() is used while
iterating over a bitmap, and there "return 1" means "include this
object".

However, the return value of add_cruft_object_entry() is never used,
so it should probably return void, or flip the convention so that
nonzero means an error occurred.
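A minimal standalone sketch of the two conventions, for illustration
only: for_each_item() and both callbacks are hypothetical, not code
from pack-objects.c. The iterator stops as soon as a callback returns
nonzero, so a callback that must see every item always returns 0,
while a callback following the suggested convention uses nonzero to
report an error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical iterator, for illustration only: calls fn() on each
 * item and stops as soon as fn() returns nonzero, propagating that
 * value to the caller.
 */
typedef int (*each_item_fn)(int item, void *data);

static int for_each_item(const int *items, size_t nr,
			 each_item_fn fn, void *data)
{
	assert(items || !nr);
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(items[i], data);
		if (ret)
			return ret; /* nonzero: stop iterating */
	}
	return 0;
}

/*
 * "Visit everything" callback, in the style of add_loose_object():
 * it must always return 0, otherwise the walk would end early.
 */
static int count_item(int item, void *data)
{
	(void)item;
	(*(size_t *)data)++;
	return 0;
}

/*
 * The suggested alternative: nonzero means an error occurred, so
 * the caller can check the result instead of ignoring it.
 */
static int reject_negative(int item, void *data)
{
	(void)data;
	return item < 0 ? -1 : 0;
}
```

Here count_item() sees all three items of {1, 2, 3}, while
reject_negative() stops at -2 in {1, -2, 3} and its -1 reaches the
caller.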

> +static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
> +{
> +	struct string_list_item *item = NULL;
> +	for_each_string_list_item(item, packs) {
> +		struct packed_git *p = item->util;
> +		if (!p)
> +			die(_("could not find pack '%s'"), item->string);

Interesting that this can happen at all. We expect the pack to be
loaded before we get here. Is this because some packs might not
actually load, which is fine as long as we don't mark them as kept?

> +		p->pack_keep_in_core = keep;
> +	}
> +}
...
> +static void read_cruft_objects(void)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list discard_packs = STRING_LIST_INIT_DUP;
> +	struct string_list fresh_packs = STRING_LIST_INIT_DUP;
> +	struct packed_git *p;
> +
> +	ignore_packed_keep_in_core = 1;

Here we are suddenly changing a global. Shouldn't we restore its
initial value when this function completes?
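A minimal sketch of the save-and-restore pattern, assuming the
function has a single exit path. ignore_keep_flag is a hypothetical
stand-in for ignore_packed_keep_in_core, and do_cruft_work() stands in
for whatever work relies on the override.

```c
#include <assert.h>

/* Hypothetical stand-in for the ignore_packed_keep_in_core global. */
static int ignore_keep_flag;

/* Hypothetical work that relies on the override being in effect. */
static int do_cruft_work(void)
{
	assert(ignore_keep_flag);
	return 0;
}

/*
 * Save the caller's value, override it while this function runs,
 * and restore it on the way out so later code sees what it expects.
 */
static int read_cruft_objects_sketch(void)
{
	int saved = ignore_keep_flag;
	int ret;

	ignore_keep_flag = 1;
	ret = do_cruft_work();

	ignore_keep_flag = saved;
	return ret;
}
```

If the function later grows early returns, each of them needs to
restore the value too, which is easier to get right with a single
cleanup label.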

> +static int option_parse_cruft_expiration(const struct option *opt,
> +					 const char *arg, int unset)
> +{
> +	if (unset) {
> +		cruft = 0;

Resetting 'cruft' when the option is negated with
--no-cruft-expiration seems odd. I would expect

	git pack-objects --cruft --no-cruft-expiration

to still create a cruft pack, just without expiring anything. As
written, --no-cruft-expiration also turns off an earlier --cruft.

> +		cruft_expiration = 0;
> +	} else {
> +		cruft = 1;
> +		if (arg)
> +			cruft_expiration = approxidate(arg);
> +	}
> +	return 0;
> +}
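For comparison, a minimal sketch of the semantics I would expect,
written as a standalone function rather than a parse-options callback.
The globals mirror the patch's cruft and cruft_expiration, and
parse_expiration() is a hypothetical stand-in for approxidate() that
only accepts a raw seconds-since-epoch value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical globals mirroring the patch's cruft and cruft_expiration. */
static int cruft;
static uint64_t cruft_expiration;

/*
 * Hypothetical stand-in for approxidate(): accepts only a raw
 * seconds-since-epoch value, which is enough for this sketch.
 */
static uint64_t parse_expiration(const char *arg)
{
	assert(arg);
	return strtoull(arg, NULL, 10);
}

/*
 * --no-cruft-expiration only clears the expiration and leaves
 * 'cruft' untouched; --cruft-expiration[=<time>] implies --cruft.
 */
static int parse_cruft_expiration(const char *arg, int unset)
{
	if (unset) {
		cruft_expiration = 0; /* never expire */
	} else {
		cruft = 1;
		if (arg)
			cruft_expiration = parse_expiration(arg);
	}
	return 0;
}
```

With this shape, "--cruft --no-cruft-expiration" leaves cruft set and
cruft_expiration at 0, regardless of option order.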
...
> +		OPT_BOOL(0, "cruft", &cruft, N_("create a cruft pack")),
> +		OPT_CALLBACK_F(0, "cruft-expiration", NULL, N_("time"),
> +		  N_("expire cruft objects older than <time>"),
> +		  PARSE_OPT_OPTARG, option_parse_cruft_expiration),

> -static int has_loose_object(const struct object_id *oid)
> +int has_loose_object(const struct object_id *oid)
>  {
>  	return check_and_freshen(oid, 0);
>  }

I'm surprised this hasn't been converted to take a repository pointer.
Adding one more caller here isn't too much debt, though.
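For reference, the repository-pointer pattern already used by
repo_has_object_file_with_flags() in the object-store.h hunk below
usually takes this shape: the real work receives an explicit
repository, and the pointer-less name survives as a thin wrapper over
a default instance. Everything in this sketch (the simplified struct
repository, repo_has_loose_oid(), has_loose_oid(), default_repo) is
hypothetical and is not git's actual definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified repository: a list of loose object IDs. */
struct repository {
	const char **loose;
	size_t nr_loose;
};

/* Plays the role of git's global the_repository in this sketch. */
static struct repository default_repo;

/* The real work takes the repository explicitly... */
static int repo_has_loose_oid(struct repository *r, const char *oid)
{
	assert(r);
	for (size_t i = 0; i < r->nr_loose; i++)
		if (!strcmp(r->loose[i], oid))
			return 1;
	return 0;
}

/* ...and the old name becomes a wrapper over the default instance. */
static int has_loose_oid(const char *oid)
{
	return repo_has_loose_oid(&default_repo, oid);
}
```

Callers that already have a repository at hand can then pass it
through instead of relying on the global.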

> diff --git a/object-store.h b/object-store.h
> index d87481f101..a79c1c91ab 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -308,6 +308,8 @@ int repo_has_object_file_with_flags(struct repository *r,
>   */
>  int has_loose_object_nonlocal(const struct object_id *);

Of course, has_loose_object_nonlocal() right above is another such
function, and it is already more widely used.

> +int has_loose_object(const struct object_id *);
> +
>  void assert_oid_type(const struct object_id *oid, enum object_type expect);

...

> +	test_expect_success "unreachable packed objects are packed (expire $expire)" '
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +
> +			test_commit packed &&
> +			git repack -Ad &&
> +			test_commit other &&
> +
> +			git rev-list --objects --no-object-names packed.. >objects &&
> +			keep="$(basename "$(ls $packdir/pack-*.pack)")" &&
> +			other="$(git pack-objects --delta-base-offset \
> +				$packdir/pack <objects)" &&
> +			git prune-packed &&
> +
> +			test-tool chmtime --get -100 "$packdir/pack-$other.pack" >expect &&

I don't see how this test creates _unreachable_ objects. I would
expect some refs to be deleted, or a 'git reset --hard' somewhere.
What am I missing?

> +			cruft="$(git pack-objects --cruft --cruft-expiration="$expire" $packdir/pack <<-EOF
> +			$keep
> +			-pack-$other.pack
> +			EOF
> +			)" &&
> +			test-tool pack-mtimes "pack-$cruft.mtimes" >actual.raw &&
> +
> +			cut -d" " -f2 <actual.raw | sort -u >actual &&
> +
> +			test_cmp expect actual
> +		)
> +	'
> +
> +	test_expect_success "unreachable cruft objects are repacked (expire $expire)" '

I have the same question for all of the tests, really.

> +			# remove the unreachable tree, but leave the commit
> +			# which has it as its root tree in-tact

nit: "intact" is one word.

> +			rm -fr "$objdir/$(test_oid_to_path "$tree")" &&
> +
> +			git repack -Ad &&
> +			basename $(ls $packdir/pack-*.pack) >in &&
> +			git pack-objects --cruft --cruft-expiration="$expire" \
> +				$packdir/pack <in
> +		)
> +	'

...

> +basic_cruft_pack_tests never

I look forward to seeing how this changes with additional expiration values.

Thanks,
-Stolee



