Re: [PATCH v3 08/16] midx: allow marking a pack as preferred

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 11, 2021 at 12:05:07PM -0500, Taylor Blau wrote:

> To encourage the pack selection process to prefer one pack over another
> (the pack to be preferred is the one a caller would like to later use as
> a reuse pack), introduce the concept of a "preferred pack". When
> provided, the MIDX code will always prefer an object found in a
> preferred pack over any other.
> 
> No format changes are required to store the preferred pack, since it
> will be able to be inferred with a corresponding MIDX bitmap, by looking
> up the pack associated with the object in the first bit position (this
> ordering is described in detail in a subsequent commit).

I think in the long run we may want to add a midx chunk that gives the
order of the packs (and likewise allow the caller of "midx write" to
specify the exact order), since that may allow correlating locality
between history and object order within the .rev/.bitmap files.

But I think this is a nice stopping point for this series, since we're
not having to introduce any new on-disk formats to do it, and it seems
to give pretty good results in practice. I guess we'll have to support
--preferred-pack forever, but that's OK. Even if we do eventually
support arbitrary orderings, it's just a simple subset of that
functionality.

>  static int cmd_multi_pack_index_write(int argc, const char **argv)
>  {
> -	struct option *options = common_opts;
> +	struct option *options;
> +	static struct option builtin_multi_pack_index_write_options[] = {
> +		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
> +			   N_("preferred-pack"),
> +			   N_("pack for reuse when computing a multi-pack bitmap")),
> +		OPT_END(),
> +	};
> +
> +	options = parse_options_dup(builtin_multi_pack_index_write_options);
> +	options = add_common_options(options);
>  
>  	trace2_cmd_mode(argv[0]);
>  
> @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  		usage_with_options(builtin_multi_pack_index_write_usage,
>  				   options);
>  
> -	return write_midx_file(opts.object_dir, opts.flags);
> +	return write_midx_file(opts.object_dir, opts.preferred_pack,
> +			       opts.flags);
>  }

This has the same leak of "options" that I mentioned in the earlier
patch.

> diff --git a/midx.c b/midx.c
> index 971faa8cfc..46f55ff6cf 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
>  	return strcmp(a->pack_name, b->pack_name);
>  }
>  
> +static int lookup_idx_or_pack_name(struct pack_info *info,
> +				   uint32_t nr,
> +				   const char *pack_name)
> +{
> +	uint32_t lo = 0, hi = nr;
> +	while (lo < hi) {
> +		uint32_t mi = lo + (hi - lo) / 2;
> +		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
> +		if (cmp < 0)
> +			hi = mi;
> +		else if (cmp > 0)
> +			lo = mi + 1;
> +		else
> +			return mi;
> +	}
> +	return -1;
> +}

Could this just be replaced with bsearch() in the caller?

> +test_expect_success 'midx picks objects from preferred pack' '
> +	test_when_finished rm -rf preferred.git &&
> +	git init --bare preferred.git &&
> +	(
> +		cd preferred.git &&
> +
> +		a=$(echo "a" | git hash-object -w --stdin) &&
> +		b=$(echo "b" | git hash-object -w --stdin) &&
> +		c=$(echo "c" | git hash-object -w --stdin) &&
> +
> +		# Set up two packs, duplicating the object "B" at different
> +		# offsets.
> +		git pack-objects objects/pack/test-AB <<-EOF &&
> +		$a
> +		$b
> +		EOF
> +		bc=$(git pack-objects objects/pack/test-BC <<-EOF
> +		$b
> +		$c
> +		EOF
> +		) &&

I don't think pack-objects guarantees that the pack ordering matches the
input it received. compute_write_order() uses a variety of heuristics to
reorder things. I think this will work in practice with the current
code, because the objects have the same type, there are no deltas, and
the fallback ordering is input-order (or traversal order, if --revs is
used).

So it's probably OK in practice, though if we wanted to be paranoid we
could check that show-index produces different results for the $b entry
of both packs. That said...

> +		git multi-pack-index --object-dir=objects \
> +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> +		test_must_be_empty err &&
> +
> +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> +			cut -d" " -f1) &&
> +		midx_expect_object_offset $b $ofs objects
> +	)

...what we really care about is that the object came from BC. And we are
just using the offset as a proxy for that. But doesn't "test-tool
read-midx" give us the actual pack name? We could just be checking that.

I also wondered if we should confirm that without the --preferred-pack
option, we choose the other pack. I think it will always be true because
the default order is to sort them lexically. A comment to that effect
might be worth it (near the "set up two packs" comment).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux