Re: [PATCH 1/2] pack-objects: Teach --use-bitmap-index codepath to respect --local, --honor-pack-keep and --incremental

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 08, 2016 at 03:37:35PM +0300, Kirill Smelkov wrote:

> @@ -958,15 +958,30 @@ static int want_object_in_pack(const unsigned char *sha1,
>  			       off_t *found_offset)
>  {
>  	struct packed_git *p;
> +	struct packed_git *pack1 = *found_pack;
> +	int pack1_seen = !pack1;
>  
>  	if (!exclude && local && has_loose_object_nonlocal(sha1))
>  		return 0;
>  
> -	*found_pack = NULL;
> -	*found_offset = 0;
> +	/*
> +	 * If we already know the pack object lives in, start checks from that
> +	 * pack - in the usual case when neither --local was given nor .keep files
> +	 * are present the loop will degenerate to have only 1 iteration.
> +	 */
> +	for (p = (pack1 ? pack1 : packed_git); p;
> +	     p = (pack1_seen ? p->next : packed_git), pack1_seen = 1) {
> +		off_t offset;

Hmm. So this is basically sticking the found-pack at the front of the
loop.

We either need to look at zero packs here (we already know where the
object is, and we don't need to bother with --local or .keep lookups),
or we need to look at all of them (to check for local/keep).

I guess you structured it this way to try to reuse the "can we break out
early" logic from the middle of the loop. So we go through the loop one
time, and then break out. And then this:

> +		if (p == pack1) {
> +			if (pack1_seen)
> +				continue;
> +			offset = *found_offset;
> +		}
> +		else {
> +			offset = find_pack_entry_one(sha1, p);
> +		}

is meant to make that one-time through the loop cheaper. So I don't
think it's wrong, but it's very confusing to me.

Would it be simpler to stick that logic in a function like:

  static int want_found_object(int exclude, struct packed_git *pack)
  {
	if (exclude)
		return 1;
	if (incremental)
		return 0;

	/* if we can break early, then do so */
	if (!ignore_packed_keep &&
	    (!local || !have_non_local_packs))
		return 1;

	if (local && !p->pack_local)
		return 0;
	if (ignore_packed_keep && p->pack_local && p->pack_keep)
		return 0;

	/* indeterminate; keep looking for more packs */
	return -1;
  }

  static int want_object_in_pack(...)
  {
	...
	if (!exclude && local && has_loose_object_nonlocal(sha1))
		return 0;

	if (*found_pack) {
		int ret = want_found_object(exclude, *found_pack);
		if (ret != -1)
			return ret;
	}

	for (p = packed_git; p; p = p->next) {
		off_t offset;

		if (p == *found_pack)
			offset = *found_offset;
		else
			offset = find_pack_entry(sha1, p);
		if (offset) {
			... fill in *found_pack ...
			int ret = want_found_object(exclude, p);
			if (ret != -1)
				return ret;
		}
	}
	return 1;
  }

That's a little more verbose, but IMHO the flow is a lot easier to
follow (especially as the later re-rolls of that series actually muck
with the loop order more, but with this approach there's no conflict).

>  static int add_object_entry(const unsigned char *sha1, enum object_type type,
>  			    const char *name, int exclude)
>  {
> -	struct packed_git *found_pack;
> -	off_t found_offset;
> +	struct packed_git *found_pack = NULL;
> +	off_t found_offset = 0;
>  	uint32_t index_pos;
>  
>  	if (have_duplicate_entry(sha1, exclude, &index_pos))
> @@ -1073,6 +1088,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
>  	if (have_duplicate_entry(sha1, 0, &index_pos))
>  		return 0;
>  
> +	if (!want_object_in_pack(sha1, 0, &pack, &offset))
> +		return 0;
> +

This part looks correct and easy to understand.

> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 3893afd..1a61de4 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -16,6 +16,7 @@ test_expect_success 'setup repo with moderate-sized history' '
>  		test_commit side-$i
>  	done &&
>  	git checkout master &&
> +	bitmaptip=$(git show-ref -s master) &&

Our usual method for getting a sha1 is "git rev-parse". I don't think
there's anything wrong with your method, but it might be better to stick
to the canonical one (I had to actually look up "show-ref -s").

> @@ -118,6 +119,90 @@ test_expect_success 'incremental repack can disable bitmaps' '
>  	git repack -d --no-write-bitmap-index
>  '
>  
> +test_expect_success 'pack-objects respects --local (non-local loose)' '
> +	mkdir -p alt_objects/pack &&
> +	echo $(pwd)/alt_objects > .git/objects/info/alternates &&
> +	echo content1 > file1 &&

Style: we don't put a space between ">" and the filename.

> +	objsha1=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file1) &&
> +	git cat-file blob $blob | GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w --stdin &&

I'm not sure why we need two objects in the fake alt_objects repository.
Shouldn't one be enough to do the test?

> +	git add file1 &&

I think this will actually skip the writing of the loose object, because
it's already available in the alternate object store. You probably want
to do this before adding it there.

> +	test_tick &&
> +	git commit -m commit_file1 &&
> +	echo HEAD | \

No need for "\" after a "|"; the shell knows it has to keep looking.

> +	git pack-objects --local --stdout --revs >1.pack &&
> +	git index-pack 1.pack &&

I'd have expected you to use the non-stdout version here. Is this meant
to be independent of your other patch (I think that's OK).

> +	git verify-pack -v 1.pack >1.objects &&

It's cheaper to use "git show-index <1.pack", and the output is saner,
too.

> +	echo -e "$objsha1\n$blob" >nonlocal-loose &&

"echo -e" isn't portable. You can use "printf", or two echos like:

  {
    echo one &&
    echo two
  } >file

(though I'm still not sure what we gain by checking both).

> +	if grep -qFf nonlocal-loose 1.objects; then
> +		echo "Non-local object present in pack generated with --local"
> +		return 1
> +	fi
> +'

grep -f isn't portable. However, I think:

  echo $objsha1 >expect &&
  git show-index <1.pack | cut -d' ' -f2 >actual
  test_cmp expect actual

would work (if you do stick with two entries, you might need to sort
your "expect").

I think similar comments apply to the other tests. I would have expected
"respects --local (non-local pack)" to come next (i.e., to keep all of
the --local tests together). But you seem to interleave them with
--honor-pack-keep.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]