Re: [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 08, 2023 at 04:53:35PM -0400, Derrick Stolee wrote:
> On 5/8/2023 1:38 PM, Taylor Blau wrote:
>
> > -	/*
> > -	 * if we have a HAVES list, but none of those haves is contained
> > -	 * in the packfile that has a bitmap, we don't have anything to
> > -	 * optimize here
> > -	 */
> > -	if (haves && !in_bitmapped_pack(bitmap_git, haves))
> > -		goto cleanup;
> > +	use_boundary_traversal = git_env_bool(GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL, -1);
> > +	if (use_boundary_traversal < 0) {
> > +		prepare_repo_settings(revs->repo);
> > +		use_boundary_traversal = revs->repo->settings.pack_use_bitmap_boundary_traversal;
> > +	}
> > +
> > +	if (!use_boundary_traversal) {
> > +		/*
> > +		 * if we have a HAVES list, but none of those haves is contained
> > +		 * in the packfile that has a bitmap, we don't have anything to
> > +		 * optimize here
> > +		 */
> > +		if (haves && !in_bitmapped_pack(bitmap_git, haves))
> > +			goto cleanup;
> > +	}
>
> I was reading along, nodding my head, until I came across this comment.
> I recognize that it's moved from an existing place, but this seems very
> strange and incorrect.
>
> Is this implying that if the 'haves' are not in the bitmapped pack, then
> we _can't_ construct a bitmap representing the objects they can reach?
>
> Do we not have the ability to extend the object order beyond the bitmapped
> pack in-memory in a way that allows us to provide bit positions for the
> objects not in the bitmapped pack?

It is a strange heuristic indeed, and I had the same thought as you the
first time I remember encountering this part of the code.

in_bitmapped_pack() asks whether there is at least one of the given set
of objects which appears in the pack/MIDX corresponding with our bitmap.
In other words, this heuristic causes us to bail when none of the
objects listed as haves appear in the corresponding pack/MIDX.

Strictly speaking, there isn't a fundamental limitation preventing us
from using the bitmap traversal in cases where none of the tips exist in
the bitmapped pack/MIDX. That's what the extended index is for (see
`bitmap_position_extended()` and `ext_index_add_object()` for more
details).

AFAICT, the heuristic here is another way of saying, "if none of these
haves appear in the corresponding pack or MIDX, my bitmap coverage is
likely bad enough that it's not worth trying to construct a full bitmap
on the haves side (and we should fall back to the ordinary object
traversal)".

> I can understand saying "it might be more expensive to construct a bitmap
> here, because we need to walk into the bitmapped pack before we have a
> hope of hitting a bitmap." However, that's far from "we don't have anything
> to optimize".

Agreed.

> Something to think about removing in the future, but not a blocker for
> this series. Getting it out of the way for the boundary-based walk makes
> even more sense because the commits to check are those in the boundary,
> not the haves themselves.

Yeah. I think that what we do here will end up depending on the
performance of the boundary based approach. If it's a clear enough win
in a majority of cases, I'd just as soon drop the classic traversal (and
with it, this heuristic).

But if we do end up keeping the classic traversal around for a while, I
absolutely agree that we should investigate and consider dropping this
optimization, which I have long-suspected as not being very useful.

> > +test_expect_success 'boundary-based traversal is used when requested' '
> > +	git repack -a -d --write-bitmap-index &&
> > +
> > +	for argv in \
> > +		"git -c pack.useBitmapBoundaryTraversal=true" \
> > +		"git -c feature.experimental=true" \
> > +		"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1 git"
> > +	do
> > +		eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \
> > +			--use-bitmap-index second..other 2>perf" &&
> > +		grep "\"region_enter\".*\"label\":\"haves/boundary\"" perf ||
> > +			return 1
> > +	done &&
> > +
> > +	for argv in \
> > +		"git -c pack.useBitmapBoundaryTraversal=false" \
> > +		"git -c feature.experimental=true -c pack.useBitmapBoundaryTraversal=false" \
> > +		"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c pack.useBitmapBoundaryTraversal=true" \
> > +		"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c feature.experimental=true"
>
> This ordering (GIT_TEST_*=0 overrides config) seems backwards to me, but
> it doesn't really matter since it's a GIT_TEST_* variable. Thanks for
> including tests so the order is documented.

Now that you mention it, I think it's backwards too ;-). But I agree
that documenting it here is sufficient (and ideally, we don't have to
live with this GIT_TEST variable for more than a release or two,
anyway).

> > +	do
> > +		eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \
> > +			--use-bitmap-index second..other 2>perf" &&
> > +		grep "\"region_enter\".*\"label\":\"haves/classic\"" perf ||
> > +			return 1
>
> nit: you should be able to use 'test_region' here. Probably not worth
> a re-roll, as everything else looks good to me.

Ah, I didn't know of `test_region` before. I'll keep it in mind, thanks!

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux