Re: [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/5/2023 1:27 PM, Taylor Blau wrote:

> +static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
> +					    struct rev_info *revs,
> +					    struct object_list *roots)
> +{
> +	struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git };
> +	unsigned int i;
> +	unsigned int tmp_blobs, tmp_trees, tmp_tags;
> +	int any_missing = 0;
> +
> +	revs->ignore_missing_links = 1;
> +
> +	/*
> +	 * OR in any existing reachability bitmaps among `roots` into `base`.
> +	 */
> +	while (roots) {
> +		struct object *object = roots->item;
> +		roots = roots->next;
> +
> +		if (object->type == OBJ_COMMIT &&
> +		    !obj_in_bitmap(bitmap_git, object, cb.base) &&
> +		    add_commit_to_bitmap(bitmap_git, &cb.base,
> +					 (struct commit *)object)) {
> +			object->flags |= SEEN;
> +			continue;
> +		}
> +
> +		any_missing = 1;
> +	}
> +
> +	if (!any_missing)
> +		goto cleanup;

Here, we check if the list of roots are completely covered by existing
bitmaps. This prevents doing the commit-only walk as well as the boundary
checks.

There's another confusing bit here: if we have a bitmap for the commit,
then we mark it as SEEN. Does that mean that the later commit walk will
skip walking its history? Would we then get a boundary that is actually
further in history than necessary? ("A --not B C" would walk all of
B..A if C has a bitmap, even if a lot of that region is reachable from C.)

My initial thought here was that this is an unlikely case, so the
optimization isn't worth the code complexity. But now, I'm a little
concerned that it actually hurts the later walk in the case of multiple
roots.

> +	tmp_blobs = revs->blob_objects;
> +	tmp_trees = revs->tree_objects;
> +	tmp_tags = revs->blob_objects;
> +	revs->blob_objects = 0;
> +	revs->tree_objects = 0;
> +	revs->tag_objects = 0;
> +
> +	/*
> +	 * We didn't have complete coverage of the roots. First setup a
> +	 * revision walk to (a) OR in any bitmaps that are UNINTERESTING
> +	 * between the tips and boundary, and (b) record the boundary.
> +	 */
> +	trace2_region_enter("pack-bitmap", "boundary-prepare", the_repository);
> +	if (prepare_revision_walk(revs))
> +		die("revision walk setup failed");
> +	trace2_region_leave("pack-bitmap", "boundary-prepare", the_repository);
> +
> +	trace2_region_enter("pack-bitmap", "boundary-traverse", the_repository);
> +	revs->boundary = 1;
> +	traverse_commit_list_filtered(revs,
> +				      show_boundary_commit,
> +				      show_boundary_object,
> +				      &cb, NULL);

Looks good. Callbacks were clear when I read them.

> +	revs->boundary = 0;
> +	revs->blob_objects = tmp_blobs;
> +	revs->tree_objects = tmp_trees;
> +	revs->tag_objects = tmp_tags;

These would seem more natural if they were after the trace2_region_leave().

> +	reset_revision_walk();
> +	clear_object_flags(UNINTERESTING);
> +	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
> +
> +	/*
> +	 * Then add the boundary commit(s) as fill-in traversal tips.
> +	 */
> +	trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
> +	if (cb.boundary.nr) {

Making this an if block does help keep its variables more local. It does
make me think about whether the trace2 regions should be within the block,
but it's easier to analyze things when the regions are expected to be present.

> +		struct object *obj;
> +		int needs_walk = 0;
> +
> +		for (i = 0; i < cb.boundary.nr; i++) {
> +			obj = cb.boundary.objects[i].item;
> +
> +			if (obj_in_bitmap(bitmap_git, obj, cb.base)) {
> +				obj->flags |= SEEN;

This SEEN makes sense: we want to terminate the walk here as everything
reachable is already in the bitmap.

> +			} else {
> +				add_pending_object(revs, obj, "");
> +				needs_walk = 1;
> +			}
> +		}
> +
> +		if (needs_walk)
> +			cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);

I wonder if fill_in_bitmap() already does "the right thing" when there
are no pending objects or when all pending objects are already in the
bitmap. Do we need to do these checks, or should we just put all boundary
commits in the pending set?

> +	}
> +	trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);
> +
> +

nit: extra empty line

> +cleanup:
> +	revs->ignore_missing_links = 0;
> +
> +	return cb.base;

Do we need to clean up the cb.boundary object array?

> +}
> +

...

> @@ -1605,18 +1728,15 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
>  	if (load_bitmap(revs->repo, bitmap_git) < 0)
>  		goto cleanup;
>  
> -	object_array_clear(&revs->pending);
> -
>  	if (haves) {
> -		revs->ignore_missing_links = 1;
> -		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
> -		reset_revision_walk();
> -		revs->ignore_missing_links = 0;
> -
> +		haves_bitmap = find_boundary_objects(bitmap_git, revs, haves);
>  		if (!haves_bitmap)
>  			BUG("failed to perform bitmap walk");
>  	}

I mentioned in reply to the cover letter that deleting this older algorithm
is likely premature at this point, and we might want to keep it in general
as an option.

> +	object_array_clear(&revs->pending);
> +	reset_revision_walk();
> +

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux