Re: [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 05, 2023 at 02:13:45PM -0400, Derrick Stolee wrote:
> On 5/5/2023 1:27 PM, Taylor Blau wrote:
>
> > +static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
> > +					    struct rev_info *revs,
> > +					    struct object_list *roots)
> > +{
> > +	struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git };
> > +	unsigned int i;
> > +	unsigned int tmp_blobs, tmp_trees, tmp_tags;
> > +	int any_missing = 0;
> > +
> > +	revs->ignore_missing_links = 1;
> > +
> > +	/*
> > +	 * OR in any existing reachability bitmaps among `roots` into `base`.
> > +	 */
> > +	while (roots) {
> > +		struct object *object = roots->item;
> > +		roots = roots->next;
> > +
> > +		if (object->type == OBJ_COMMIT &&
> > +		    !obj_in_bitmap(bitmap_git, object, cb.base) &&
> > +		    add_commit_to_bitmap(bitmap_git, &cb.base,
> > +					 (struct commit *)object)) {
> > +			object->flags |= SEEN;
> > +			continue;
> > +		}
> > +
> > +		any_missing = 1;
> > +	}
> > +
> > +	if (!any_missing)
> > +		goto cleanup;
>
> Here, we check if the list of roots are completely covered by existing
> bitmaps. This prevents doing the commit-only walk as well as the boundary
> checks.
>
> There's another confusing bit here: if we have a bitmap for the commit,
> then we mark it as SEEN. Does that mean that the later commit walk will
> skip walking its history? Would we then get a boundary that is actually
> further in history than necessary? ("A --not B C" would walk all of
> B..A if C has a bitmap, even if a lot of that region is reachable from C.)
>
> My initial thought here was that this is an unlikely case, so the
> optimization isn't worth the code complexity. But now, I'm a little
> concerned that it actually hurts the later walk in the case of multiple
> roots.

The point about commits with existing bitmaps as SEEN hurting the
boundary traversal is a good one that I hadn't considered. I agree with
what you're saying here, though I think that the optimization still
makes sense (without touching the SEEN bit).

If all of the haves have existing bitmaps, we don't want to bother
starting the boundary traversal at all, since we can just OR those
together and be done. So I think we'd just want to drop the SEEN bits
there, but let me know if I am missing something.

> > +	revs->boundary = 0;
> > +	revs->blob_objects = tmp_blobs;
> > +	revs->tree_objects = tmp_trees;
> > +	revs->tag_objects = tmp_tags;
>
> These would seem more natural if they were after the trace2_region_leave().

Good suggestion, thanks.

> > +			} else {
> > +				add_pending_object(revs, obj, "");
> > +				needs_walk = 1;
> > +			}
> > +		}
> > +
> > +		if (needs_walk)
> > +			cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
>
> I wonder if fill_in_bitmap() already does "the right thing" when there
> are no pending objects or when all pending objects are already in the
> bitmap. Do we need to do these checks, or should we just put all boundary
> commits in the pending set?

Great suggestion. It does do the right thing via its `should_include`
checks, which see if the commit we're trying to process is already in
the bitmap, and if so, propagates SEEN to it and its parents.

Here's a diff that I generated after reading through this message, let
me know if you think I'm missing anything:

--- >8 ---
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 634fc10156..f57b74bcc0 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1141,7 +1141,6 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 		    !obj_in_bitmap(bitmap_git, object, cb.base) &&
 		    add_commit_to_bitmap(bitmap_git, &cb.base,
 					 (struct commit *)object)) {
-			object->flags |= SEEN;
 			continue;
 		}

@@ -1175,13 +1174,14 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 				      show_boundary_object,
 				      &cb, NULL);
 	revs->boundary = 0;
+	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
+
 	revs->blob_objects = tmp_blobs;
 	revs->tree_objects = tmp_trees;
 	revs->tag_objects = tmp_tags;

 	reset_revision_walk();
 	clear_object_flags(UNINTERESTING);
-	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);

 	/*
 	 * Then add the boundary commit(s) as fill-in traversal tips.
@@ -1189,26 +1189,21 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
 	if (cb.boundary.nr) {
 		struct object *obj;
-		int needs_walk = 0;
-
 		for (i = 0; i < cb.boundary.nr; i++) {
 			obj = cb.boundary.objects[i].item;

-			if (obj_in_bitmap(bitmap_git, obj, cb.base)) {
+			if (obj_in_bitmap(bitmap_git, obj, cb.base))
 				obj->flags |= SEEN;
-			} else {
+			else
 				add_pending_object(revs, obj, "");
-				needs_walk = 1;
-			}
 		}

-		if (needs_walk)
-			cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
+		cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
 	}
 	trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);

-
 cleanup:
+	object_array_clear(&cb.boundary);
 	revs->ignore_missing_links = 0;

 	return cb.base;
--- 8< ---

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux