On Fri, May 05, 2023 at 02:13:45PM -0400, Derrick Stolee wrote: > On 5/5/2023 1:27 PM, Taylor Blau wrote: > > > +static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, > > + struct rev_info *revs, > > + struct object_list *roots) > > +{ > > + struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git }; > > + unsigned int i; > > + unsigned int tmp_blobs, tmp_trees, tmp_tags; > > + int any_missing = 0; > > + > > + revs->ignore_missing_links = 1; > > + > > + /* > > + * OR in any existing reachability bitmaps among `roots` into `base`. > > + */ > > + while (roots) { > > + struct object *object = roots->item; > > + roots = roots->next; > > + > > + if (object->type == OBJ_COMMIT && > > + !obj_in_bitmap(bitmap_git, object, cb.base) && > > + add_commit_to_bitmap(bitmap_git, &cb.base, > > + (struct commit *)object)) { > > + object->flags |= SEEN; > > + continue; > > + } > > + > > + any_missing = 1; > > + } > > + > > + if (!any_missing) > > + goto cleanup; > > Here, we check if the list of roots are completely covered by existing > bitmaps. This prevents doing the commit-only walk as well as the boundary > checks. > > There's another confusing bit here: if we have a bitmap for the commit, > then we mark it as SEEN. Does that mean that the later commit walk will > skip walking its history? Would we then get a boundary that is actually > further in history than necessary? ("A --not B C" would walk all of > B..A if C has a bitmap, even if a lot of that region is reachable from C.) > > My initial thought here was that this is an unlikely case, so the > optimization isn't worth the code complexity. But now, I'm a little > concerned that it actually hurts the later walk in the case of multiple > roots. The point about commits with existing bitmaps as SEEN hurting the boundary traversal is a good one that I hadn't considered. I agree with what you're saying here, though I think that the optimization still makes sense (without touching the SEEN bit). If all of the haves have existing bitmaps, we don't want to bother starting the boundary traversal at all, since we can just OR those together and be done. So I think we'd just want to drop the SEEN bits there, but let me know if I am missing something. > > + revs->boundary = 0; > > + revs->blob_objects = tmp_blobs; > > + revs->tree_objects = tmp_trees; > > + revs->tag_objects = tmp_tags; > > These would seem more natural if they were after the trace2_region_leave(). Good suggestion, thanks. > > + } else { > > + add_pending_object(revs, obj, ""); > > + needs_walk = 1; > > + } > > + } > > + > > + if (needs_walk) > > + cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL); > > I wonder if fill_in_bitmap() already does "the right thing" when there > are no pending objects or when all pending objects are already in the > bitmap. Do we need to do these checks, or should we just put all boundary > commits in the pending set? Great suggestion. It does do the right thing via its `should_include` checks, which see if the commit we're trying to process is already in the bitmap, and if so, propagates SEEN to it and its parents. Here's a diff that I generated after reading through this message, let me know if you think I'm missing anything: --- >8 --- diff --git a/pack-bitmap.c b/pack-bitmap.c index 634fc10156..f57b74bcc0 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1141,7 +1141,6 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, !obj_in_bitmap(bitmap_git, object, cb.base) && add_commit_to_bitmap(bitmap_git, &cb.base, (struct commit *)object)) { - object->flags |= SEEN; continue; } @@ -1175,13 +1174,14 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, show_boundary_object, &cb, NULL); revs->boundary = 0; + trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository); + revs->blob_objects = tmp_blobs; revs->tree_objects = tmp_trees; revs->tag_objects = tmp_tags; reset_revision_walk(); clear_object_flags(UNINTERESTING); - trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository); /* * Then add the boundary commit(s) as fill-in traversal tips. @@ -1189,26 +1189,21 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository); if (cb.boundary.nr) { struct object *obj; - int needs_walk = 0; - for (i = 0; i < cb.boundary.nr; i++) { obj = cb.boundary.objects[i].item; - if (obj_in_bitmap(bitmap_git, obj, cb.base)) { + if (obj_in_bitmap(bitmap_git, obj, cb.base)) obj->flags |= SEEN; - } else { + else add_pending_object(revs, obj, ""); - needs_walk = 1; - } } - if (needs_walk) - cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL); + cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL); } trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository); - cleanup: + object_array_clear(&cb.boundary); revs->ignore_missing_links = 0; return cb.base; --- 8< --- Thanks, Taylor