Here is a reroll of my series to implement a boundary-based bitmap traversal algorithm that I worked on towards the end of 2021 with Peff. The algorithm is unchanged from the last version, but the implementation has been made much more straightforward, thanks to a very helpful suggestion from Stolee. Instead of hackily trying to write down all of the UNINTERESTING commits between the tips and boundary within limit_list(), we can just perform a commit-only walk, which will give us the set of commits that we need. Thanks in advance for your review. Taylor Blau (2): pack-bitmap.c: extract `fill_in_bitmap()` pack-bitmap.c: use commit boundary during bitmap traversal pack-bitmap.c | 249 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 188 insertions(+), 61 deletions(-) Range-diff against v1: 1: a643678c0f < -: ---------- revision: support tracking uninteresting commits 2: 7624d3b5ba ! 1: a3a1522439 pack-bitmap.c: extract `fill_in_bitmap()` @@ pack-bitmap.c: static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, + struct include_data incdata; + struct bitmap_show_data show_data; + -+ if (base == NULL) ++ if (!base) + base = bitmap_new(); + + incdata.bitmap_git = bitmap_git; @@ pack-bitmap.c: static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, + revs->include_check_data = &incdata; + + if (prepare_revision_walk(revs)) -+ die("revision walk setup failed"); ++ die(_("revision walk setup failed")); + + show_data.bitmap_git = bitmap_git; + show_data.base = base; + -+ traverse_commit_list_filtered(revs, show_commit, show_object, -+ &show_data, NULL); ++ traverse_commit_list(revs, show_commit, show_object, &show_data); + + revs->include_check = NULL; + revs->include_check_obj = NULL; 3: 91ed8c70f2 ! 2: 1993af00cb pack-bitmap.c: use commit boundary during bitmap traversal @@ Commit message different (and hopefully faster) traversal to compute revision walks. Consider a set of positive and negative tips (which we'll refer to with - their standard bitmap parlance by, "wants", and "haves"). In order to + their standard bitmap parlance by "wants", and "haves"). In order to figure out what objects exist between the tips, the existing traversal - in prepare_bitmap_walk() looks something like: + in `prepare_bitmap_walk()` does something like: 1. Consider if we can even compute the set of objects with bitmaps, and fall back to the usual traversal if we cannot. For example, @@ Commit message respectively. 3. Fall back to the ordinary object traversal if either (a) there are - no haves, (b) none of the haves are in the bitmapped pack or MIDX, - or (c) there are no wants. + more than zero haves, none of which are in the bitmapped pack or + MIDX, or (b) there are no wants. 4. Construct a reachability bitmap for the "haves" side by walking from the revision tips down to any existing bitmaps, OR-ing in any @@ Commit message And is more-or-less equivalent to using the *old* algorithm with this invocation: - $ git rev-list --objects --boundary $WANTS --not $HAVES | - perl -lne 'print $1 if /^-(.*)/' | - xargs git rev-list --objects --use-bitmap-index $WANTS --not + $ git rev-list --objects --use-bitmap-index $WANTS --not \ + $(git rev-list --objects --boundary $WANTS --not $HAVES | + perl -lne 'print $1 if /^-(.*)/') The new result performs significantly better in many cases, particularly when the distance from the boundary commit(s) to an existing bitmap is @@ Commit message # (Computing objects unique to 'master' among all branches, not # using bitmaps). $ time git rev-list --count --objects master --not --exclude=master --branches - 54 + 43 - real 0m1.012s - user 0m0.796s - sys 0m0.214s + real 0m0.346s + user 0m0.231s + sys 0m0.115s # (Computing the same uniqueness query using the old bitmap # traversal algorithm.) $ time git rev-list --use-bitmap-index --count --objects master --not --exclude=master --branches - 42 + 41 - real 0m29.571s - user 0m28.404s - sys 0m1.164s + real 0m20.007s + user 0m19.045s + sys 0m0.960s # (Computing the same uniqueness query using the new bitmap # traversal algorithm.) - $ time git.compile rev-list --count --objects --use-bitmap-index master --not --exclude=master --branches - 54 + $ time git.compile rev-list --use-bitmap-index --count --objects master --not --exclude=master --branches + 41 - real 0m2.279s - user 0m2.023s - sys 0m0.256s + real 0m1.748s + user 0m1.428s + sys 0m0.320s The new algorithm is still slower than not using bitmaps at all, but it - is nearly a 13-fold improvement over the existing traversal. + is nearly a 11-fold improvement over the existing traversal. In a more realistic setting (using my local copy of git.git), I can - observe a similar speed-up: + observe a similar (if more modest) speed-up: $ ours="$(git branch --show-current)" argv="--count --objects $ours --not --exclude=$ours --branches" @@ Commit message -n 'existing bitmap traversal' "git rev-list --use-bitmap-index $argv" \ -n 'this commit' "git.compile rev-list --use-bitmap-index $argv" Benchmark 1: no bitmaps - Time (mean ± σ): 15.1 ms ± 4.1 ms [User: 8.1 ms, System: 6.9 ms] - Range (min … max): 7.4 ms … 21.8 ms 131 runs + Time (mean ± σ): 5.8 ms ± 0.2 ms [User: 3.3 ms, System: 2.4 ms] + Range (min … max): 5.4 ms … 7.0 ms 409 runs Benchmark 2: existing bitmap traversal - Time (mean ± σ): 82.6 ms ± 9.2 ms [User: 63.6 ms, System: 19.0 ms] - Range (min … max): 73.8 ms … 105.4 ms 28 runs + Time (mean ± σ): 65.3 ms ± 0.6 ms [User: 49.3 ms, System: 15.8 ms] + Range (min … max): 64.1 ms … 66.9 ms 45 runs Benchmark 3: this commit - Time (mean ± σ): 19.8 ms ± 3.1 ms [User: 13.0 ms, System: 6.8 ms] - Range (min … max): 17.7 ms … 38.6 ms 158 runs + Time (mean ± σ): 19.8 ms ± 0.3 ms [User: 12.9 ms, System: 6.8 ms] + Range (min … max): 19.1 ms … 20.8 ms 150 runs Summary 'no bitmaps' ran - 1.31 ± 0.41 times faster than 'this commit' - 5.49 ± 1.62 times faster than 'existing bitmap traversal' + 3.43 ± 0.14 times faster than 'this commit' + 11.34 ± 0.45 times faster than 'existing bitmap traversal' Here, the new algorithm is also still slower than not using bitmaps, but - represents a 4-fold improvement over the existing traversal in a more - modest example. + represents a more than 3-fold improvement over the existing traversal in + a more modest example. Since this algorithm was originally written (nearly a year and a half ago, at the time of writing), the bitmap lookup table shipped, making @@ Commit message fewer "summary" bitmaps (which would also help with the above). Helped-by: Jeff King <peff@xxxxxxxx> + Helped-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> ## pack-bitmap.c ## -@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git, - struct include_data incdata; - struct bitmap_show_data show_data; - -- if (base == NULL) -+ if (!base) - base = bitmap_new(); - - incdata.bitmap_git = bitmap_git; @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git, return base; } @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_ + return bitmap_get(bitmap, pos); +} + -+static void show_boundary_commit(struct commit *commit, void *data) ++struct bitmap_boundary_cb { ++ struct bitmap_index *bitmap_git; ++ struct bitmap *base; ++ ++ struct object_array boundary; ++}; ++ ++static void show_boundary_commit(struct commit *commit, void *_data) +{ -+ struct object_array *boundary = data; -+ if (!(commit->object.flags & BOUNDARY)) -+ return; ++ struct bitmap_boundary_cb *data = _data; + -+ add_object_array(&commit->object, "", boundary); ++ if (commit->object.flags & BOUNDARY) ++ add_object_array(&commit->object, "", &data->boundary); ++ ++ if (commit->object.flags & UNINTERESTING) { ++ if (obj_in_bitmap(data->bitmap_git, &commit->object, ++ data->base)) ++ return; ++ ++ add_commit_to_bitmap(data->bitmap_git, &data->base, commit); ++ } +} + +static void show_boundary_object(struct object *object, @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_ + struct rev_info *revs, + struct object_list *roots) +{ -+ struct bitmap *base = NULL; -+ struct object_array boundary = OBJECT_ARRAY_INIT; -+ int any_missing = 0; ++ struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git }; + unsigned int i; -+ int tmp_blobs, tmp_trees, tmp_tags; ++ unsigned int tmp_blobs, tmp_trees, tmp_tags; ++ int any_missing = 0; + + revs->ignore_missing_links = 1; -+ revs->collect_uninteresting = 1; + + /* + * OR in any existing reachability bitmaps among `roots` into `base`. @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_ + roots = roots->next; + + if (object->type == OBJ_COMMIT && -+ !obj_in_bitmap(bitmap_git, object, base) && -+ add_commit_to_bitmap(bitmap_git, &base, ++ !obj_in_bitmap(bitmap_git, object, cb.base) && ++ add_commit_to_bitmap(bitmap_git, &cb.base, + (struct commit *)object)) { + object->flags |= SEEN; + continue; @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_ + revs->tag_objects = 0; + + /* -+ * We didn't have complete coverage of the roots. First OR in any -+ * bitmaps that are UNINTERESTING between the tips and boundary. ++ * We didn't have complete coverage of the roots. First setup a ++ * revision walk to (a) OR in any bitmaps that are UNINTERESTING ++ * between the tips and boundary, and (b) record the boundary. + */ + trace2_region_enter("pack-bitmap", "boundary-prepare", the_repository); + if (prepare_revision_walk(revs)) + die("revision walk setup failed"); + trace2_region_leave("pack-bitmap", "boundary-prepare", the_repository); + -+ trace2_region_enter("pack-bitmap", "boundary-load-bitmaps", the_repository); -+ for (i = 0; i < revs->uninteresting_commits.nr; i++) { -+ struct object *obj = revs->uninteresting_commits.objects[i].item; -+ if (obj->type != OBJ_COMMIT) -+ BUG("unexpected non-commit %s marked uninteresting", -+ oid_to_hex(&obj->oid)); -+ -+ if (obj_in_bitmap(bitmap_git, obj, base)) -+ continue; -+ -+ add_commit_to_bitmap(bitmap_git, &base, (struct commit *)obj); -+ } -+ trace2_region_leave("pack-bitmap", "boundary-load-bitmaps", the_repository); -+ -+ /* -+ * Then add the boundary commit(s) as fill-in traversal tips. -+ */ + trace2_region_enter("pack-bitmap", "boundary-traverse", the_repository); + revs->boundary = 1; + traverse_commit_list_filtered(revs, + show_boundary_commit, + show_boundary_object, -+ &boundary, NULL); ++ &cb, NULL); + revs->boundary = 0; + revs->blob_objects = tmp_blobs; + revs->tree_objects = tmp_trees; @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_ + clear_object_flags(UNINTERESTING); + trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository); + ++ /* ++ * Then add the boundary commit(s) as fill-in traversal tips. ++ */ + trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository); -+ if (boundary.nr) { ++ if (cb.boundary.nr) { + struct object *obj; + int needs_walk = 0; + -+ for (i = 0; i < boundary.nr; i++) { -+ obj = boundary.objects[i].item; ++ for (i = 0; i < cb.boundary.nr; i++) { ++ obj = cb.boundary.objects[i].item; + -+ if (obj_in_bitmap(bitmap_git, obj, base)) { ++ if (obj_in_bitmap(bitmap_git, obj, cb.base)) { + obj->flags |= SEEN; + } else { + add_pending_object(revs, obj, ""); @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_ + } + + if (needs_walk) -+ base = fill_in_bitmap(bitmap_git, revs, base, NULL); ++ cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL); + } + trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository); + + +cleanup: + revs->ignore_missing_links = 0; -+ revs->collect_uninteresting = 0; + -+ return base; ++ return cb.base; +} + static struct bitmap *find_objects(struct bitmap_index *bitmap_git, @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, if (!wants) goto cleanup; @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, - if (load_bitmap(bitmap_git) < 0) + if (load_bitmap(revs->repo, bitmap_git) < 0) goto cleanup; - object_array_clear(&revs->pending); -- 2.40.1.478.gcab26587e8