[PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here is a reroll of my series to implement a boundary-based bitmap
traversal algorithm that I worked on towards the end of 2021 with Peff.

The algorithm is unchanged from the last version, but the implementation
has been made much more straightforward, thanks to a very helpful
suggestion from Stolee.

Instead of hackily trying to write down all of the UNINTERESTING commits
between the tips and boundary within limit_list(), we can just perform a
commit-only walk, which will give us the set of commits that we need.

Thanks in advance for your review.

Taylor Blau (2):
  pack-bitmap.c: extract `fill_in_bitmap()`
  pack-bitmap.c: use commit boundary during bitmap traversal

 pack-bitmap.c | 249 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 188 insertions(+), 61 deletions(-)

Range-diff against v1:
1:  a643678c0f < -:  ---------- revision: support tracking uninteresting commits
2:  7624d3b5ba ! 1:  a3a1522439 pack-bitmap.c: extract `fill_in_bitmap()`
    @@ pack-bitmap.c: static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
     +	struct include_data incdata;
     +	struct bitmap_show_data show_data;
     +
    -+	if (base == NULL)
    ++	if (!base)
     +		base = bitmap_new();
     +
     +	incdata.bitmap_git = bitmap_git;
    @@ pack-bitmap.c: static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
     +	revs->include_check_data = &incdata;
     +
     +	if (prepare_revision_walk(revs))
    -+		die("revision walk setup failed");
    ++		die(_("revision walk setup failed"));
     +
     +	show_data.bitmap_git = bitmap_git;
     +	show_data.base = base;
     +
    -+	traverse_commit_list_filtered(revs, show_commit, show_object,
    -+				      &show_data, NULL);
    ++	traverse_commit_list(revs, show_commit, show_object, &show_data);
     +
     +	revs->include_check = NULL;
     +	revs->include_check_obj = NULL;
3:  91ed8c70f2 ! 2:  1993af00cb pack-bitmap.c: use commit boundary during bitmap traversal
    @@ Commit message
         different (and hopefully faster) traversal to compute revision walks.
     
         Consider a set of positive and negative tips (which we'll refer to with
    -    their standard bitmap parlance by, "wants", and "haves"). In order to
    +    their standard bitmap parlance by "wants", and "haves"). In order to
         figure out what objects exist between the tips, the existing traversal
    -    in prepare_bitmap_walk() looks something like:
    +    in `prepare_bitmap_walk()` does something like:
     
           1. Consider if we can even compute the set of objects with bitmaps,
              and fall back to the usual traversal if we cannot. For example,
    @@ Commit message
              respectively.
     
           3. Fall back to the ordinary object traversal if either (a) there are
    -         no haves, (b) none of the haves are in the bitmapped pack or MIDX,
    -         or (c) there are no wants.
    +         more than zero haves, none of which are in the bitmapped pack or
    +         MIDX, or (b) there are no wants.
     
           4. Construct a reachability bitmap for the "haves" side by walking
              from the revision tips down to any existing bitmaps, OR-ing in any
    @@ Commit message
         And is more-or-less equivalent to using the *old* algorithm with this
         invocation:
     
    -        $ git rev-list --objects --boundary $WANTS --not $HAVES |
    -            perl -lne 'print $1 if /^-(.*)/' |
    -            xargs git rev-list --objects --use-bitmap-index $WANTS --not
    +        $ git rev-list --objects --use-bitmap-index $WANTS --not \
    +            $(git rev-list --objects --boundary $WANTS --not $HAVES |
    +              perl -lne 'print $1 if /^-(.*)/')
     
         The new result performs significantly better in many cases, particularly
         when the distance from the boundary commit(s) to an existing bitmap is
    @@ Commit message
             # (Computing objects unique to 'master' among all branches, not
             # using bitmaps).
             $ time git rev-list --count --objects master --not --exclude=master --branches
    -        54
    +        43
     
    -        real        0m1.012s
    -        user        0m0.796s
    -        sys 0m0.214s
    +        real        0m0.346s
    +        user        0m0.231s
    +        sys 0m0.115s
     
             # (Computing the same uniqueness query using the old bitmap
             # traversal algorithm.)
             $ time git rev-list --use-bitmap-index --count --objects master --not --exclude=master --branches
    -        42
    +        41
     
    -        real        0m29.571s
    -        user        0m28.404s
    -        sys 0m1.164s
    +        real        0m20.007s
    +        user        0m19.045s
    +        sys 0m0.960s
     
             # (Computing the same uniqueness query using the new bitmap
             # traversal algorithm.)
    -        $ time git.compile rev-list --count --objects --use-bitmap-index master --not --exclude=master --branches
    -        54
    +        $ time git.compile rev-list --use-bitmap-index --count --objects master --not --exclude=master --branches
    +        41
     
    -        real        0m2.279s
    -        user        0m2.023s
    -        sys 0m0.256s
    +        real        0m1.748s
    +        user        0m1.428s
    +        sys 0m0.320s
     
         The new algorithm is still slower than not using bitmaps at all, but it
    -    is nearly a 13-fold improvement over the existing traversal.
    +    is nearly a 11-fold improvement over the existing traversal.
     
         In a more realistic setting (using my local copy of git.git), I can
    -    observe a similar speed-up:
    +    observe a similar (if more modest) speed-up:
     
             $ ours="$(git branch --show-current)"
               argv="--count --objects $ours --not --exclude=$ours --branches"
    @@ Commit message
                 -n 'existing bitmap traversal' "git rev-list --use-bitmap-index $argv" \
                 -n 'this commit' "git.compile rev-list --use-bitmap-index $argv"
             Benchmark 1: no bitmaps
    -          Time (mean ± σ):      15.1 ms ±   4.1 ms    [User: 8.1 ms, System: 6.9 ms]
    -          Range (min … max):     7.4 ms …  21.8 ms    131 runs
    +          Time (mean ± σ):       5.8 ms ±   0.2 ms    [User: 3.3 ms, System: 2.4 ms]
    +          Range (min … max):     5.4 ms …   7.0 ms    409 runs
     
             Benchmark 2: existing bitmap traversal
    -          Time (mean ± σ):      82.6 ms ±   9.2 ms    [User: 63.6 ms, System: 19.0 ms]
    -          Range (min … max):    73.8 ms … 105.4 ms    28 runs
    +          Time (mean ± σ):      65.3 ms ±   0.6 ms    [User: 49.3 ms, System: 15.8 ms]
    +          Range (min … max):    64.1 ms …  66.9 ms    45 runs
     
             Benchmark 3: this commit
    -          Time (mean ± σ):      19.8 ms ±   3.1 ms    [User: 13.0 ms, System: 6.8 ms]
    -          Range (min … max):    17.7 ms …  38.6 ms    158 runs
    +          Time (mean ± σ):      19.8 ms ±   0.3 ms    [User: 12.9 ms, System: 6.8 ms]
    +          Range (min … max):    19.1 ms …  20.8 ms    150 runs
     
             Summary
               'no bitmaps' ran
    -            1.31 ± 0.41 times faster than 'this commit'
    -            5.49 ± 1.62 times faster than 'existing bitmap traversal'
    +            3.43 ± 0.14 times faster than 'this commit'
    +           11.34 ± 0.45 times faster than 'existing bitmap traversal'
     
         Here, the new algorithm is also still slower than not using bitmaps, but
    -    represents a 4-fold improvement over the existing traversal in a more
    -    modest example.
    +    represents a more than 3-fold improvement over the existing traversal in
    +    a more modest example.
     
         Since this algorithm was originally written (nearly a year and a half
         ago, at the time of writing), the bitmap lookup table shipped, making
    @@ Commit message
             fewer "summary" bitmaps (which would also help with the above).
     
         Helped-by: Jeff King <peff@xxxxxxxx>
    +    Helped-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
     
      ## pack-bitmap.c ##
    -@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git,
    - 	struct include_data incdata;
    - 	struct bitmap_show_data show_data;
    - 
    --	if (base == NULL)
    -+	if (!base)
    - 		base = bitmap_new();
    - 
    - 	incdata.bitmap_git = bitmap_git;
     @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git,
      	return base;
      }
    @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
     +	return bitmap_get(bitmap, pos);
     +}
     +
    -+static void show_boundary_commit(struct commit *commit, void *data)
    ++struct bitmap_boundary_cb {
    ++	struct bitmap_index *bitmap_git;
    ++	struct bitmap *base;
    ++
    ++	struct object_array boundary;
    ++};
    ++
    ++static void show_boundary_commit(struct commit *commit, void *_data)
     +{
    -+	struct object_array *boundary = data;
    -+	if (!(commit->object.flags & BOUNDARY))
    -+		return;
    ++	struct bitmap_boundary_cb *data = _data;
     +
    -+	add_object_array(&commit->object, "", boundary);
    ++	if (commit->object.flags & BOUNDARY)
    ++		add_object_array(&commit->object, "", &data->boundary);
    ++
    ++	if (commit->object.flags & UNINTERESTING) {
    ++		if (obj_in_bitmap(data->bitmap_git, &commit->object,
    ++				  data->base))
    ++			return;
    ++
    ++		add_commit_to_bitmap(data->bitmap_git, &data->base, commit);
    ++	}
     +}
     +
     +static void show_boundary_object(struct object *object,
    @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
     +					    struct rev_info *revs,
     +					    struct object_list *roots)
     +{
    -+	struct bitmap *base = NULL;
    -+	struct object_array boundary = OBJECT_ARRAY_INIT;
    -+	int any_missing = 0;
    ++	struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git };
     +	unsigned int i;
    -+	int tmp_blobs, tmp_trees, tmp_tags;
    ++	unsigned int tmp_blobs, tmp_trees, tmp_tags;
    ++	int any_missing = 0;
     +
     +	revs->ignore_missing_links = 1;
    -+	revs->collect_uninteresting = 1;
     +
     +	/*
     +	 * OR in any existing reachability bitmaps among `roots` into `base`.
    @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
     +		roots = roots->next;
     +
     +		if (object->type == OBJ_COMMIT &&
    -+		    !obj_in_bitmap(bitmap_git, object, base) &&
    -+		    add_commit_to_bitmap(bitmap_git, &base,
    ++		    !obj_in_bitmap(bitmap_git, object, cb.base) &&
    ++		    add_commit_to_bitmap(bitmap_git, &cb.base,
     +					 (struct commit *)object)) {
     +			object->flags |= SEEN;
     +			continue;
    @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
     +	revs->tag_objects = 0;
     +
     +	/*
    -+	 * We didn't have complete coverage of the roots. First OR in any
    -+	 * bitmaps that are UNINTERESTING between the tips and boundary.
    ++	 * We didn't have complete coverage of the roots. First setup a
    ++	 * revision walk to (a) OR in any bitmaps that are UNINTERESTING
    ++	 * between the tips and boundary, and (b) record the boundary.
     +	 */
     +	trace2_region_enter("pack-bitmap", "boundary-prepare", the_repository);
     +	if (prepare_revision_walk(revs))
     +		die("revision walk setup failed");
     +	trace2_region_leave("pack-bitmap", "boundary-prepare", the_repository);
     +
    -+	trace2_region_enter("pack-bitmap", "boundary-load-bitmaps", the_repository);
    -+	for (i = 0; i < revs->uninteresting_commits.nr; i++) {
    -+		struct object *obj = revs->uninteresting_commits.objects[i].item;
    -+		if (obj->type != OBJ_COMMIT)
    -+			BUG("unexpected non-commit %s marked uninteresting",
    -+			    oid_to_hex(&obj->oid));
    -+
    -+		if (obj_in_bitmap(bitmap_git, obj, base))
    -+			continue;
    -+
    -+		add_commit_to_bitmap(bitmap_git, &base, (struct commit *)obj);
    -+	}
    -+	trace2_region_leave("pack-bitmap", "boundary-load-bitmaps", the_repository);
    -+
    -+	/*
    -+	 * Then add the boundary commit(s) as fill-in traversal tips.
    -+	 */
     +	trace2_region_enter("pack-bitmap", "boundary-traverse", the_repository);
     +	revs->boundary = 1;
     +	traverse_commit_list_filtered(revs,
     +				      show_boundary_commit,
     +				      show_boundary_object,
    -+				      &boundary, NULL);
    ++				      &cb, NULL);
     +	revs->boundary = 0;
     +	revs->blob_objects = tmp_blobs;
     +	revs->tree_objects = tmp_trees;
    @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
     +	clear_object_flags(UNINTERESTING);
     +	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
     +
    ++	/*
    ++	 * Then add the boundary commit(s) as fill-in traversal tips.
    ++	 */
     +	trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
    -+	if (boundary.nr) {
    ++	if (cb.boundary.nr) {
     +		struct object *obj;
     +		int needs_walk = 0;
     +
    -+		for (i = 0; i < boundary.nr; i++) {
    -+			obj = boundary.objects[i].item;
    ++		for (i = 0; i < cb.boundary.nr; i++) {
    ++			obj = cb.boundary.objects[i].item;
     +
    -+			if (obj_in_bitmap(bitmap_git, obj, base)) {
    ++			if (obj_in_bitmap(bitmap_git, obj, cb.base)) {
     +				obj->flags |= SEEN;
     +			} else {
     +				add_pending_object(revs, obj, "");
    @@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
     +		}
     +
     +		if (needs_walk)
    -+			base = fill_in_bitmap(bitmap_git, revs, base, NULL);
    ++			cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
     +	}
     +	trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);
     +
     +
     +cleanup:
     +	revs->ignore_missing_links = 0;
    -+	revs->collect_uninteresting = 0;
     +
    -+	return base;
    ++	return cb.base;
     +}
     +
      static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
    @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      	if (!wants)
      		goto cleanup;
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
    - 	if (load_bitmap(bitmap_git) < 0)
    + 	if (load_bitmap(revs->repo, bitmap_git) < 0)
      		goto cleanup;
      
     -	object_array_clear(&revs->pending);
-- 
2.40.1.478.gcab26587e8



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux