Here's an updated v3 of mine, Stolee, and Peff's series to improve the CPU performance of generating reachability bitmaps. Not a great deal has changed since last time, though this version does incorporate feedback from Jonathan Tan and Junio (thanks, both, for your review!). A range-diff is below for convenience, but the major highlights are: - ALLOC_GROW() is now used in more places (this didn't measurably affect peak-heap usage, so it's a pure nicetie to avoid duplicating that logic throughout the ewah code) - Some later commits have been reworded to add additional clarity - bitmap_diff_nonzero() was replaced with bitmap_is_subset(), and the implementation amended to follow Junio's suggestion - The final patches have been slightly modified to avoid allocating extra bits in the bitmasks for cases where reachability bitmaps have already been generated for those patches (suggestion courtesy of Jonathan Tan) I'm hopeful that this will be in good shape for queuing up, since Jonathan had a chance to review the whole series. Thanks! Derrick Stolee (9): pack-bitmap-write: fill bitmap with commit history bitmap: implement bitmap_is_subset() commit: implement commit_list_contains() t5310: add branch-based checks pack-bitmap-write: rename children to reverse_edges pack-bitmap-write: build fewer intermediate bitmaps pack-bitmap-write: use existing bitmaps pack-bitmap-write: relax unique rewalk condition pack-bitmap-write: better reuse bitmaps Jeff King (11): pack-bitmap: fix header size check pack-bitmap: bounds-check size of cache extension t5310: drop size of truncated ewah bitmap rev-list: die when --test-bitmap detects a mismatch ewah: factor out bitmap growth ewah: make bitmap growth less aggressive ewah: implement bitmap_or() ewah: add bitmap_dup() function pack-bitmap-write: reimplement bitmap writing pack-bitmap-write: pass ownership of intermediate bitmaps pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau (4): ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() pack-bitmap.c: check reads more aggressively when loading pack-bitmap: factor out 'bitmap_for_commit()' pack-bitmap: factor out 'add_commit_to_bitmap()' builtin/pack-objects.c | 1 - commit.c | 11 + commit.h | 2 + ewah/bitmap.c | 54 ++++- ewah/ewah_bitmap.c | 15 +- ewah/ewok.h | 3 +- pack-bitmap-write.c | 474 ++++++++++++++++++++++++++-------------- pack-bitmap.c | 139 ++++++------ pack-bitmap.h | 8 +- t/t5310-pack-bitmaps.sh | 164 +++++++++++--- 10 files changed, 576 insertions(+), 295 deletions(-) Range-diff against v2: 1: 07054ff8ee < -: ---------- ewah/ewah_bitmap.c: grow buffer past 1 -: ---------- > 1: 0b25ba4ca7 ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() 2: 74a13b4a6e = 2: b455b248e4 pack-bitmap: fix header size check 3: db11116dac = 3: 7322427444 pack-bitmap: bounds-check size of cache extension 4: f779e76f82 = 4: 055bc1fe66 t5310: drop size of truncated ewah bitmap 5: 1a9ac1c4ae = 5: c99cacea67 rev-list: die when --test-bitmap detects a mismatch 6: 9bb1ea3b19 = 6: b79360383e ewah: factor out bitmap growth 7: f8426c7e8b < -: ---------- ewah: make bitmap growth less aggressive -: ---------- > 7: 4b56f12932 ewah: make bitmap growth less aggressive 8: 674e31f98e ! 8: 34137a7f35 ewah: implement bitmap_or() @@ Commit message Interestingly, we have a public header declaration going back to e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the - function was never implemented. + function was never implemented. That was all OK since there were no + users of 'bitmap_or()', but a first caller will be added in a couple of + patches. Signed-off-by: Jeff King <peff@xxxxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> 9: a903c949d8 ! 9: fe89f87716 ewah: add bitmap_dup() function @@ ewah/bitmap.c: struct bitmap *bitmap_new(void) + static void bitmap_grow(struct bitmap *self, size_t word_alloc) { - if (word_alloc > self->word_alloc) { + size_t old_size = self->word_alloc; ## ewah/ewok.h ## @@ ewah/ewok.h: struct bitmap { 10: c951206729 ! 10: 91cd8b1a49 pack-bitmap-write: reimplement bitmap writing @@ pack-bitmap-write.c: static void compute_xor_offsets(void) - kh_value(writer.bitmaps, hash_pos) = stored; - display_progress(writer.progress, writer.selected_nr - i); + trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", -+ the_repository); ++ the_repository); + + bitmap_builder_init(&bb, &writer); + for (i = bb.commits_nr; i > 0; i--) { @@ pack-bitmap-write.c: static void compute_xor_offsets(void) + ent->bitmap = NULL; } + bitmap_builder_clear(&bb); ++ ++ trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", ++ the_repository); - bitmap_free(base); stop_progress(&writer.progress); 11: 466dd3036a = 11: 64598024ec pack-bitmap-write: pass ownership of intermediate bitmaps 12: 8e5607929d ! 12: 93fc437a3c pack-bitmap-write: fill bitmap with commit history @@ Metadata ## Commit message ## pack-bitmap-write: fill bitmap with commit history - The fill_bitmap_commit() method assumes that every parent of the given - commit is already part of the current bitmap. Instead of making that - assumption, let's walk parents until we reach commits already part of - the bitmap. Set the value for that parent immediately after querying to - save time doing double calls to find_object_pos() and to avoid inserting - the parent into the queue multiple times. + The current implementation of bitmap_writer_build() creates a + reachability bitmap for every walked commit. After computing a bitmap + for a commit, those bits are pushed to an in-progress bitmap for its + children. + + fill_bitmap_commit() assumes the bits corresponding to objects + reachable from the parents of a commit are already set. This means that + when visiting a new commit, we only have to walk the objects reachable + between it and any of its parents. + + A future change to bitmap_writer_build() will relax this condition so + not all parents have their bits set. Prepare for that by having + 'fill_bitmap_commit()' walk parents until reaching commits whose bits + are already set. Then, walk the trees for these commits as well. + + This has no functional change with the current implementation of + bitmap_writer_build(). Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> @@ pack-bitmap-write.c: void bitmap_writer_build(struct packing_data *to_pack) + clear_prio_queue(&queue); bitmap_builder_clear(&bb); - stop_progress(&writer.progress); + trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", 13: 4840c64c51 ! 13: 0d5213ba44 bitmap: add bitmap_diff_nonzero() @@ Metadata Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx> ## Commit message ## - bitmap: add bitmap_diff_nonzero() + bitmap: implement bitmap_is_subset() - The bitmap_diff_nonzero() checks if the 'self' bitmap contains any bits - that are not on in the 'other' bitmap. - - Also, delete the declaration of bitmap_is_subset() as it is not used or - implemented. + The bitmap_is_subset() function checks if the 'self' bitmap contains any + bitmaps that are not on in the 'other' bitmap. Up until this patch, it + had a declaration, but no implementation or callers. A subsequent patch + will want this function, so implement it here. + Helped-by: Junio C Hamano <gitster@xxxxxxxxx> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> @@ ewah/bitmap.c: int bitmap_equals(struct bitmap *self, struct bitmap *other) return 1; } -+int bitmap_diff_nonzero(struct bitmap *self, struct bitmap *other) ++int bitmap_is_subset(struct bitmap *self, struct bitmap *other) +{ -+ struct bitmap *small; -+ size_t i; ++ size_t common_size, i; + -+ if (self->word_alloc < other->word_alloc) { -+ small = self; -+ } else { -+ small = other; -+ -+ for (i = other->word_alloc; i < self->word_alloc; i++) { -+ if (self->words[i] != 0) ++ if (self->word_alloc < other->word_alloc) ++ common_size = self->word_alloc; ++ else { ++ common_size = other->word_alloc; ++ for (i = common_size; i < self->word_alloc; i++) { ++ if (self->words[i]) + return 1; + } + } + -+ for (i = 0; i < small->word_alloc; i++) { -+ if ((self->words[i] & ~other->words[i])) ++ for (i = 0; i < common_size; i++) { ++ if (self->words[i] & ~other->words[i]) + return 1; + } -+ + return 0; +} + @@ ewah/ewok.h: int bitmap_get(struct bitmap *self, size_t pos); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); -int bitmap_is_subset(struct bitmap *self, struct bitmap *super); -+int bitmap_diff_nonzero(struct bitmap *self, struct bitmap *other); ++int bitmap_is_subset(struct bitmap *self, struct bitmap *other); struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap); struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah); 14: 63e846f4e8 = 14: 72e745fed8 commit: implement commit_list_contains() 15: 8b5d239333 = 15: c2cae4a8d0 t5310: add branch-based checks 16: 60a46091bb = 16: c0e2b6f5d9 pack-bitmap-write: rename children to reverse_edges 17: 8f7bb2dd2e = 17: 37f9636098 pack-bitmap.c: check reads more aggressively when loading 18: 5262daa330 ! 18: e520c8fdc4 pack-bitmap-write: build fewer intermediate bitmaps @@ pack-bitmap-write.c: static void bitmap_builder_init(struct bitmap_builder *bb, + c_not_p = 1; + p_not_c = 0; + } else { -+ c_not_p = bitmap_diff_nonzero(c_ent->commit_mask, p_ent->commit_mask); -+ p_not_c = bitmap_diff_nonzero(p_ent->commit_mask, c_ent->commit_mask); ++ c_not_p = bitmap_is_subset(c_ent->commit_mask, p_ent->commit_mask); ++ p_not_c = bitmap_is_subset(p_ent->commit_mask, c_ent->commit_mask); + } + + if (!c_not_p) @@ t/t5310-pack-bitmaps.sh: has_any () { +# | / \________________________ | +# |/ \| +# (l2) * * (r2) -+# \____________...____________ | ++# \___________________________ | +# \| +# * (base) +# @@ t/t5310-pack-bitmaps.sh: test_expect_success 'setup repo with moderate-sized his ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' -@@ t/t5310-pack-bitmaps.sh: test_expect_success 'truncated bitmap fails gracefully (ewah)' ' - git rev-list --use-bitmap-index --count --all >expect && - bitmap=$(ls .git/objects/pack/*.bitmap) && - test_when_finished "rm -f $bitmap" && -- test_copy_bytes 256 <$bitmap >$bitmap.tmp && -+ test_copy_bytes 270 <$bitmap >$bitmap.tmp && - mv -f $bitmap.tmp $bitmap && - git rev-list --use-bitmap-index --count --all >actual 2>stderr && - test_cmp expect actual && 19: a206f48614 = 19: c3975fcf78 pack-bitmap-write: ignore BITMAP_FLAG_REUSE 20: 9928b3c7da = 20: d5ef2c7f81 pack-bitmap: factor out 'bitmap_for_commit()' 21: f40a39a48a = 21: f0500190f0 pack-bitmap: factor out 'add_commit_to_bitmap()' 22: 4bf5e78a54 ! 22: c6fde2b0c4 pack-bitmap-write: use existing bitmaps @@ Commit message In fill_bitmap_commit(), we must reorder thing somewhat. The priority queue walks commits from newest-to-oldest, which means we correctly stop walking when reaching a commit with a bitmap. However, if we walk trees - from top to bottom, then we might be parsing trees that are actually - part of a re-used bitmap. To avoid over-walking trees, add them to a - LIFO queue and walk them from bottom-to-top after exploring commits - completely. + interleaved with the commits, then we might be parsing trees that are + actually part of a re-used bitmap. To avoid over-walking trees, add them + to a LIFO queue and walk them after exploring commits completely. On git.git, this reduces a second immediate bitmap computation from 2.0s to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork @@ pack-bitmap-write.c: static void fill_bitmap_tree(struct bitmap *bitmap, struct commit_list *p; struct commit *c = prio_queue_get(queue); -+ /* -+ * If this commit has an old bitmap, then translate that -+ * bitmap and add its bits to this one. No need to walk -+ * parents or the tree for this commit. -+ */ + if (old_bitmap && mapping) { -+ struct ewah_bitmap *old; -+ -+ old = bitmap_for_commit(old_bitmap, c); ++ struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c); ++ /* ++ * If this commit has an old bitmap, then translate that ++ * bitmap and add its bits to this one. No need to walk ++ * parents or the tree for this commit. ++ */ + if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) + continue; + } @@ pack-bitmap-write.c: void bitmap_writer_build(struct packing_data *to_pack) writer.to_pack = to_pack; @@ pack-bitmap-write.c: void bitmap_writer_build(struct packing_data *to_pack) trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", - the_repository); + the_repository); + old_bitmap = prepare_bitmap_git(to_pack->repo); + if (old_bitmap) @@ pack-bitmap-write.c: void bitmap_writer_build(struct packing_data *to_pack) bitmap_builder_clear(&bb); + free(mapping); - stop_progress(&writer.progress); - + trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", + the_repository); 23: 1da4fa0fb8 ! 23: 50d2031deb pack-bitmap-write: relax unique rewalk condition @@ Commit message | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- original | 64.044 | 83.241 | 2.088 | 2.194 | - last patch | 44.811 | 27.828 | 2.289 | 2.358 | - this patch | 100.641 | 35.560 | 2.152 | 2.224 | + last patch | 45.049 | 37.624 | 2.267 | 2.334 | + this patch | 88.478 | 53.218 | 2.157 | 2.224 | Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> 24: 42399a1c2e < -: ---------- pack-bitmap-write: better reuse bitmaps -: ---------- > 24: 6b9950771e pack-bitmap-write: better reuse bitmaps -- 2.29.2.533.g07db1f5344