This series is based off of 'jt/path-filter-fix'. These few patches implement an idea that we discussed in [1], where we attempt to reuse existing Bloom filters during an upgrade from v1 to v2 Bloom filters while rewriting the commit-graph. The core idea is that Bloom filters are reusable when there aren't any non-ASCII paths in a commit's tree-diff against its first parent (or the empty tree, if none exists). If we assume that a commit's tree-diff meets those conditions, we can't conclude anything about whether either tree contains non-ASCII characters, since they could be unmodified on either side and thus excluded from the tree-diff. But assuming the RHS (that there aren't any non-ASCII characters present in the tree's path set) *does* give us that there aren't any such paths present in the first-parent tree diff, either. This series checks whether or not commits meet that criteria, and reuses the existing Bloom filter (if one exists) when possible. In practice, we end up visiting relatively few trees, since we mark trees we've already visited. On both linux.git and git.git, this series gives a significant speed-up when upgrading Bloom filters from v1 to v2. On linux.git: Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s] Range (min … max): 124.621 s … 125.227 s 3 runs Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s] Range (min … max): 79.112 s … 79.437 s 3 runs Summary 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths' On git.git (where we do have some non-ASCII paths), the change goes from 4.163 seconds to 3.348 seconds, for a 1.24x speed-up. I'm sending this as an RFC, since we are in the middle of the -rc phase, and 'jt/path-filter-fix' isn't expected[2] to merge into 'master' until we're on the other side of 2.42. The structure of this series is as follows: - The first three patches prepare to load the `BDAT` chunk, even when the graph's Bloom filter settings are incompatible with the value in `commitGraph.changedPathsVersion`. - The fourth patch begins loading `BDAT` chunks unconditionally. - The fifth patch is a clean-up. - The sixth and final patch implements the approach discussed above. Thanks in advance for your thoughts and review :-). [1]: https://lore.kernel.org/git/ZMKvsObx+uaKA8zF@nand.local/ [2]: https://lore.kernel.org/git/xmqqy1it6ykm.fsf@gitster.g/ Taylor Blau (6): bloom: annotate filters with hash version bloom: prepare to discard incompatible Bloom filters t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` commit-graph.c: unconditionally load Bloom filters object.h: fix mis-aligned flag bits table commit-graph: reuse existing Bloom filters where possible bloom.c | 117 +++++++++++++++++++++++++++++++++++++++++-- bloom.h | 22 +++++++- commit-graph.c | 24 +++++---- object.h | 3 +- t/t4216-log-bloom.sh | 49 ++++++++++++++++-- 5 files changed, 195 insertions(+), 20 deletions(-) -- 2.41.0.407.g6d1c33951b