(Rebased onto the tip of 'master', which is now 1e1586e4ed (The sixteenth batch, 2024-06-24) at the time of writing). This series is another minor reroll of the combined efforts of [1] and [2] to introduce the v2 changed-path Bloom filters, which fixes a bug in our existing implementation of murmur3 paths with non-ASCII characters (when the "char" type is signed). This version addresses the remaining comments from SZEDER around more thorough testing of merging commit-graph layers with incompatible Bloom filters versions, and ensuring the result is as expected. Thanks again to Jonathan, Peff, and SZEDER who have helped a great deal in assembling these patches. As usual, a range-diff is included below. Thanks in advance for your review! [1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@xxxxxxxxxx/ [2]: https://lore.kernel.org/git/cover.1691426160.git.me@xxxxxxxxxxxx/ Jonathan Tan (1): gitformat-commit-graph: describe version 2 of BDAT Taylor Blau (15): t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` revision.c: consult Bloom filters for root commits commit-graph: ensure Bloom filters are read with consistent settings t/helper/test-read-graph.c: extract `dump_graph_info()` bloom.h: make `load_bloom_filter_from_graph()` public t/helper/test-read-graph: implement `bloom-filters` mode t4216: test changed path filters with high bit paths repo-settings: introduce commitgraph.changedPathsVersion bloom: annotate filters with hash version bloom: prepare to discard incompatible Bloom filters commit-graph: unconditionally load Bloom filters commit-graph: new Bloom filter version that fixes murmur3 object.h: fix mis-aligned flag bits table commit-graph: reuse existing Bloom filters where possible bloom: introduce `deinit_bloom_filters()` Documentation/config/commitgraph.txt | 29 +- Documentation/gitformat-commit-graph.txt | 9 +- bloom.c | 208 ++++++++++++++- bloom.h | 38 ++- commit-graph.c | 64 ++++- object.h | 3 +- oss-fuzz/fuzz-commit-graph.c | 2 +- repo-settings.c | 6 +- repository.h | 2 +- revision.c | 26 +- t/helper/test-bloom.c | 9 +- t/helper/test-read-graph.c | 67 ++++- t/t0095-bloom.sh | 8 + t/t4216-log-bloom.sh | 325 ++++++++++++++++++++++- 14 files changed, 738 insertions(+), 58 deletions(-) Range-diff against v6: 1: 9df34a2f4f = 1: ee651fee33 t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` 2: a6dc377f1b = 2: 5d88ad6c90 revision.c: consult Bloom filters for root commits 3: a77ab941bc ! 3: f6cf5bfc4e commit-graph: ensure Bloom filters are read with consistent settings @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm +test_expect_success 'merge graph layers with incompatible Bloom settings' ' + # Ensure that incompatible Bloom filters are ignored when + # merging existing layers. -+ git -C $repo commit-graph write --reachable --changed-paths 2>err && ++ >trace2.txt && ++ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ ++ git -C $repo commit-graph write --reachable --changed-paths 2>err && + grep "disabling Bloom filters for commit-graph layer .$layer." err && ++ grep "{\"hash_version\":1,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt && + + test_path_is_file $repo/$graph && + test_dir_is_empty $repo/$graphdir && 4: 56a9fdaff0 = 4: 0041600f31 gitformat-commit-graph: describe version 2 of BDAT 5: 7484a82f7f = 5: 6e7f317551 t/helper/test-read-graph.c: extract `dump_graph_info()` 6: 48343f93a2 = 6: ae74fbad3e bloom.h: make `load_bloom_filter_from_graph()` public 7: 286fd7dcdb = 7: 0dfd1a361e t/helper/test-read-graph: implement `bloom-filters` mode 8: 7de7b89da0 = 8: fbcaa686b1 t4216: test changed path filters with high bit paths 9: b13c9b8ff9 ! 9: 60c063ca4a repo-settings: introduce commitgraph.changedPathsVersion @@ commit-graph.c: static void validate_mixed_bloom_settings(struct commit_graph *g ## oss-fuzz/fuzz-commit-graph.c ## @@ oss-fuzz/fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) - * possible. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); the_repository->settings.commit_graph_generation_version = 2; - the_repository->settings.commit_graph_read_changed_paths = 1; + the_repository->settings.commit_graph_changed_paths_version = 1; 10: 09c44c51a5 = 10: ce3a15a517 bloom: annotate filters with hash version 11: d4995ef600 = 11: 99ab9cf448 bloom: prepare to discard incompatible Bloom filters 12: c8e9bb7c88 = 12: 99e66d1dba commit-graph: unconditionally load Bloom filters 13: d2f11c082d ! 13: 2e945c3d2e commit-graph: new Bloom filter version that fixes murmur3 @@ commit-graph.c: int write_commit_graph(struct object_directory *odb, + if (r->settings.commit_graph_changed_paths_version < -1 + || r->settings.commit_graph_changed_paths_version > 2) { + warning(_("attempting to write a commit-graph, but " -+ "'commitgraph.changedPathsVersion' (%d) is not supported"), ++ "'commitGraph.changedPathsVersion' (%d) is not supported"), + r->settings.commit_graph_changed_paths_version); + return 0; + } @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible test_must_be_empty err ' ++# chosen to be the same under all Unicode normalization forms ++CENT=$(printf "\302\242") ++ +test_expect_success 'ensure Bloom filter with incompatible versions are ignored' ' + rm "$repo/$graph" && + @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible + cat >expect.err <<-EOF && + warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings + EOF -+ test_cmp expect.err err ++ test_cmp expect.err err && ++ ++ # Merge the two layers with incompatible bloom filter versions, ++ # ensuring that the v2 filters are used. ++ >trace2.txt && ++ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ ++ git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write --reachable --changed-paths 2>err && ++ grep "disabling Bloom filters for commit-graph layer .$layer." err && ++ grep "{\"hash_version\":2,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt +' + get_first_changed_path_filter () { test-tool read-graph bloom-filters >filters.dat && head -n 1 filters.dat + } + +-# chosen to be the same under all Unicode normalization forms +-CENT=$(printf "\302\242") +- + test_expect_success 'set up repo with high bit path, version 1 changed-path' ' + git init highbit1 && + test_commit -C highbit1 c1 "$CENT" && @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' ' ) ' @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers +test_expect_success 'version 1 changed-path not used when version 2 requested' ' + ( + cd highbit1 && -+ git config --add commitgraph.changedPathsVersion 2 && ++ git config --add commitGraph.changedPathsVersion 2 && + test_bloom_filters_not_used "-- another$CENT" + ) +' @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers +test_expect_success 'version 1 changed-path used when autodetect requested' ' + ( + cd highbit1 && -+ git config --add commitgraph.changedPathsVersion -1 && ++ git config --add commitGraph.changedPathsVersion -1 && + test_bloom_filters_used "-- another$CENT" + ) +' @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers + git -C highbit1 commit-graph write --reachable --changed-paths && + ( + cd highbit1 && -+ git config --add commitgraph.changedPathsVersion -1 && ++ git config --add commitGraph.changedPathsVersion -1 && + echo "options: bloom(1,10,7) read_generation_data" >expect && + test-tool read-graph >full && + grep options full >actual && @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers + +test_expect_success 'set up repo with high bit path, version 2 changed-path' ' + git init highbit2 && -+ git -C highbit2 config --add commitgraph.changedPathsVersion 2 && ++ git -C highbit2 config --add commitGraph.changedPathsVersion 2 && + test_commit -C highbit2 c2 "$CENT" && + git -C highbit2 commit-graph write --reachable --changed-paths +' @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers +test_expect_success 'version 2 changed-path not used when version 1 requested' ' + ( + cd highbit2 && -+ git config --add commitgraph.changedPathsVersion 1 && ++ git config --add commitGraph.changedPathsVersion 1 && + test_bloom_filters_not_used "-- another$CENT" + ) +' @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers +test_expect_success 'version 2 changed-path used when autodetect requested' ' + ( + cd highbit2 && -+ git config --add commitgraph.changedPathsVersion -1 && ++ git config --add commitGraph.changedPathsVersion -1 && + test_bloom_filters_used "-- another$CENT" + ) +' @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers + git -C highbit2 commit-graph write --reachable --changed-paths && + ( + cd highbit2 && -+ git config --add commitgraph.changedPathsVersion -1 && ++ git config --add commitGraph.changedPathsVersion -1 && + echo "options: bloom(2,10,7) read_generation_data" >expect && + test-tool read-graph >full && + grep options full >actual && @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers +test_expect_success 'when writing commit graph, do not reuse changed-path of another version' ' + git init doublewrite && + test_commit -C doublewrite c "$CENT" && -+ git -C doublewrite config --add commitgraph.changedPathsVersion 1 && ++ git -C doublewrite config --add commitGraph.changedPathsVersion 1 && + git -C doublewrite commit-graph write --reachable --changed-paths && + for v in -2 3 + do -+ git -C doublewrite config --add commitgraph.changedPathsVersion $v && ++ git -C doublewrite config --add commitGraph.changedPathsVersion $v && + git -C doublewrite commit-graph write --reachable --changed-paths 2>err && + cat >expect <<-EOF && -+ warning: attempting to write a commit-graph, but ${SQ}commitgraph.changedPathsVersion${SQ} ($v) is not supported ++ warning: attempting to write a commit-graph, but ${SQ}commitGraph.changedPathsVersion${SQ} ($v) is not supported + EOF + test_cmp expect err || return 1 + done && -+ git -C doublewrite config --add commitgraph.changedPathsVersion 2 && ++ git -C doublewrite config --add commitGraph.changedPathsVersion 2 && + git -C doublewrite commit-graph write --reachable --changed-paths && + ( + cd doublewrite && 14: 9f54a376fb = 14: 242f023135 object.h: fix mis-aligned flag bits table 15: 67991dea7c ! 15: 1b80023e57 commit-graph: reuse existing Bloom filters where possible @@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter, + struct tree_desc desc; + struct name_entry entry; + -+ init_tree_desc(&desc, t->buffer, t->size); ++ init_tree_desc(&desc, &t->object.oid, t->buffer, t->size); + while (tree_entry(&desc, &entry)) { + size_t i; + for (i = 0; i < entry.pathlen; i++) { @@ t/t4216-log-bloom.sh: test_expect_success 'when writing another commit graph, pr git init doublewrite && test_commit -C doublewrite c "$CENT" && + - git -C doublewrite config --add commitgraph.changedPathsVersion 1 && + git -C doublewrite config --add commitGraph.changedPathsVersion 1 && ++ >trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C doublewrite commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace2.txt && @@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu test_cmp expect err || return 1 done && + - git -C doublewrite config --add commitgraph.changedPathsVersion 2 && + git -C doublewrite config --add commitGraph.changedPathsVersion 2 && - git -C doublewrite commit-graph write --reachable --changed-paths && ++ >trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C doublewrite commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace2.txt && @@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu + + test_commit -C upgrade base no-high-bits && + -+ git -C upgrade config --add commitgraph.changedPathsVersion 1 && ++ git -C upgrade config --add commitGraph.changedPathsVersion 1 && ++ >trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C upgrade commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace2.txt && + test_filter_upgraded 0 trace2.txt && + -+ git -C upgrade config --add commitgraph.changedPathsVersion 2 && ++ git -C upgrade config --add commitGraph.changedPathsVersion 2 && ++ >trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C upgrade commit-graph write --reachable --changed-paths && + test_filter_computed 0 trace2.txt && 16: 12058a074d = 16: db9991f339 bloom: introduce `deinit_bloom_filters()` base-commit: 1e1586e4ed626bde864339c10570bc0e73f0ab97 -- 2.45.2.664.g446e6a2b1f