Thanks, Ramsay, for spotting the errors and for mentioning that I can use
octal escapes. Here's an update that takes their comments into account.

Jonathan Tan (4):
  gitformat-commit-graph: describe version 2 of BDAT
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new filter ver. that fixes murmur3

 Documentation/config/commitgraph.txt     | 16 ++++-
 Documentation/gitformat-commit-graph.txt |  9 ++-
 bloom.c                                  | 65 +++++++++++++++++++-
 bloom.h                                  |  8 ++-
 commit-graph.c                           | 29 +++++++--
 oss-fuzz/fuzz-commit-graph.c             |  2 +-
 repo-settings.c                          |  6 +-
 repository.h                             |  2 +-
 t/helper/test-bloom.c                    |  9 ++-
 t/t0095-bloom.sh                         |  8 +++
 t/t4216-log-bloom.sh                     | 77 ++++++++++++++++++++++++
 11 files changed, 210 insertions(+), 21 deletions(-)

Range-diff against v3:
1:  d4b63945f6 ! 1:  a3b52af4c9 gitformat-commit-graph: describe version 2 of BDAT
    @@ Documentation/gitformat-commit-graph.txt: All multi-byte numbers are in network
           described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
     - in Probabilistic Verification"
     + in Probabilistic Verification". Version 1 bloom filters have a bug that appears
    -+ when int is signed and the repository has path names that have characters >=
    ++ when char is signed and the repository has path names that have characters >=
     + 0x80; Git supports reading and writing them, but this ability will be removed
     + in a future version of Git.
      - The number of times a path is hashed and hence the number of bit positions
2:  aa4535776e ! 2:  f095e2b486 t4216: test changed path filters with high bit paths
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +}
     +
     +# chosen to be the same under all Unicode normalization forms
    -+CENT=$(printf "\xc2\xa2")
    ++CENT=$(printf "\302\242")
     +
    -+# Some systems (in particular, Linux on the CI running on GitHub at the time of
    -+# writing) store into CENT a literal backslash, then "x", and so on (instead of
    -+# the high-bit characters needed). In these systems, do not run the following
    -+# tests.
    -+if test "$(printf $CENT | perl -0777 -ne 'no utf8; print ord($_)')" = "194"
    -+then
    -+	test_set_prereq HIGH_BIT
    -+fi
    -+
    -+test_expect_success HIGH_BIT 'set up repo with high bit path, version 1 changed-path' '
    ++test_expect_success 'set up repo with high bit path, version 1 changed-path' '
     +	git init highbit1 &&
     +	test_commit -C highbit1 c1 "$CENT" &&
     +	git -C highbit1 commit-graph write --reachable --changed-paths
     +'
     +
    -+test_expect_success HIGH_BIT 'setup check value of version 1 changed-path' '
    ++test_expect_success 'setup check value of version 1 changed-path' '
     +	(cd highbit1 &&
     +		printf "52a9" >expect &&
     +		get_first_changed_path_filter >actual)
     +'
     +
    -+# expect will not match actual if int is unsigned by default. Write the test
    ++# expect will not match actual if char is unsigned by default. Write the test
     +# in this way, so that a user running this test script can still see if the two
     +# files match. (It will appear as an ordinary success if they match, and a skip
     +# if not.)
     +if test_cmp highbit1/expect highbit1/actual
     +then
    -+	test_set_prereq SIGNED_INT_BY_DEFAULT
    ++	test_set_prereq SIGNED_CHAR_BY_DEFAULT
     +fi
    -+test_expect_success SIGNED_INT_BY_DEFAULT 'check value of version 1 changed-path' '
    ++test_expect_success SIGNED_CHAR_BY_DEFAULT 'check value of version 1 changed-path' '
     +	# Only the prereq matters for this test.
     +	true
     +'
     +
    -+test_expect_success HIGH_BIT 'version 1 changed-path used when version 1 requested' '
    ++test_expect_success 'version 1 changed-path used when version 1 requested' '
     +	(cd highbit1 &&
     +		test_bloom_filters_used "-- $CENT")
     +'
3:  d6982268a4 = 3:  6adfa53daf repo-settings: introduce commitgraph.changedPathsVersion
4:  e879483c42 ! 4:  5c65bf8a22 commit-graph: new filter ver. that fixes murmur3
    @@ t/t0095-bloom.sh: test_expect_success 'compute unseeded murmur3 hash for test st
      	Hashes:0x5615800c|0x5b966560|0x61174ab4|0x66983008|0x6c19155c|0x7199fab0|0x771ae004|
     
     ## t/t4216-log-bloom.sh ##
    -@@ t/t4216-log-bloom.sh: test_expect_success HIGH_BIT 'version 1 changed-path used when version 1 request
    +@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' '
      		test_bloom_filters_used "-- $CENT")
      	'
      
    -+test_expect_success HIGH_BIT 'version 1 changed-path not used when version 2 requested' '
    ++test_expect_success 'version 1 changed-path not used when version 2 requested' '
     +	(cd highbit1 &&
     +		git config --add commitgraph.changedPathsVersion 2 &&
     +		test_bloom_filters_not_used "-- $CENT")
     +'
     +
    -+test_expect_success HIGH_BIT 'set up repo with high bit path, version 2 changed-path' '
    ++test_expect_success 'set up repo with high bit path, version 2 changed-path' '
     +	git init highbit2 &&
     +	git -C highbit2 config --add commitgraph.changedPathsVersion 2 &&
     +	test_commit -C highbit2 c2 "$CENT" &&
     +	git -C highbit2 commit-graph write --reachable --changed-paths
     +'
     +
    -+test_expect_success HIGH_BIT 'check value of version 2 changed-path' '
    ++test_expect_success 'check value of version 2 changed-path' '
     +	(cd highbit2 &&
     +		printf "c01f" >expect &&
     +		get_first_changed_path_filter >actual &&
     +		test_cmp expect actual)
     +'
     +
    -+test_expect_success HIGH_BIT 'version 2 changed-path used when version 2 requested' '
    ++test_expect_success 'version 2 changed-path used when version 2 requested' '
     +	(cd highbit2 &&
     +		test_bloom_filters_used "-- $CENT")
     +'
     +
    -+test_expect_success HIGH_BIT 'version 2 changed-path not used when version 1 requested' '
    ++test_expect_success 'version 2 changed-path not used when version 1 requested' '
     +	(cd highbit2 &&
     +		git config --add commitgraph.changedPathsVersion 1 &&
     +		test_bloom_filters_not_used "-- $CENT")
-- 
2.41.0.162.gfafddb0af9-goog