On Tue, Oct 10, 2023 at 04:33:17PM -0400, Taylor Blau wrote: > (Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth > batch, 2023-10-04), at the time of writing). > > This series is a reroll of the combined efforts of [1] and [2] to > introduce the v2 changed-path Bloom filters, which fixes a bug in our > existing implementation of murmur3 paths with non-ASCII characters (when > the "char" type is signed). > > In large part, this is the same as the previous round. But this round > includes some extra bits that address issues pointed out by SZEDER > Gábor, which are: > > - not reading Bloom filters for root commits > - corrupting Bloom filter reads by tweaking the filter settings > between layers. > > These issues were discussed in (among other places) [3], and [4], > respectively. > > Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in > assembling these patches. As usual, a range-diff is included below. > Thanks in advance for your > review! As this patch series has been sitting around without reviews for a week I've tried my best to give it a go. Note though that this area is mostly outside of my own comfort zone, so some of the questions and suggestions might ultimately not apply. Patrick > [1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@xxxxxxxxxx/ > [2]: https://lore.kernel.org/git/cover.1691426160.git.me@xxxxxxxxxxxx/ > [3]: https://public-inbox.org/git/20201015132147.GB24954@xxxxxxxxxx/ > [4]: https://lore.kernel.org/git/20230830200218.GA5147@xxxxxxxxxx/ > > Jonathan Tan (4): > gitformat-commit-graph: describe version 2 of BDAT > t4216: test changed path filters with high bit paths > repo-settings: introduce commitgraph.changedPathsVersion > commit-graph: new filter ver. that fixes murmur3 > > Taylor Blau (13): > t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` > revision.c: consult Bloom filters for root commits > commit-graph: ensure Bloom filters are read with consistent settings > t/helper/test-read-graph.c: extract `dump_graph_info()` > bloom.h: make `load_bloom_filter_from_graph()` public > t/helper/test-read-graph: implement `bloom-filters` mode > bloom: annotate filters with hash version > bloom: prepare to discard incompatible Bloom filters > commit-graph.c: unconditionally load Bloom filters > commit-graph: drop unnecessary `graph_read_bloom_data_context` > object.h: fix mis-aligned flag bits table > commit-graph: reuse existing Bloom filters where possible > bloom: introduce `deinit_bloom_filters()` > > Documentation/config/commitgraph.txt | 26 ++- > Documentation/gitformat-commit-graph.txt | 9 +- > bloom.c | 208 ++++++++++++++++- > bloom.h | 38 +++- > commit-graph.c | 61 ++++- > object.h | 3 +- > oss-fuzz/fuzz-commit-graph.c | 2 +- > repo-settings.c | 6 +- > repository.h | 2 +- > revision.c | 26 ++- > t/helper/test-bloom.c | 9 +- > t/helper/test-read-graph.c | 67 ++++-- > t/t0095-bloom.sh | 8 + > t/t4216-log-bloom.sh | 272 ++++++++++++++++++++++- > 14 files changed, 682 insertions(+), 55 deletions(-) > > Range-diff against v2: > 10: 002a06d1e9 ! 1: fe671d616c t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` > @@ Commit message > indicating that no filters were used. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## t/t4216-log-bloom.sh ## > @@ t/t4216-log-bloom.sh: test_bloom_filters_used () { > -: ---------- > 2: 7d0fa93543 revision.c: consult Bloom filters for root commits > -: ---------- > 3: 2ecc0a2d58 commit-graph: ensure Bloom filters are read with consistent settings > 1: 5fa681b58e ! 4: 17703ed89a gitformat-commit-graph: describe version 2 of BDAT > @@ Commit message > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## Documentation/gitformat-commit-graph.txt ## > @@ Documentation/gitformat-commit-graph.txt: All multi-byte numbers are in network byte order. > 2: 623d840575 ! 5: 94552abf45 t/helper/test-read-graph.c: extract `dump_graph_info()` > @@ Commit message > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## t/helper/test-read-graph.c ## > @@ > 3: bc9d77ae60 ! 6: 3d81efa27b bloom.h: make `load_bloom_filter_from_graph()` public > @@ Commit message > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## bloom.c ## > @@ bloom.c: static inline unsigned char get_bitmask(uint32_t pos) > 4: ac7008aed3 ! 7: d23cd89037 t/helper/test-read-graph: implement `bloom-filters` mode > @@ Commit message > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## t/helper/test-read-graph.c ## > @@ t/helper/test-read-graph.c: static void dump_graph_info(struct commit_graph *graph) > @@ t/helper/test-read-graph.c: int cmd__read_graph(int argc UNUSED, const char **ar > - return 0; > + return ret; > } > ++ > ++ > 5: 71755ba856 ! 8: cba766f224 t4216: test changed path filters with high bit paths > @@ Commit message > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## t/t4216-log-bloom.sh ## > -@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty commits' ' > - ) > +@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' ' > + ! grep "disabling Bloom filters" err > ' > > +get_first_changed_path_filter () { > 6: 9768d92c0f ! 9: a08a961f41 repo-settings: introduce commitgraph.changedPathsVersion > @@ Commit message > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## Documentation/config/commitgraph.txt ## > @@ Documentation/config/commitgraph.txt: commitGraph.maxNewFilters:: > 7: f911b4bfab = 10: 61d44519a5 commit-graph: new filter ver. that fixes murmur3 > 8: 35009900df ! 11: a8c10f8de8 bloom: annotate filters with hash version > @@ Commit message > Bloom filter. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## bloom.c ## > @@ bloom.c: int load_bloom_filter_from_graph(struct commit_graph *g, > 9: 138bc16905 ! 12: 2ba10a4b4b bloom: prepare to discard incompatible Bloom filters > @@ Commit message > `get_or_compute_bloom_filter()`. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## bloom.c ## > @@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter, > 11: 2437e62813 ! 13: 09d8669c3a commit-graph.c: unconditionally load Bloom filters > @@ Commit message > either "1" or "2". > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## commit-graph.c ## > @@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start, > 12: fe8fb2f5fe ! 14: 0d4f9dc4ee commit-graph: drop unnecessary `graph_read_bloom_data_context` > @@ Commit message > > Noticed-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## commit-graph.c ## > @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_start, > 13: 825af91e11 ! 15: 1f7f27bc47 object.h: fix mis-aligned flag bits table > @@ Commit message > Bit position 23 is one column too far to the left. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## object.h ## > @@ object.h: void object_array_init(struct object_array *array); > 14: 593b317192 ! 16: abbef95ae8 commit-graph: reuse existing Bloom filters where possible > @@ Commit message > commits by their generation number. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > - Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > - Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > > ## bloom.c ## > @@ > 15: 8bf2c9cf98 = 17: ca362408d5 bloom: introduce `deinit_bloom_filters()` > -- > 2.42.0.342.g8bb3a896ee
Attachment:
signature.asc
Description: PGP signature