SZEDER Gábor <szeder.dev@xxxxxxxxx> writes: > > @@ -66,7 +66,64 @@ int load_bloom_filter_from_graph(struct commit_graph *g, > > * Not considered to be cryptographically secure. > > * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm > > */ > > -uint32_t murmur3_seeded(uint32_t seed, const char *data, size_t len) > > +uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len) > > Nit: MurmurHash3 implementations in C/C++ (apart from the sample > implementation on Wikipedia), and other hash functions taking data > pointer and buffer size parameters in general, have a 'const void > *data' parameter, not 'const char*'. I think either works, so I'll stick with what the existing code uses. (Well, we probably should have used "unsigned char" in the first place.) > > +test_expect_success 'when writing another commit graph, preserve existing version 1 of changed-path' ' > > + test_commit -C highbit1 c1double "$CENT$CENT" && > > + git -C highbit1 commit-graph write --reachable --changed-paths && > > Nit: Since there is a subshell cd-ing into the 'highbit1' directory > anyway, it would look clearer to put these two commands into that > subshell as well. > This applies to some of the later test cases as well. Makes sense, but this patch series has already been reviewed a few times so I don't think it's worth making this change at this point in the review process for a small increase in readability. > > +test_expect_success 'version 2 changed-path used when version 2 requested' ' > > + ( > > + cd highbit2 && > > + test_bloom_filters_used "-- $CENT" > > This test_bloom_filter_used helper runs two pathspec-limited 'git log' > invocations, one with disabled and the other with enabled > commit-graph, and thus with disabled and enabled modified path Bloom > filters, and compares their output. > > One of the flaws of the current modified path Bloom filters > implementation is that it doesn't check Bloom filters for root > commits. > > In several of the above test cases test_bloom_filters_used is invoked > in a repository with only a root commit, so they don't check that > the output is the same with and without Bloom filters. Ah...you are right. Indeed, when I flip one of the tests from test_bloom_filters_used to _not_, the test still passes. I'll change the tests. > The string "split" occurs twice in this patch series, but only in > patch hunk contexts, and it doesn't occur at all in the previous long > thread about the original patch series. > > Unfortunately, split commit-graphs weren't really considered in the > design of the modified path Bloom filters feature, and layers with > different Bloom filter settings weren't considered at all. I've > reported it back then, but the fixes so far were incomplete, and e.g. > the test cases shown in > > https://public-inbox.org/git/20201015132147.GB24954@xxxxxxxxxx/ > > still fail. > > Since the interaction of different versions and split commit-graphs > was neither mentioned in any of the commit messages nor discussed > during the previous rounds, and there isn't any test case excercising > it, and since the Bloom filter version information is stored in the > same 'g->bloom_filter_settings' structure as the number of hashes, I'm > afraid (though haven't actually checked) that handling commit-graph > layers with different Bloom filter versions is prone to the same > issues as well. My original design (up to patch 7 in this patch set) defends against this by taking the very first version detected and rejecting every other version, and Taylor's subsequent design reads every version, but annotates filters with its version. So I think we're covered.