Re: [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3

Taylor Blau <me@xxxxxxxxxxxx> · Mon, 25 Sep 2023 19:03:18 -0400

On Fri, Sep 01, 2023 at 01:56:15PM -0700, Jonathan Tan wrote:
> > > My original design (up to patch 7 in this patch set) defends against
> > > this by taking the very first version detected and rejecting every
> > > other version, and Taylor's subsequent design reads every version, but
> > > annotates filters with its version. So I think we're covered.
> >
> > In the meantime I adapted the test cases from the above linked message
> > to write commit-graph layers with different Bloom filter versions, and
> > it does fail, because commits from the bottom commit-graph layer are
> > omitted from the revision walk's output.  And the test case doesn't
> > even need a middle layer without modified path Bloom filters to "hide"
> > the different version in the bottom layer.  Merging the layers seems
> > to work, though.
>
> For what it's worth, your test case below (with test_expect_success
> instead of test_expect_failure) passes with my original design. With the
> full patch set, it does fail, but for what it's worth, I did spot this,
> providing an incomplete solution [1] and then a complete one [2]. Your
> test case passes if I also include the following:
>
>   diff --git a/bloom.c b/bloom.c
>   index ff131893cd..1bafd62a4e 100644
>   --- a/bloom.c
>   +++ b/bloom.c
>   @@ -344,6 +344,10 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
>
>           prepare_repo_settings(r);
>           hash_version = r->settings.commit_graph_changed_paths_version;
>   +       if (hash_version == -1) {
>   +               struct bloom_filter_settings *s = get_bloom_filter_settings(r);
>   +               hash_version = (s && s->hash_version == 2) ? 2 : 1;
>   +       }
>
>           if (!(hash_version == -1 || hash_version == filter->version))
>                   return NULL; /* unusable filter */
>
> [1] https://lore.kernel.org/git/20230824222051.2320003-1-jonathantanmy@xxxxxxxxxx/
> [2] https://lore.kernel.org/git/20230829220432.558674-1-jonathantanmy@xxxxxxxxxx/

Hmm. I am confused -- are you saying that this series breaks existing
functionality, or merely does not patch an existing breakage? I *think*
that it's the latter, since this test case fails identically on master,
but I am not sure.

If my understanding is correct, I think that patching this would involve
annotating each Bloom filter with a pointer to the bloom_filter_settings
it was written with, and then using those settings when querying it.

Thanks,
Taylor