Re: [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Fri, 1 Sep 2023 13:56:15 -0700

SZEDER Gábor <szeder.dev@xxxxxxxxx> writes:
> I'd prefer to leave the test cases unchanged, and make the revision
> walking machinery look at Bloom filters even for root commits, because
> this is an annoying and recurring testing issue.  I remember it
> annoyed me back then, when I wanted to reproduce a couple of bugs that
> I knew were there, but my initial test cases didn't fail because the
> Bloom filter of the root commit was ignored; Derrick overlooked this
> in b16a8277644, you overlooked it now, and none of the reviewers then
> and now caught it, either.

I agree that making the revwalk look at Bloom filters of root commits
is an urgent matter for the reasons you describe (it's something
easily missed that slows down future development and/or makes future
development error-prone), but so is moving to a correct murmur3
implementation, I think, and one shouldn't stop the other. There could
be an argument that because the revwalk doesn't look at root commit
Bloom filters, any development on a new Bloom filter hash version is
suspect, but I don't think that has to be completely true.

> > My original design (up to patch 7 in this patch set) defends against
> > this by taking the very first version detected and rejecting every
> > other version, and Taylor's subsequent design reads every version, but
> > annotates filters with its version. So I think we're covered.
> 
> In the meantime I adapted the test cases from the above linked message
> to write commit-graph layers with different Bloom filter versions, and
> it does fail, because commits from the bottom commit-graph layer are
> omitted from the revision walk's output.  And the test case doesn't
> even need a middle layer without modified path Bloom filters to "hide"
> the different version in the bottom layer.  Merging the layers seems
> to work, though.

For what it's worth, your test case below (with test_expect_success
instead of test_expect_failure) passes with my original design. With the
full patch set, it does fail, but for what it's worth, I did spot this,
providing an incomplete solution [1] and then a complete one [2]. Your
test case passes if I also include the following:

  diff --git a/bloom.c b/bloom.c
  index ff131893cd..1bafd62a4e 100644
  --- a/bloom.c
  +++ b/bloom.c
  @@ -344,6 +344,10 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)

          prepare_repo_settings(r);
          hash_version = r->settings.commit_graph_changed_paths_version;
  +       if (hash_version == -1) {
  +               struct bloom_filter_settings *s = get_bloom_filter_settings(r);
  +               hash_version = (s && s->hash_version == 2) ? 2 : 1;
  +       }

          if (!(hash_version == -1 || hash_version == filter->version))
                  return NULL; /* unusable filter */

[1] https://lore.kernel.org/git/20230824222051.2320003-1-jonathantanmy@xxxxxxxxxx/
[2] https://lore.kernel.org/git/20230829220432.558674-1-jonathantanmy@xxxxxxxxxx/

> Besides fixing this issue, I think that the interaction of different
> Bloom filter versions and split commit-graphs needs to be thoroughly
> covered with test cases and discussed in the commit messages before
> this series could be considered good for merging.

Regarding commit messages, I can see that in "commit-graph: new filter
ver. that fixes murmur3", I could add "(the first version read if there
are multiple versions of changed path filters in the repository)" after
"automatically determined from the version of the existing changed path
filters in the repository". Taylor's patches already inherently cover
multiple versions, I think, since Bloom filters are annotated with their
versions, individually.

Regarding tests, yes, we could add the one you provided.

If you (or anyone else) can spot anything else that should be added,
please let us know.