Re: [PATCH v6 0/7] Changed path filter hash fix and version bump

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Thu, 27 Jul 2023 10:39:05 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
> > Suppose that we have an existing commit-graph with v1 Bloom filters. If
> > we then try to rewrite that commit-graph using v2 Bloom filters, we
> > *should* attempt to recompute the filter from scratch. But AFAICT, that
> > isn't what happens.
> > ...
> > If I'm parsing this all correctly, Git used the v1 filter corresponding
> > to that commit, and did not recompute a new one.
> >
> > I think that this could lead to incorrect results if you use this to
> > masquerade a v1 Bloom filter as a v2 one.
> 
> That indeed is very bad.  How hard it would be to construct a test
> case that fails if filter computed as v1 is marketed as v2?  A test
> may be far easier to construct if it does not have to be end-to-end
> (e.g. instrument the codepath you followed through with the debugger
> with trace2 annotations and see the control takes the right branches
> by reading the log).

Ah, thanks, Taylor, for so meticulously investigating this. I'll take a look.

A test should be doable - we already have tests (the ones that use
"get_first_changed_path_filter") that check the bytes of the filter
generated, so we should be able to write a test that writes one version,
writes the other, then checks the bytes.

> > So I think we want to be more careful about when we load the existing
> > Bloom data or not. I think we probably want to load it in all cases, but
> > only read it when we have compatible Bloom settings. That should stop us
> > from exhibiting this kind of bug, but it also gives us a handle on
> > existing Bloom data if we wanted to copy forward existing results where
> > we can.

The intention in the current patch set was to not load it at all when we
have incompatible Bloom settings because it appeared quite troublesome
to notate which Bloom filter in memory is of which version. If we want
to copy forward existing results, we can change that, but I don't know
whether it's worth doing that (and if we think it's worth doing, this
should probably go in another patch set).

> > If all of this tracks, I think that there is a gap in our test coverage
> > that didn't catch this earlier.
> 
> Yeah, thanks for raising a concern.
> 
> Jonathan?

I'll take a look. Yes this does seem like a gap in test coverage -
I thought the existing test that checks that Bloom filters are not
used when a different version is requested would be sufficient, but
apparently that's not the case.