Re: [PATCH v6 0/7] Changed path filter hash fix and version bump

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 26 Jul 2023 17:49:01 -0700

Taylor Blau <me@xxxxxxxxxxxx> writes:

> Sorry for not getting to this sooner. I didn't notice anything during my
> review, but I think there may be a bug here.
>
> Suppose that we have an existing commit-graph with v1 Bloom filters. If
> we then try to rewrite that commit-graph using v2 Bloom filters, we
> *should* attempt to recompute the filter from scratch. But AFAICT, that
> isn't what happens.
> ...
> If I'm parsing this all correctly, Git used the v1 filter corresponding
> to that commit, and did not recompute a new one.
>
> I think that this could lead to incorrect results if you use this to
> masquerade a v1 Bloom filter as a v2 one.

That indeed is very bad.  How hard it would be to construct a test
case that fails if filter computed as v1 is marketed as v2?  A test
may be far easier to construct if it does not have to be end-to-end
(e.g. instrument the codepath you followed through with the debugger
with trace2 annotations and see the control takes the right branches
by reading the log).

> So I think we want to be more careful about when we load the existing
> Bloom data or not. I think we probably want to load it in all cases, but
> only read it when we have compatible Bloom settings. That should stop us
> from exhibiting this kind of bug, but it also gives us a handle on
> existing Bloom data if we wanted to copy forward existing results where
> we can.
>
> If all of this tracks, I think that there is a gap in our test coverage
> that didn't catch this earlier.

Yeah, thanks for raising a concern.

Jonathan?