Re: [PATCH v6 0/7] Changed path filter hash fix and version bump

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 27 Jul 2023 13:56:00 -0400

On Thu, Jul 27, 2023 at 10:39:05AM -0700, Jonathan Tan wrote:
> A test should be doable - we already have tests (the ones that use
> "get_first_changed_path_filter") that check the bytes of the filter
> generated, so we should be able to write a test that writes one version,
> writes the other, then checks the bytes.

Thanks for looking into it!

> > > So I think we want to be more careful about when we load the existing
> > > Bloom data or not. I think we probably want to load it in all cases, but
> > > only read it when we have compatible Bloom settings. That should stop us
> > > from exhibiting this kind of bug, but it also gives us a handle on
> > > existing Bloom data if we wanted to copy forward existing results where
> > > we can.
>
> The intention in the current patch set was to not load it at all when we
> have incompatible Bloom settings because it appeared quite troublesome
> to notate which Bloom filter in memory is of which version. If we want
> to copy forward existing results, we can change that, but I don't know
> whether it's worth doing that (and if we think it's worth doing, this
> should probably go in another patch set).

Yeah, I think having Bloom filters accessible from a commit-graph
regardless of whether or not it matches our Bloom filter version is
prerequisite to being able to implement something like this.

I feel like this is important enough to do in the same patch set, or the
same release to avoid surprising operators when their commit-graph write
suddenly recomputes all of its Bloom filters.

Since we already store the Bloom version that we're using in the
commit-graph file itself, shouldn't it be something along the lines of
sticking that value onto the bloom_filter when we read its contents?

Although I don't think that you'd even need to annotate each individual
filter, since you know that every pre-existing Bloom filter you are able
to find has the version given by:

    the_repository->objects->commit_graph->bloom_filter_settings->hash_version

right?

Thanks,
Taylor