On Thu, Jul 27, 2023 at 10:39:05AM -0700, Jonathan Tan wrote: > A test should be doable - we already have tests (the ones that use > "get_first_changed_path_filter") that check the bytes of the filter > generated, so we should be able to write a test that writes one version, > writes the other, then checks the bytes. Thanks for looking into it! > > > So I think we want to be more careful about when we load the existing > > > Bloom data or not. I think we probably want to load it in all cases, but > > > only read it when we have compatible Bloom settings. That should stop us > > > from exhibiting this kind of bug, but it also gives us a handle on > > > existing Bloom data if we wanted to copy forward existing results where > > > we can. > > The intention in the current patch set was to not load it at all when we > have incompatible Bloom settings because it appeared quite troublesome > to notate which Bloom filter in memory is of which version. If we want > to copy forward existing results, we can change that, but I don't know > whether it's worth doing that (and if we think it's worth doing, this > should probably go in another patch set). Yeah, I think having Bloom filters accessible from a commit-graph regardless of whether or not it matches our Bloom filter version is prerequisite to being able to implement something like this. I feel like this is important enough to do in the same patch set, or the same release to avoid surprising operators when their commit-graph write suddenly recomputes all of its Bloom filters. Since we already store the Bloom version that we're using in the commit-graph file itself, shouldn't it be something along the lines of sticking that value onto the bloom_filter when we read its contents? Although I don't think that you'd even need to annotate each individual filter, since you know that every pre-existing Bloom filter you are able to find has the version given by: the_repository->objects->commit_graph->bloom_filter_settings->hash_version right? Thanks, Taylor