On Tue, Aug 29, 2023 at 09:49:26AM -0700, Jonathan Tan wrote: > > > > + if (!(hash_version == -1 || hash_version == filter->version)) > > > > > > No need for the comparison to -1 here. > > > > I'm not sure I understand your suggestion. When we fetch the filter from > > get_or_compute_bloom_filter(), we have filter->version set to the > > hash_version from the containing graph's Bloom filter settings. > > > > So (besides the normalization), I would think that: > > > > struct bloom_filter_settings *s = get_bloom_filter_settings(r); > > struct bloom_filter *f = get_bloom_filter(...); > > > > assert(s->hash_version == f->version); > > > > would hold. > > My mention to avoid the comparison to -1 was just for completeness > - since we're normalizing the value of hash_version to 1 or 2, we no > longer need to compare it to -1. > > As for whether s->hash_version is always equal to f->version, I think > that it may not be true if for some reason, there are multiple commit > graph files on disk, not all with the same Bloom filter version. Apologies for not quite grokking this, but I am still somewhat confused. Suppose we applied something like your suggestion on top of this patch, like so: --- 8< --- diff --git a/bloom.c b/bloom.c index 739fa093ba..ee81976342 100644 --- a/bloom.c +++ b/bloom.c @@ -253,16 +253,16 @@ static void init_truncated_large_filter(struct bloom_filter *filter, struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c) { struct bloom_filter *filter; - int hash_version; + struct bloom_filter_settings *settings; filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL); if (!filter) return NULL; prepare_repo_settings(r); - hash_version = r->settings.commit_graph_changed_paths_version; + settings = get_bloom_filter_settings(r); - if (!(hash_version == -1 || hash_version == filter->version)) + if (filter->version != (settings->hash_version == 2 ? 2 : 1)) return NULL; /* unusable filter */ return filter; } --- >8 --- We have a handful of failing tests, notably including t4216.1, which tries to read settings->hash_version, but fails since settings is NULL. I believe this happens when we have a computed Bloom filter prepared for some commit, but have not yet written a commit graph containing that (or any) Bloom filter. But I think we're talking about different things here. The point of get_bloom_filter() as a wrapper around get_or_compute_bloom_filter() is to only return Bloom filters that are readable under the configuration commitGraph.changedPathsVersion. We have a handle on what version was used to compute Bloom filters in each layer of the graph by reading its "version" field, which is written via load_bloom_filter_from_graph(). So we want to return a non-NULL filter exactly when we (a) have a Bloom filter computed for the given commit, and (b) its version matches the version specified in commitGraph.chagnedPathsVersion. Are you saying that we need to do the normalization ahead of time so that we don't emit filters that have different hash versions when working in a commit-graph chain where each layer may use different Bloom filter settings? If so, then I think the change we'd need to make is limited to: --- 8< --- diff --git a/bloom.c b/bloom.c index 739fa093ba..d5cc23b0a8 100644 --- a/bloom.c +++ b/bloom.c @@ -260,9 +260,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c) return NULL; prepare_repo_settings(r); - hash_version = r->settings.commit_graph_changed_paths_version; + hash_version = r->settings.commit_graph_changed_paths_version == 2 ? 2 : 1; - if (!(hash_version == -1 || hash_version == filter->version)) + if (hash_version == filter->version) return NULL; /* unusable filter */ return filter; } --- >8 --- But that doesn't quite work, either, since it's breaking some assumption from the caller and causing us not to write out any Bloom filters (I haven't investigated further, but I'm not sure that this is the right path in the first place...). So I am not sure what you are suggesting, but I am sure that I'm missing something. Thanks, Taylor