Here are answers to your questions, but I'll also reply to the latest version stating that I'm OK with this patch set being merged (with my reasons). Taylor Blau <me@xxxxxxxxxxxx> writes: > Apologies for not quite grokking this, but I am still somewhat confused. > > Suppose we applied something like your suggestion on top of this patch, > like so: > > --- 8< --- > diff --git a/bloom.c b/bloom.c > index 739fa093ba..ee81976342 100644 > --- a/bloom.c > +++ b/bloom.c > @@ -253,16 +253,16 @@ static void init_truncated_large_filter(struct bloom_filter *filter, > struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c) > { > struct bloom_filter *filter; > - int hash_version; > + struct bloom_filter_settings *settings; > > filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL); > if (!filter) > return NULL; > > prepare_repo_settings(r); > - hash_version = r->settings.commit_graph_changed_paths_version; > + settings = get_bloom_filter_settings(r); > > - if (!(hash_version == -1 || hash_version == filter->version)) > + if (filter->version != (settings->hash_version == 2 ? 2 : 1)) > return NULL; /* unusable filter */ > return filter; > } > --- >8 --- > > We have a handful of failing tests, notably including t4216.1, which > tries to read settings->hash_version, but fails since settings is NULL. > I believe this happens when we have a computed Bloom filter prepared for > some commit, but have not yet written a commit graph containing that (or > any) Bloom filter. Ah, I discovered this independently too. I think it happens when we write version 2 Bloom filters to a repo that has no Bloom filters currently. So, r->settings.commit_graph_changed_paths_version is set to 2, but settings is NULL. I think the solution has to be a combination of both (use commit_graph_changed_paths_version, but if it is -1, check get_bloom_filter_settings()). But having said that, as I said above, we can go with what you have currently. > But I think we're talking about different things here. The point of > get_bloom_filter() as a wrapper around get_or_compute_bloom_filter() is > to only return Bloom filters that are readable under the configuration > commitGraph.changedPathsVersion. But this is my contention - sometimes commitGraph.changedPathsVersion is -1, so we need to find out which version is readable from elsewhere. > We have a handle on what version was used to compute Bloom filters in > each layer of the graph by reading its "version" field, which is written > via load_bloom_filter_from_graph(). > > So we want to return a non-NULL filter exactly when we (a) have a Bloom > filter computed for the given commit, and (b) its version matches the > version specified in commitGraph.chagnedPathsVersion. Yes, except add "or autodetected from the first commit graph file that we read if nothing is specified in commitGraph.changedPathsVersion" to the end. > Are you saying that we need to do the normalization ahead of time so > that we don't emit filters that have different hash versions when > working in a commit-graph chain where each layer may use different Bloom > filter settings? By "emit" do you mean write filters to disk? If yes, I'm worried about reading them, not writing them - for example, when running "git log" with a path. I am worried precisely about the commit-graph chain in which different layers have different Bloom settings, yes.