Re: [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 29, 2023 at 09:49:26AM -0700, Jonathan Tan wrote:
> > > > +	if (!(hash_version == -1 || hash_version == filter->version))
> > >
> > > No need for the comparison to -1 here.
> >
> > I'm not sure I understand your suggestion. When we fetch the filter from
> > get_or_compute_bloom_filter(), we have filter->version set to the
> > hash_version from the containing graph's Bloom filter settings.
> >
> > So (besides the normalization), I would think that:
> >
> >     struct bloom_filter_settings *s = get_bloom_filter_settings(r);
> >     struct bloom_filter *f = get_bloom_filter(...);
> >
> >     assert(s->hash_version == f->version);
> >
> > would hold.
>
> My mention to avoid the comparison to -1 was just for completeness
> - since we're normalizing the value of hash_version to 1 or 2, we no
> longer need to compare it to -1.
>
> As for whether s->hash_version is always equal to f->version, I think
> that it may not be true if for some reason, there are multiple commit
> graph files on disk, not all with the same Bloom filter version.

Apologies for not quite grokking this, but I am still somewhat confused.

Suppose we applied something like your suggestion on top of this patch,
like so:

--- 8< ---
diff --git a/bloom.c b/bloom.c
index 739fa093ba..ee81976342 100644
--- a/bloom.c
+++ b/bloom.c
@@ -253,16 +253,16 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
 struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
 {
 	struct bloom_filter *filter;
-	int hash_version;
+	struct bloom_filter_settings *settings;

 	filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL);
 	if (!filter)
 		return NULL;

 	prepare_repo_settings(r);
-	hash_version = r->settings.commit_graph_changed_paths_version;
+	settings = get_bloom_filter_settings(r);

-	if (!(hash_version == -1 || hash_version == filter->version))
+	if (filter->version != (settings->hash_version == 2 ? 2 : 1))
 		return NULL; /* unusable filter */
 	return filter;
 }
--- >8 ---

We have a handful of failing tests, notably including t4216.1, which
tries to read settings->hash_version, but fails since settings is NULL.
I believe this happens when we have a computed Bloom filter prepared for
some commit, but have not yet written a commit graph containing that (or
any) Bloom filter.

But I think we're talking about different things here. The point of
get_bloom_filter() as a wrapper around get_or_compute_bloom_filter() is
to only return Bloom filters that are readable under the configuration
commitGraph.changedPathsVersion.

We have a handle on what version was used to compute Bloom filters in
each layer of the graph by reading its "version" field, which is written
via load_bloom_filter_from_graph().

So we want to return a non-NULL filter exactly when we (a) have a Bloom
filter computed for the given commit, and (b) its version matches the
version specified in commitGraph.chagnedPathsVersion.

Are you saying that we need to do the normalization ahead of time so
that we don't emit filters that have different hash versions when
working in a commit-graph chain where each layer may use different Bloom
filter settings? If so, then I think the change we'd need to make is
limited to:

--- 8< ---
diff --git a/bloom.c b/bloom.c
index 739fa093ba..d5cc23b0a8 100644
--- a/bloom.c
+++ b/bloom.c
@@ -260,9 +260,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
 		return NULL;

 	prepare_repo_settings(r);
-	hash_version = r->settings.commit_graph_changed_paths_version;
+	hash_version = r->settings.commit_graph_changed_paths_version == 2 ? 2 : 1;

-	if (!(hash_version == -1 || hash_version == filter->version))
+	if (hash_version == filter->version)
 		return NULL; /* unusable filter */
 	return filter;
 }
--- >8 ---

But that doesn't quite work, either, since it's breaking some assumption
from the caller and causing us not to write out any Bloom filters (I
haven't investigated further, but I'm not sure that this is the right
path in the first place...).

So I am not sure what you are suggesting, but I am sure that I'm missing
something.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux