Re: [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here are answers to your questions, but I'll also reply to the latest
version stating that I'm OK with this patch set being merged (with my
reasons).

Taylor Blau <me@xxxxxxxxxxxx> writes:
> Apologies for not quite grokking this, but I am still somewhat confused.
> 
> Suppose we applied something like your suggestion on top of this patch,
> like so:
> 
> --- 8< ---
> diff --git a/bloom.c b/bloom.c
> index 739fa093ba..ee81976342 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -253,16 +253,16 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
>  struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
>  {
>  	struct bloom_filter *filter;
> -	int hash_version;
> +	struct bloom_filter_settings *settings;
> 
>  	filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL);
>  	if (!filter)
>  		return NULL;
> 
>  	prepare_repo_settings(r);
> -	hash_version = r->settings.commit_graph_changed_paths_version;
> +	settings = get_bloom_filter_settings(r);
> 
> -	if (!(hash_version == -1 || hash_version == filter->version))
> +	if (filter->version != (settings->hash_version == 2 ? 2 : 1))
>  		return NULL; /* unusable filter */
>  	return filter;
>  }
> --- >8 ---
> 
> We have a handful of failing tests, notably including t4216.1, which
> tries to read settings->hash_version, but fails since settings is NULL.
> I believe this happens when we have a computed Bloom filter prepared for
> some commit, but have not yet written a commit graph containing that (or
> any) Bloom filter.

Ah, I discovered this independently too. I think it happens when we
write version 2 Bloom filters to a repo that has no Bloom filters
currently. So, r->settings.commit_graph_changed_paths_version is set to
2, but settings is NULL. I think the solution has to be a combination
of both (use commit_graph_changed_paths_version, but if it is -1, check
get_bloom_filter_settings()). But having said that, as I said above, we
can go with what you have currently.

> But I think we're talking about different things here. The point of
> get_bloom_filter() as a wrapper around get_or_compute_bloom_filter() is
> to only return Bloom filters that are readable under the configuration
> commitGraph.changedPathsVersion.

But this is my contention - sometimes commitGraph.changedPathsVersion is
-1, so we need to find out which version is readable from elsewhere.

> We have a handle on what version was used to compute Bloom filters in
> each layer of the graph by reading its "version" field, which is written
> via load_bloom_filter_from_graph().
> 
> So we want to return a non-NULL filter exactly when we (a) have a Bloom
> filter computed for the given commit, and (b) its version matches the
> version specified in commitGraph.chagnedPathsVersion.

Yes, except add "or autodetected from the first commit graph file that
we read if nothing is specified in commitGraph.changedPathsVersion" to
the end.

> Are you saying that we need to do the normalization ahead of time so
> that we don't emit filters that have different hash versions when
> working in a commit-graph chain where each layer may use different Bloom
> filter settings? 

By "emit" do you mean write filters to disk? If yes, I'm worried about
reading them, not writing them - for example, when running "git log"
with a path. I am worried precisely about the commit-graph chain in
which different layers have different Bloom settings, yes.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux