Re: [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau <me@xxxxxxxxxxxx> writes:
> On Thu, Aug 24, 2023 at 04:05:27PM -0700, Jonathan Tan wrote:
> > Up to here is fine.
> >
> > > +	hash_version = r->settings.commit_graph_changed_paths_version;
> >
> > Instead of doing this, do this (well, move the struct declaration to
> > the top):
> >
> >   struct bloom_filter_settings *s = get_bloom_filter_settings(r);
> >   hash_version = s->hash_version == 2 ? 2 : 1;
> 
> Do we need this normalization? We assign s->hash_version in
> commit-graph.c::graph_read_bloom_data() by reading it from the start of
> the BDAT chunk, so this should only ever be 1 or 2.

I'm not sure offhand if we do...I wrote it this way to match
fill_bloom_key(), but fill_bloom_key() was written in that way because
it was the clearest, not specifically because it needed to normalize.

> > > +	if (!(hash_version == -1 || hash_version == filter->version))
> >
> > No need for the comparison to -1 here.
> 
> I'm not sure I understand your suggestion. When we fetch the filter from
> get_or_compute_bloom_filter(), we have filter->version set to the
> hash_version from the containing graph's Bloom filter settings.
> 
> So (besides the normalization), I would think that:
> 
>     struct bloom_filter_settings *s = get_bloom_filter_settings(r);
>     struct bloom_filter *f = get_bloom_filter(...);
> 
>     assert(s->hash_version == f->version);
> 
> would hold.

My mention to avoid the comparison to -1 was just for completeness
- since we're normalizing the value of hash_version to 1 or 2, we no
longer need to compare it to -1.

As for whether s->hash_version is always equal to f->version, I think
that it may not be true if for some reason, there are multiple commit
graph files on disk, not all with the same Bloom filter version.

> I think the check that we want to make is instead: is this Bloom
> filter's version (or equivalently, the hash version indicated by that
> graph's BDAT chunk) something that we can read? 

I think it's not "something that we can read" (eventually, we can read
all versions, we just treat them differently) but "the version that
fill_bloom_key" will use. We don't want this function to produce a Bloom
filter of version X and then have the calling code subsequently use it
with a Bloom key of version Y.

> And I think "what we can
> read" here is dictated by the commit_graph_changed_paths_version member
> of our repository_settings, no?

I don't think commit_graph_changed_paths_version always dictates
something - it could be -1 (which you have probably seen, since you
check it against -1 in the current version of the patch). One of the
points of my suggestion is to avoid this field completely.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux