Re: [PATCH v4 07/14] bloom: split 'get_bloom_filter()' in two

Taylor Blau <me@xxxxxxxxxxxx> · Sat, 5 Sep 2020 14:01:20 -0400

On Sat, Sep 05, 2020 at 07:50:01PM +0200, Jakub Narębski wrote:
> Hello,
>
> On Sat, 5 Sep 2020 at 19:38, Taylor Blau <me@xxxxxxxxxxxx> wrote:
> > Right, once we get handed back a filter from
> > 'get_or_compute_bloom_filter()', we can't distinguish between (a) a
> > commit with too many changes to store in a single Bloom filter, and (b)
> > a commit with no changes at all.
>
> We could change how we store either no-changes Bloom filter (as all
> zeros minimal size filter), or too-many-changes Bloom filter (as all
> ones, i.e. max unsigned value, minimal size filter). This change would
> not require to change any user of Bloom filter.

I don't think that's true. Say that we changed the empty Bloom filter to
be encoded as only having the most-significant bit set. First, we'd have
to write a Bloom filter where we didn't have to before. But the real
issue is that commit-graph files generated with new clients would
suddenly be unreadable by old clients.
>
> > It's unfortunate that callers can't pick between the two, but this
> > implementation is actually an improvement on the status-quo! Why?
> > Because right now we'll see an "empty" Bloom filter and recompute it
> > because it's "missing", only to discover that it has no changes.
> >
> > With this patch, we'll say "this filter looks too large", and stop
> > computing it, because we have already gone through the effort to compute
> > it once (and marked it in the BFXL chunk).
>
> Can we use this when computing trace2 values?

We could, but I don't think it's absolutely necessary. The test coverage
in t4216 gives us enough confidence already.

> [...]
> > > This is a minor issue, though.
> >
> > Thanks for raising it. I don't think that this is a show-stopper for
> > this series.
>
> I agree.

Thanks for your input!

> Best,
> --
> Jakub Narębski

Thanks,
Taylor