Re: [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 18, 2020 at 12:13:02AM +0200, SZEDER Gábor wrote:
> > Clients will invoke the Bloom machinery in more cases than before, but
> > this can be addressed by returning a NULL filter when all bits are set
> > high. This can be addressed in a future patch.
>
> OTOH, clients will invoke the tree-diff machinery in fewer cases than
> before, because querying the Bloom filter of commits not modifying any
> files will now return "definitely not".

Absolutely right.

> > Finally, note that this does increase the size of on-disk commit-graphs,
> > but far less than other proposals. In particular, this is generally more
> > efficient than storing a bitmap for which commits haven't computed their
> > Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
> > whereas storing explicit filters as above incurs a penalty of one byte
> > per too-large or too-small commit.
>
> s/too-small/empty/

Fair enough, although I'm not planning to alter this or any other patch
now that it's picked up unless there's a real show-stopper (this doesn't
seem like one).

> > In practice, these boundary commits likely occupy a small proportion of
> > the overall number of commits, and so the size penalty is likely smaller
> > than storing a bitmap for all commits.
>
>                  |      Percentage of
>                  |    commits modifying
>                  |   0 path   |  >= 512 paths
>   ---------------+------------+----------------
>   android-base   |   13.20%   |   0.13%
>   cmssw          |    0.15%   |   0.23%
>   cpython        |    3.07%   |   0.01%
>   elasticsearch  |    0.70%   |   1.00%
>   gcc            |    0.00%   |   0.08%
>   gecko-dev      |    0.14%   |   0.64%
>   git            |    0.11%   |   0.02%
>   glibc          |    0.02%   |   0.10%
>   go             |    0.00%   |   0.07%
>   homebrew-cask  |    0.40%   |   0.02%
>   homebrew-core  |    0.01%   |   0.01%
>   jdk            |    0.26%   |   5.64%
>   linux          |    0.01%   |   0.51%
>   llvm-project   |    0.12%   |   0.03%
>   rails          |    0.10%   |   0.10%
>   rust           |    0.07%   |   0.17%
>   tensorflow     |    0.09%   |   1.02%
>   webkit         |    0.05%   |   0.31%

This is very useful information to have! Without the total number of
commits, it's impossible to know whether or not this is a win over the
BFXL chunk. But, since the number of commits is probably "large" versus
the percentage of boundary commits which is "small", it's almost
certainly an advantage.

> > A test to exercise filters which contain too many changed path entries
> > will be introduced in a subsequent patch.
>
>
> > diff --git a/bloom.h b/bloom.h
> > index c6d77e8393..70a8840896 100644
> > --- a/bloom.h
> > +++ b/bloom.h
> > @@ -93,6 +93,7 @@ enum bloom_filter_computed {
> >  	BLOOM_NOT_COMPUTED = (1 << 0),
> >  	BLOOM_COMPUTED     = (1 << 1),
> >  	BLOOM_TRUNC_LARGE  = (1 << 2),
> > +	BLOOM_TRUNC_SMALL  = (1 << 3),
>
> s/SMALL/EMPTY/
>
> This "small" suffix in the constant, variable, and trace2 key names is
> misleading, because we only mean empty commits.

I could buy that it might be misleading; I only picked this since it was
the opposite of "large". You could imagine that BLOOM_TRUNC_X means
"truncated in the direction of 'x'", but to be honest I don't think that
this matters.

I understand the churn of coming back to this after the topic has
already been merged creates more hassle, but frankly this series has
already gone on for quite a while, and it has been holding up important
bug fixes that are unrelated to the main feature.

So, I think that if it's truly misleading, we could revisit this after
the topic is merged. But, I'm not planning on changing anything at this
point.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux