Re: [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 17 Sep 2020 21:15:17 -0400

On Thu, Sep 17, 2020 at 05:52:11PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> >> > In practice, these boundary commits likely occupy a small proportion of
> >> > the overall number of commits, and so the size penalty is likely smaller
> >> > than storing a bitmap for all commits.
> >>
> >>                  |      Percentage of
> >>                  |    commits modifying
> >>                  |   0 path   |  >= 512 paths
> >>   ---------------+------------+----------------
> >>   android-base   |   13.20%   |   0.13%
> >>   cmssw          |    0.15%   |   0.23%
> >>   cpython        |    3.07%   |   0.01%
> >>   elasticsearch  |    0.70%   |   1.00%
> >>   gcc            |    0.00%   |   0.08%
> >>   gecko-dev      |    0.14%   |   0.64%
> >>   git            |    0.11%   |   0.02%
> >>   glibc          |    0.02%   |   0.10%
> >>   go             |    0.00%   |   0.07%
> >>   homebrew-cask  |    0.40%   |   0.02%
> >>   homebrew-core  |    0.01%   |   0.01%
> >>   jdk            |    0.26%   |   5.64%
> >>   linux          |    0.01%   |   0.51%
> >>   llvm-project   |    0.12%   |   0.03%
> >>   rails          |    0.10%   |   0.10%
> >>   rust           |    0.07%   |   0.17%
> >>   tensorflow     |    0.09%   |   1.02%
> >>   webkit         |    0.05%   |   0.31%
> >
> > This is very useful information to have! Without the total number of
> > commits, it's impossible to know whether or not this is a win over the
> > BFXL chunk. But, since the number of commits is probably "large" versus
> > the percentage of boundary commits which is "small", it's almost
> > certainly an advantage.
>
> Do you want to include it in either the log message in one of the
> commits, in code comment, or a technical doc?

Let's put it in the commit message. I think that it's useful enough that
people interested enough to dig through the commits would want it, but
too detailed to be as visible as in the technical documentation.

> > So, I think that if it's truly misleading, we could revisit this after
> > the topic is merged. But, I'm not planning on changing anything at this
> > point.
>
> If you do not want to help us go the last-mile to completion, that
> is sad, but I do not want to see a basically good patch stall like
> that, so let's find somebody else who can do the helping ;-)

I'm sorry to give off the impression that I do not want to help; that's
not the case. After reading Gàbor's email, I thought that the changes he
was suggesting were minor, and was trying to say "let's do this on top
and apply the important bug fixes first," not, "I am never going to
touch this again".

> Here is what my trial rebase produced.  I'll queue it to 'seen' (if
> you prefer I can send a full v3 patch, but I expect that you know
> how to fetch from 'seen' and review locally) after checking if the
> result passes the tests locally; an extra set of eyeballs to verify
> the result is pretty much appreciated.

It looks off to a great start; and thanks for taking the care to make up
for my laziness. Everything you did looks good to me. I touched up
"bloom: encode out-of-bounds filters as non-empty" locally to include
the table above (and some commentary around it), so I'll send a v3 to
the list once I have finished running the tests.

> Thanks.

Thank you, and sorry again.

Thanks,
Taylor