(Finally getting back to this review after working on some other
topics for a while; sorry for the late response.)

On Wed, Aug 19, 2020 at 03:35:26PM +0200, SZEDER Gábor wrote:
> On Tue, Aug 11, 2020 at 04:52:07PM -0400, Taylor Blau wrote:
> > When a commit has more than a certain number of changed paths
> > (commonly 512), the commit-graph machinery represents it as a
> > zero-length filter. This is done since having many entries in the
> > Bloom filter has undesirable effects on the false positivity rate.
>
> This is not the case, the false positive probability depends on the
> ratio of the Bloom filter's size and the number of elements it
> contains, and we size the filters proportional to the number of
> elements they contain, so the number of elements shouldn't affect
> the false positive rate.

I'm not sure that I understand. I agree that the FPR depends on the
ratio between the number of elements in the filter and the filter's
"size". But consider a Bloom filter that is too small to faithfully
represent all of its elements. Such a filter would likely have all of
its bits set, in which case every query would return "maybe", and the
FPR would go up.

> On the contrary, it's the small filters, up to around 30-35 bytes,
> that tend to have larger than expected false positive rate when
> using double hashing.

I agree that small filters suffer from the same problem, but I think
this is an "in addition", not an "on the contrary".

In either case, I don't think that this is an important detail for
the commit message. What matters is the representation (that we
truncate filters with >= 512 elements to length zero), not why (that
can be found in another commit). I'd have expected to find the
rationale in ed591febb4 (bloom.c: core Bloom filter implementation
for changed paths., 2020-03-30), but I couldn't find anything there.
So I'll drop this sentence entirely to avoid an unimportant detail.

Thanks,
Taylor
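
P.S.: To convince myself of the "ratio" point, I threw together a
quick back-of-the-envelope program. This is not git code; it just
evaluates the standard approximation fpr = (1 - e^(-k*n/m))^k,
assuming our defaults of 10 bits and 7 hashes per path:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
          const double bits_per_entry = 10.0; /* git's default */
          const double k = 7.0; /* git's default number of hashes */
          unsigned n;

          /*
           * Double the number of changed paths each iteration; the
           * filter is sized proportionally, so the predicted false
           * positive rate stays flat.
           */
          for (n = 8; n <= 512; n *= 2) {
                  double m = bits_per_entry * n;
                  double fpr = pow(1.0 - exp(-k * n / m), k);
                  printf("n = %3u paths, m = %4.0f bits, fpr = %.4f\n",
                         n, m, fpr);
          }
          return 0;
  }

Every row prints the same fpr (~0.0082), which matches your point that
proportional sizing keeps the rate constant, with the caveat that this
approximation ignores the saturation and double-hashing effects we
discussed above.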