Re: [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Fri, 11 Aug 2023 15:13:37 -0700

Taylor Blau <me@xxxxxxxxxxxx> writes:
> On both linux.git and git.git, this series gives a significant speed-up
> when upgrading Bloom filters from v1 to v2. On linux.git:
> 
>     Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths
>       Time (mean ± σ):     124.873 s ±  0.316 s    [User: 124.081 s, System: 0.643 s]
>       Range (min … max):   124.621 s … 125.227 s    3 runs
> 
>     Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths
>       Time (mean ± σ):     79.271 s ±  0.163 s    [User: 74.611 s, System: 4.521 s]
>       Range (min … max):   79.112 s … 79.437 s    3 runs
> 
>     Summary
>       'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran
>         1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths'
> 
> On git.git (where we do have some non-ASCII paths), the change goes from
> 4.163 seconds to 3.348 seconds, for a 1.24x speed-up.

My main concern is that this modifies the code somewhat pervasively
(tracking the version of Bloom filters and removing assumptions about
what Bloom filter versions are in memory) in return for what I think
is a small speedup, when considering that we will perform this
operation only once for a given repo. I'll defer to server operators
on this (or other people handling large numbers of repos), though.

Putting that concern aside, I've reviewed the code and, assuming that
we're OK with modifying the code in this way, this patch set looks good
to me. Hopefully my review will be of some help.