On Fri, Aug 11, 2023 at 03:13:37PM -0700, Jonathan Tan wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > On both linux.git and git.git, this series gives a significant speed-up > > when upgrading Bloom filters from v1 to v2. On linux.git: > > > > Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit > > Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s] > > Range (min … max): 124.621 s … 125.227 s 3 runs > > > > Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths > > Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s] > > Range (min … max): 79.112 s … 79.437 s 3 runs > > > > Summary > > 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran > > 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths' > > > > On git.git (where we do have some non-ASCII paths), the change goes from > > 4.163 seconds to 3.348 seconds, for a 1.24x speed-up. > > My main concern is that this modifies the code somewhat pervasively > (tracking the version of Bloom filters and removing assumptions about > what Bloom filter versions are in memory) in return for what I think > is a small speedup, when considering that we will perform this > operation only once for a given repo. I'll defer to server operators > on this (or other people handling large numbers of repos), though. Yeah, I think that this is certainly on the riskier side of things. But I have confidence in our test coverage here, and feel better with your thorough review. FWIW, I think that the speed-up is worthwhile (though I'm obviously biased here). When we were rolling out changed-path Bloom filters to repositories on GitHub.com, it was apparent that computing them from scratch in a single shot was infeasible. This led to 809e0327f5 (builtin/commit-graph.c: introduce '--max-new-filters=<n>', 2020-09-18). That would of course still save us here if we didn't have an upgrade path, since each recomputed filter would count against the maximum number we can generate in a single run. We could introduce a configuration knob, but I'd rather avoid it if possible. That can also be done on top in a later patch, which should be easy enough to do if we hear of a compelling use-case for wanting to disable the upgrade path here. > Putting that concern aside, I've reviewed the code and assuming that > we're OK with modifying the code in this way, this patch set looks good > to me, and hopefully my review will be of some help. Thanks very much -- it was more than helpful :-). I'll collect up this and your previous patches (as we have discussed off-list) and tie everything together in a single series for the maintainer. Thanks, Taylor