On Tue, Dec 29, 2020 at 08:35:56PM -0500, Derrick Stolee wrote: > On 12/28/2020 6:15 AM, Abhishek Kumar via GitGitGadget wrote: > > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > > > > Before computing Bloom fitlers, the commit-graph machinery uses > > s/fitlers/filters/ > > > commit_gen_cmp to sort commits by generation order for improved diff > > performance. 3d11275505 (commit-graph: examine commits by generation > > number, 2020-03-30) claims that this sort can reduce the time spent to > > compute Bloom filters by nearly half. > > > > But since c49c82aa4c (commit: move members graph_pos, generation to a > > slab, 2020-06-17), this optimization is broken, since asking for a > > 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY > > while writing. > > > > Not all hope is lost, though: 'commit_graph_generation()' falls back to > > comparing commits by their date when they have equal generation number, > > and so since c49c82aa4c is purely a date comparision function. This > > s/comparision/comparison/ > > > heuristic is good enough that we don't seem to loose appreciable > > performance while computing Bloom filters. Applying this patch (compared > > with v2.29.1) speeds up computing Bloom filters by around ~4 > > seconds. > > Using "~4 seconds" here is odd since there is no baseline. Which > repository did you use? > I used the linux repository, will mention that. > Previous discussion used relative terms. Something like "speeds up by > a factor of 1.25" or something might be interesting. > As SZEDER Gábor found, the improvements are rather minor - ranging from 0.40% to 5.19% [1]. I want to make sure this is the correct way to word in the commit message: Applying this patch (compared with v2.30.0) speeds up computing Bloom filters by factors ranging from 0.40% to 5.19% on various repositories. https://lore.kernel.org/git/20210105094535.GN8396@xxxxxxxxxx/ > > So, avoid the useless 'commit_graph_generation()' while writing by > > instead accessing the slab directly. This returns the newly-computed > > generation numbers, and allows us to avoid the heuristic by directly > > comparing generation numbers. > > This introduces some timing restrictions to the ability for this > comparison function. It would be dangerous if someone extracted > the method for another purpose. A comment above these lines could > warn future developers from making that mistake, but they would > probably use the comparison functions in commit.c instead. > Sure, will add a comment above. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > > --- > > commit-graph.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/commit-graph.c b/commit-graph.c > > index 06f8dc1d896..caf823295f4 100644 > > --- a/commit-graph.c > > +++ b/commit-graph.c > > @@ -144,8 +144,8 @@ static int commit_gen_cmp(const void *va, const void *vb) > > const struct commit *a = *(const struct commit **)va; > > const struct commit *b = *(const struct commit **)vb; > > > > - uint32_t generation_a = commit_graph_generation(a); > > - uint32_t generation_b = commit_graph_generation(b); > > + uint32_t generation_a = commit_graph_data_at(a)->generation; > > + uint32_t generation_b = commit_graph_data_at(b)->generation; > > /* lower generation commits first */ > > if (generation_a < generation_b) > > return -1; > > >