Re: [PATCH v5 01/11] commit-graph: fix regression when computing Bloom filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 29, 2020 at 08:35:56PM -0500, Derrick Stolee wrote:
> On 12/28/2020 6:15 AM, Abhishek Kumar via GitGitGadget wrote:
> > From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> > 
> > Before computing Bloom fitlers, the commit-graph machinery uses
> 
> s/fitlers/filters/
> 
> > commit_gen_cmp to sort commits by generation order for improved diff
> > performance. 3d11275505 (commit-graph: examine commits by generation
> > number, 2020-03-30) claims that this sort can reduce the time spent to
> > compute Bloom filters by nearly half.
> > 
> > But since c49c82aa4c (commit: move members graph_pos, generation to a
> > slab, 2020-06-17), this optimization is broken, since asking for a
> > 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY
> > while writing.
> > 
> > Not all hope is lost, though: 'commit_graph_generation()' falls back to
> > comparing commits by their date when they have equal generation number,
> > and so since c49c82aa4c is purely a date comparision function. This
> 
> s/comparision/comparison/
> 
> > heuristic is good enough that we don't seem to loose appreciable
> > performance while computing Bloom filters. Applying this patch (compared
> > with v2.29.1) speeds up computing Bloom filters by around ~4
> > seconds.
> 
> Using "~4 seconds" here is odd since there is no baseline. Which
> repository did you use?
> 

I used the linux repository, will mention that.

> Previous discussion used relative terms. Something like "speeds up by
> a factor of 1.25" or something might be interesting.
> 

As SZEDER Gábor found, the improvements are rather minor - ranging from
0.40% to 5.19% [1]. I want to make sure this is the correct way to word
in the commit message:

Applying this patch (compared with v2.30.0) speeds up computing Bloom
filters by factors ranging from 0.40% to 5.19% on various
repositories. 

https://lore.kernel.org/git/20210105094535.GN8396@xxxxxxxxxx/

> > So, avoid the useless 'commit_graph_generation()' while writing by
> > instead accessing the slab directly. This returns the newly-computed
> > generation numbers, and allows us to avoid the heuristic by directly
> > comparing generation numbers.
> 
> This introduces some timing restrictions to the ability for this
> comparison function. It would be dangerous if someone extracted
> the method for another purpose. A comment above these lines could
> warn future developers from making that mistake, but they would
> probably use the comparison functions in commit.c instead.
> 

Sure, will add a comment above.

> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> > ---
> >  commit-graph.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 06f8dc1d896..caf823295f4 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -144,8 +144,8 @@ static int commit_gen_cmp(const void *va, const void *vb)
> >  	const struct commit *a = *(const struct commit **)va;
> >  	const struct commit *b = *(const struct commit **)vb;
> >  
> > -	uint32_t generation_a = commit_graph_generation(a);
> > -	uint32_t generation_b = commit_graph_generation(b);
> > +	uint32_t generation_a = commit_graph_data_at(a)->generation;
> > +	uint32_t generation_b = commit_graph_data_at(b)->generation;
> >  	/* lower generation commits first */
> >  	if (generation_a < generation_b)
> >  		return -1;
> > 
> 



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux