Re: [PATCH v4 01/10] commit-graph: fix regression when computing Bloom filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Abhishek Kumar via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
>
> commit_gen_cmp is used when writing a commit-graph to sort commits in
> generation order before computing Bloom filters. Since c49c82aa (commit:
> move members graph_pos, generation to a slab, 2020-06-17) made it so
> that 'commit_graph_generation()' returns 'GENERATION_NUMBER_INFINITY'
> during writing, we cannot call it within this function. Instead, access
> the generation number directly through the slab (i.e., by calling
> 'commit_graph_data_at(c)->generation') in order to access it while
> writing.

This description is all right, but I think it can be made more clear:

  When running `git commit-graph write --reachable --changed-paths` to
  compute Bloom filters for changed paths, commits are first sorted by
  generation number using 'commit_gen_cmp()'.  Commits with similar
  generation are more likely to have many trees in common, making the
  diff faster, see 3d112755.

  However, since c49c82aa (commit: move members graph_pos, generation to
  a slab, 2020-06-17) made it so that 'commit_graph_generation()'
  returns 'GENERATION_NUMBER_INFINITY' during writing, we cannot call it
  within this function.  Instead, access the generation number directly
  through the slab (i.e., by calling 'commit_graph_data_at(c)->generation')
  in order to access it while writing.

Or something like that.

We should also add an explanation why avoiding getter is safe here,
perhaps adding the following line to the second paragraph:

  It is safe to do because 'commit_gen_cmp()' from commit-graph.c is
  static and used only when writing Bloom filters, and because writing
  changed-paths filters is done after computing generation numbers (if
  necessary).

Or something like that.

>
> While measuring performance with `git commit-graph write --reachable
> --changed-paths` on the linux repository led to around 1m40s for both
> HEAD and master (and could be due to fault in my measurements), it is
> still the "right" thing to do.

I had to read the above paragraph several times to understand it,
possibly because I have expected here to be a fix for a performance
regression.  The commit message for 3d112755 (commit-graph: examine
commits by generation number) describes reduction of computation time
from 3m00s to 1m37s.  So I would expect performance with HEAD (i.e.
before those changes) to be around 3m, not the same before and after
changes being around 1m40s.

Can anyone recheck this before-and-after benchmark, please?

Anyway, it might be more clear to write it as the following:

  On the Linux kernel repository, this patch didn't reduce the
  computation time for 'git commit-graph write --reachable
  --changed-paths', which is around 1m40s both before and after this
  change.  This could be a fault in my measurements; it is still the
  "right" thing to do.

Or something like that.

> Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx>
> ---

Anyway, it is nice and clear change.

>  commit-graph.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index cb042bdba8..94503e584b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -144,8 +144,8 @@ static int commit_gen_cmp(const void *va, const void *vb)
>  	const struct commit *a = *(const struct commit **)va;
>  	const struct commit *b = *(const struct commit **)vb;
>  
> -	uint32_t generation_a = commit_graph_generation(a);
> -	uint32_t generation_b = commit_graph_generation(b);
> +	uint32_t generation_a = commit_graph_data_at(a)->generation;
> +	uint32_t generation_b = commit_graph_data_at(b)->generation;
>  	/* lower generation commits first */
>  	if (generation_a < generation_b)
>  		return -1;

Best,
-- 
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux