Re: [PATCH 2/2] line-log: avoid unnecessary full tree diffs

On 8/22/2019 4:41 AM, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 07:35:15PM +0200, SZEDER Gábor wrote:
>> So line-level log clearly computes far fewer diffs than
>> '--full-history', though still about 50% more than a regular
>> pathspec-limited history traversal.  Looking at the commit-parent
>> pairs in the output, it appears that the difference comes mostly from
>> merge commits, because line-level log compares a merge commit with all
>> of its parents.
> 
>> It seems there is still more room for improvements by avoiding
>> commit-non_first_parent diffs when the first parent is TREESAME, and
>> doing so could hopefully avoid triggering rename detection in those
>> subtree merges or in case of your evil path.
> 
> Well, that fruit hung much lower than I thought, just look at the size
> of the WIP patch below.  I just hope that there are no unexpected
> surprises, but FWIW it produces the exact same output for all files up
> to 't/t5515' in v2.23.0 as the previous patch.
> 
> Can't wait to see how it fares with that evil Windows path :)

Thanks for this! With this patch, we finally have the time down to ~20s.

This is a HUGE improvement, especially considering there is only one result
for the particular section, so the entire history is explored in that time.
 
>   --- >8 ---
> 
> Subject: [PATCH 3/2] WIP line-log: stop diff-ing after first TREESAME merge parent
> 
>   # git.git, ~25% of all commits are merges
>   $ time git --no-pager log -L:read_alternate_refs:sha1-file.c v2.23.0
> 
>   Before:
> 
>     real    0m2.516s
>     user    0m2.456s
>     sys     0m0.060s
> 
>   After:
> 
>     real    0m1.132s
>     user    0m1.096s
>     sys     0m0.036s
> 
>   # linux.git, ~7% of all commits are merges
>   $ time ~/src/git/git --no-pager log \
>     -L:build_restore_work_registers:arch/mips/mm/tlbex.c v5.2
> 
>   Before:
> 
>     real    0m2.599s
>     user    0m2.466s
>     sys     0m0.157s
> 
>   After:
> 
>     real    0m1.976s
>     user    0m1.856s
>     sys     0m0.121s
> 
> [TODO: get rid of unnecessary arrays, tests?, write commit message...]
> ---
>  line-log.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/line-log.c b/line-log.c
> index 9010e00950..a4b032f83a 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -1184,13 +1184,11 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  
>  	p = commit->parents;
>  	for (i = 0; i < nparents; i++) {
> +		int changed;
>  		parents[i] = p->item;
>  		p = p->next;
>  		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
> -	}
>  
> -	for (i = 0; i < nparents; i++) {
> -		int changed;
>  		cand[i] = NULL;
>  		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
>  		if (!changed) {

Interesting. The old logic computed ALL the diffs, then started navigating.

By navigating before computing all the diffs, we are now avoiding the rename logic
on the SECOND parent, and there will be a lot of second parents that do not include
the file (depending on the number of parallel topics being merged independently).
That's why git.git sees a larger speedup than linux.git.
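
To make the difference between the two loop shapes concrete, here is a minimal
standalone sketch. It is not the line-log.c code: struct toy_merge, toy_diff(),
diffs_eager() and diffs_lazy() are hypothetical names invented for illustration,
and toy_diff() only stands in for the queue_diffs() + process_all_files() pair
by counting how many times it is called.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 16

/*
 * Hypothetical stand-in for a merge commit: treesame[i] is nonzero
 * when the tracked range is unchanged relative to parent i.
 */
struct toy_merge {
	size_t nparents;
	const int *treesame;
};

/*
 * Hypothetical stand-in for the per-parent diff. It bumps *ndiffs on
 * every call and returns nonzero when the range changed.
 */
static int toy_diff(const struct toy_merge *m, size_t i, size_t *ndiffs)
{
	(*ndiffs)++;
	return !m->treesame[i];
}

/* Old shape: diff against every parent first, then look at the results. */
size_t diffs_eager(const struct toy_merge *m)
{
	int changed[TOY_MAX_PARENTS];
	size_t i, ndiffs = 0;

	assert(m->nparents <= TOY_MAX_PARENTS);
	for (i = 0; i < m->nparents; i++)
		changed[i] = toy_diff(m, i, &ndiffs);

	for (i = 0; i < m->nparents; i++)
		if (!changed[i])
			break; /* TREESAME parent found, but all diffs already ran */

	return ndiffs;
}

/* New shape: diff and inspect in one loop, stop at the first TREESAME parent. */
size_t diffs_lazy(const struct toy_merge *m)
{
	size_t i, ndiffs = 0;

	for (i = 0; i < m->nparents; i++)
		if (!toy_diff(m, i, &ndiffs))
			break; /* later parents are never diffed */

	return ndiffs;
}
```

For a two-parent merge whose first parent is TREESAME, diffs_eager() runs two
diffs while diffs_lazy() runs one, which is the second-parent work (including
any rename detection) that gets skipped.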

> @@ -1203,7 +1201,7 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  			commit_list_append(parents[i], &commit->parents);
>  			free(parents);
>  			free(cand);
> -			free_diffqueues(nparents, diffqueues);
> +			free_diffqueues(i, diffqueues);

Good point here, as we haven't initialized all of the queues.

Thanks,
-Stolee