On 8/22/2019 4:41 AM, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 07:35:15PM +0200, SZEDER Gábor wrote:
>> So line-level log clearly computes a lot less diffs than
>> '--full-history', though still about 50% more than a regular
>> pathspec-limited history traversal. Looking at the commit-parent
>> pairs in the output, it appears that the difference comes mostly from
>> merge commits, because line-level log compares a merge commit with all
>> of its parents.
>
>> It seems there is still more room for improvements by avoiding
>> commit-non_first_parent diffs when the first parent is TREESAME, and
>> doing so could hopefully avoid triggering rename detection in those
>> subtree merges or in case of your evil path.
>
> Well, that fruit hung much lower than I though, just look at the size
> of the WIP patch below. I just hope that there are no unexpected
> surprises, but FWIW it produces the exact same output for all files up
> to 't/t5515' in v2.23.0 as the previous patch.
>
> Can't wait to see how it fares with that evil Windows path :)

Thanks for this! With this patch, we finally have the time down to ~20s.
This is a HUGE improvement, especially considering there is only one
result for the particular section, so the entire history is explored in
that time.

> --- >8 ---
>
> Subject: [PATCH 3/2] WIP line-log: stop diff-ing after first TREESAME merge parent
>
>   # git.git, ~25% of all commits are merges
>   $ time git --no-pager log -L:read_alternate_refs:sha1-file.c v2.23.0
>
>   Before:
>
>     real    0m2.516s
>     user    0m2.456s
>     sys     0m0.060s
>
>   After:
>
>     real    0m1.132s
>     user    0m1.096s
>     sys     0m0.036s
>
>   # linux.git, ~7% of all commits are merges
>   $ time ~/src/git/git --no-pager log \
>     -L:build_restore_work_registers:arch/mips/mm/tlbex.c v5.2
>
>   Before:
>
>     real    0m2.599s
>     user    0m2.466s
>     sys     0m0.157s
>
>   After:
>
>     real    0m1.976s
>     user    0m1.856s
>     sys     0m0.121s
>
> [TODO: get rid of unnecessary arrays, tests?, write commit message...]
> ---
>  line-log.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/line-log.c b/line-log.c
> index 9010e00950..a4b032f83a 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -1184,13 +1184,11 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>
>  	p = commit->parents;
>  	for (i = 0; i < nparents; i++) {
> +		int changed;
>  		parents[i] = p->item;
>  		p = p->next;
>  		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
> -	}
>
> -	for (i = 0; i < nparents; i++) {
> -		int changed;
>  		cand[i] = NULL;
>  		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
>  		if (!changed) {

Interesting. The old logic computed ALL the diffs, then started
navigating. By navigating before computing all the diffs, we are now
avoiding the rename logic on the SECOND parent, and there will be a lot
of second parents that do not include the file (depending on the number
of parallel topics being merged independently). That's why git.git has
a better performance difference than linux.git.

> @@ -1203,7 +1201,7 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  		commit_list_append(parents[i], &commit->parents);
>  	free(parents);
>  	free(cand);
> -	free_diffqueues(nparents, diffqueues);
> +	free_diffqueues(i, diffqueues);

Good point here, as we haven't initialized all of the queues.

Thanks,
-Stolee
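
P.S. For anyone skimming the thread, the reordering discussed above can
be sketched with a toy model. This is only an illustration, not the code
in line-log.c: struct toy_parent, toy_diff(), eager_diffs() and
lazy_diffs() are hypothetical names, and "changed" stands in for whatever
process_all_files() reports. The sketch just counts how many parent diffs
each loop order computes.

```c
#include <assert.h>

/* Hypothetical stand-in for one merge parent. */
struct toy_parent {
	int changed; /* nonzero if the tracked range differs from this parent */
};

/* Pretend to diff against one parent; count every diff we compute. */
static int toy_diff(const struct toy_parent *p, int *diffs_computed)
{
	(*diffs_computed)++;
	return p->changed;
}

/* Old order: diff against every parent first, then look for TREESAME. */
int eager_diffs(const struct toy_parent *parents, int nparents)
{
	int results[16];
	int diffs = 0;
	int i;

	assert(nparents >= 0 && nparents <= 16);
	for (i = 0; i < nparents; i++)
		results[i] = toy_diff(&parents[i], &diffs);
	for (i = 0; i < nparents; i++)
		if (!results[i])
			break; /* TREESAME parent found, but all diffs are already paid for */
	return diffs;
}

/* New order: diff and inspect in the same loop, stop at the first TREESAME parent. */
int lazy_diffs(const struct toy_parent *parents, int nparents)
{
	int diffs = 0;
	int i;

	assert(nparents >= 0);
	for (i = 0; i < nparents; i++)
		if (!toy_diff(&parents[i], &diffs))
			break; /* later parents are never diffed */
	return diffs;
}
```

With a two-parent merge whose first parent is TREESAME, eager_diffs()
computes two diffs and lazy_diffs() computes one. When no parent is
TREESAME, both compute the same number. In this toy model that matches
the free_diffqueues(i, ...) change: only the queues filled before the
loop stopped need cleaning up.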