Re: [PATCH v2] Teach git-rev-list --simplify-forks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/1/2020 10:13 AM, Antonio Russo wrote:
> On 4/29/20 7:12 AM, Derrick Stolee wrote:
>> On 4/29/2020 4:01 AM, Antonio Russo wrote:
>>> When used with --graph, instead of displaying the full graph, display a
>>> spanning subgraph produced by a depth-first search of the graph visiting
>>> parents from left to right.  Edges to already visited commits are
>>> discarded.  This process is repeated for every commit to be displayed.
>>>
>>> This is valuable to reduce visual clutter when there are many merges
>>> that were not rebased onto each other and the user is primarily
>>> interested in the state of the branch being merged into.
>>
>> My earlier comment to recommend how to fix the test failures intended
>> to demonstrate that this area of code requires a bit of precision. I
>> took some time to review this patch, but I find it needs some significant
>> updates.
>>
>> tl;dr: The way you manipulate the commit parents seems incorrect to me,
>> but perhaps there is enough prior art in the way we simplify parents to
>> make that acceptable. Someone else will need to chime in on that part.
> 
> First, thank you for taking the time look at this.  I understand your
> hesitation about the "amputation" of the history, but in some sense
> that's the point of this option.  I really want to be ignorant of the
> details of when the fork branched off.  I would like the reported
> history to be appear nearly equivalent to a rebase-and-fastforward only
> merge style, which results in a much simpler git log --graph.
> 
>> It may be possible to do this "drop edges" entirely inside of graph.c
>> (the graph rendering code) instead of revision.c, which would automatically
>> work with the performance gains from the newer topo-walk algorithm.
> 
> Non-local information about the commit graph needs to be used to do this
> amputation of the history.  We cannot know how many parents we want to
> display until we've completely explored all the parents of that node.
> Unfortunately, that means that the whole graph needs to be loaded, and I
> cannot really see how there would be any gain by doing this in graph.c.
> 
> Caveat: there are semi-streaming DFS implementations (i.e., O(n log n)
> space) that we might be able to use to get the first line out the door
> quicker. I would, however, like to leave that to another patch.
> 
> I will also add that, for the tests I've done, all performance penalties
> have been insignificant (less than ~5% for showing the first commit),
> while there are significant performance _improvements_, e.g., ~40% for
> displaying the full tree.
> 
> A notable exception is --all, which can be ~50x faster for the full
> output, but is often dramatically slower to show anything (i.e., the
> first line).
> 
>> There are enough deviations from code and test style that this will
>> need a significant revision regardless.
> 
> (Please see forthcoming revision 3).
> 
>>> This second revision of the patch sets revs->limited.  This forces the
>>> graph of commits to be loaded, and simplfiy_forks() therefore reliably
>>> traverses it.  This addresses the test failures mentioned before (see [1]).
>>
>> This will have a significant performance impact on the option, as you will
>> not see even the first result until the DFS has completed.
> 
> First of all, short of using some other, more sophisticated streaming
> version of this algorithm, the full DFS must finish before the first
> commit having two (or more) parents can be shown.
> 
> That said, the performance is not significantly affected:
> 
> I ran the following test (2.26.2, with my patch on top of it):
> (git lg = git log --graph --pretty=oneline)
> 
>  % time git lg -n1 --ignore-merge-bases e896a286df > /dev/null
>  0.73s user 0.02s system 99% cpu 0.746 total
> 
>  % time git lg -n1 e896a286df > /dev/null
>  0.72s user 0.01s system 99% cpu 0.731 total
> 
>  For the linux git repo:
> 
>  % time git lg -n1 --ignore-merge-bases v5.7-rc3 >/dev/null
>  9.25s user 0.39s system 99% cpu 9.646 total
> 
>  % time git lg -n1 v5.7-rc3 >/dev/null
>  9.02s user 0.35s system 99% cpu 9.378 total
> 
> So the performance seems basically unaffected for very limited graphs.
> It's also about 40% faster for complicated ones (as mentioned in my
> first email):
> 
>  % time git lg --ignore-merge-bases e870325ee8 > /dev/null
>  0.83s user 0.06s system 99% cpu 0.886 total
> 
>  % time git lg e870325ee8 > /dev/null
>  1.41s user 0.03s system 99% cpu 1.443 total
> 
>  For the linux git repo:
> 
>  % time git lg --ignore-merge-bases v5.7-rc3 >/dev/null
>  11.86s user 0.62s system 99% cpu 12.489 total
> 
>  % time git lg v5.7-rc3 >/dev/null
>  21.56s user 0.55s system 99% cpu 22.108 total

First, run `git commit-graph write --reachable` to create a commit-graph
file which has generation numbers. In this case, I get the following:

$ time git log --oneline --graph v5.7-rc3 >/dev/null

real    0m13.548s
user    0m13.348s
sys     0m0.200s

$ time git log --oneline --graph -n 1 v5.7-rc3 >/dev/null

real    0m0.007s
user    0m0.004s
sys     0m0.004s

Notice exactly how much better this gets for the first result with that
file.

> This is because the amputated graph is much simpler, and the rest of the
> code needs to do much less work.
> 
> Passing --all is another beast, and does indeed suffer:
> 
>  % time git lg --ignore-merge-bases --all >/dev/null
>  4.06s user 0.04s system 99% cpu 4.105 total
> 
>  % time git lg --all >/dev/null
>  189.59s user 0.04s system 99% cpu 3:09.65 total
> 
>  (and for the first line)
> 
>  % time git lg -n1 --ignore-merge-bases --all >/dev/null
>  3.86s user 0.02s system 99% cpu 3.874 total
> 
>  % time git lg -n1 --all >/dev/null
>  0.83s user 0.02s system 99% cpu 0.848 total
> 
> (If you need to use --all for the Linux git repo, you should not use
> --ignore-merge-bases).

I think this is a deficiency in your implementation, not a hard rule
about how these options would need to interact.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux