On 5/1/2020 10:13 AM, Antonio Russo wrote: > On 4/29/20 7:12 AM, Derrick Stolee wrote: >> On 4/29/2020 4:01 AM, Antonio Russo wrote: >>> When used with --graph, instead of displaying the full graph, display a >>> spanning subgraph produced by a depth-first search of the graph visiting >>> parents from left to right. Edges to already visited commits are >>> discarded. This process is repeated for every commit to be displayed. >>> >>> This is valuable to reduce visual clutter when there are many merges >>> that were not rebased onto each other and the user is primarily >>> interested in the state of the branch being merged into. >> >> My earlier comment to recommend how to fix the test failures intended >> to demonstrate that this area of code requires a bit of precision. I >> took some time to review this patch, but I find it needs some significant >> updates. >> >> tl;dr: The way you manipulate the commit parents seems incorrect to me, >> but perhaps there is enough prior art in the way we simplify parents to >> make that acceptable. Someone else will need to chime in on that part. > > First, thank you for taking the time look at this. I understand your > hesitation about the "amputation" of the history, but in some sense > that's the point of this option. I really want to be ignorant of the > details of when the fork branched off. I would like the reported > history to be appear nearly equivalent to a rebase-and-fastforward only > merge style, which results in a much simpler git log --graph. > >> It may be possible to do this "drop edges" entirely inside of graph.c >> (the graph rendering code) instead of revision.c, which would automatically >> work with the performance gains from the newer topo-walk algorithm. > > Non-local information about the commit graph needs to be used to do this > amputation of the history. We cannot know how many parents we want to > display until we've completely explored all the parents of that node. > Unfortunately, that means that the whole graph needs to be loaded, and I > cannot really see how there would be any gain by doing this in graph.c. > > Caveat: there are semi-streaming DFS implementations (i.e., O(n log n) > space) that we might be able to use to get the first line out the door > quicker. I would, however, like to leave that to another patch. > > I will also add that, for the tests I've done, all performance penalties > have been insignificant (less than ~5% for showing the first commit), > while there are significant performance _improvements_, e.g., ~40% for > displaying the full tree. > > A notable exception is --all, which can be ~50x faster for the full > output, but is often dramatically slower to show anything (i.e., the > first line). > >> There are enough deviations from code and test style that this will >> need a significant revision regardless. > > (Please see forthcoming revision 3). > >>> This second revision of the patch sets revs->limited. This forces the >>> graph of commits to be loaded, and simplfiy_forks() therefore reliably >>> traverses it. This addresses the test failures mentioned before (see [1]). >> >> This will have a significant performance impact on the option, as you will >> not see even the first result until the DFS has completed. > > First of all, short of using some other, more sophisticated streaming > version of this algorithm, the full DFS must finish before the first > commit having two (or more) parents can be shown. > > That said, the performance is not significantly affected: > > I ran the following test (2.26.2, with my patch on top of it): > (git lg = git log --graph --pretty=oneline) > > % time git lg -n1 --ignore-merge-bases e896a286df > /dev/null > 0.73s user 0.02s system 99% cpu 0.746 total > > % time git lg -n1 e896a286df > /dev/null > 0.72s user 0.01s system 99% cpu 0.731 total > > For the linux git repo: > > % time git lg -n1 --ignore-merge-bases v5.7-rc3 >/dev/null > 9.25s user 0.39s system 99% cpu 9.646 total > > % time git lg -n1 v5.7-rc3 >/dev/null > 9.02s user 0.35s system 99% cpu 9.378 total > > So the performance seems basically unaffected for very limited graphs. > It's also about 40% faster for complicated ones (as mentioned in my > first email): > > % time git lg --ignore-merge-bases e870325ee8 > /dev/null > 0.83s user 0.06s system 99% cpu 0.886 total > > % time git lg e870325ee8 > /dev/null > 1.41s user 0.03s system 99% cpu 1.443 total > > For the linux git repo: > > % time git lg --ignore-merge-bases v5.7-rc3 >/dev/null > 11.86s user 0.62s system 99% cpu 12.489 total > > % time git lg v5.7-rc3 >/dev/null > 21.56s user 0.55s system 99% cpu 22.108 total First, run `git commit-graph write --reachable` to create a commit-graph file which has generation numbers. In this case, I get the following: $ time git log --oneline --graph v5.7-rc3 >/dev/null real 0m13.548s user 0m13.348s sys 0m0.200s $ time git log --oneline --graph -n 1 v5.7-rc3 >/dev/null real 0m0.007s user 0m0.004s sys 0m0.004s Notice exactly how much better this gets for the first result with that file. > This is because the amputated graph is much simpler, and the rest of the > code needs to do much less work. > > Passing --all is another beast, and does indeed suffer: > > % time git lg --ignore-merge-bases --all >/dev/null > 4.06s user 0.04s system 99% cpu 4.105 total > > % time git lg --all >/dev/null > 189.59s user 0.04s system 99% cpu 3:09.65 total > > (and for the first line) > > % time git lg -n1 --ignore-merge-bases --all >/dev/null > 3.86s user 0.02s system 99% cpu 3.874 total > > % time git lg -n1 --all >/dev/null > 0.83s user 0.02s system 99% cpu 0.848 total > > (If you need to use --all for the Linux git repo, you should not use > --ignore-merge-bases). I think this is a deficiency in your implementation, not a hard rule about how these options would need to interact. Thanks, -Stolee