On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
> There are established corner cases, where in a repo where commit dates
> are not monotonically increasing, revision walking can go horribly
> wrong. This was discussed in the past in e.g.
> https://public-inbox.org/git/20150521061553.GA29269@xxxxxxxxxxxx/
>
> The only (simple) workable way, given the current algorithm, to get an
> accurate view off rev-list is to essentially make slop infinite. This
> works fine, at the expense of runtime.
>
> Now, ignoring any modification for the above, I'm hitting another corner
> case in some other "weird" history, where I have 500k commits all with
> the same date. With such a commit dag, something as trivial as
> `git rev-list HEAD~..HEAD` goes through all commits from the root commit
> to HEAD, which takes multiple seconds, when the (obvious) output is one
> commit.
>
> It looks like the only way revision walking stops going through all the
> ancestry is through slop, and slop is essentially made infinite by the
> fact all commits have the same date (because of the date check in
> still_interesting()). By extension, this means the workaround for the
> first corner case above, which is to make slop infinite, essentially
> makes all rev walking go through the entire ancestry of the commits
> given on the command line.
>
> It feels like some cases of everybody_uninteresting should shortcut slop
> entirely, but considering the only way for slop to decrease at all is
> when everybody_uninteresting returns true, that would seem like a wrong
> assumption. But I'm also not sure what slop helps with in the first
> place (but I don't have a clear view of the broader picture of how the
> entire revision walking works).
>
> Anyways, a rather easy way to witness this happening is to create a
> dummy repo like:
>   git init foo
>   cd foo
>   for i in $(seq 1 50); do
>     echo $i > a;
>     git add a;
>     git commit -a -m $i;
>   done
>
> Then something as simple as `git rev-list HEAD~..HEAD` will go through
> all 50 commits (assuming the script above created commits in the same
> second, which it did on my machine)
>
> By the time both HEAD~ and HEAD have been processed, the revision
> walking should have enough information to determine that it doesn't need
> to go further, but still does. Even with something like HEAD~2..HEAD,
> after the first round of processing parents it should be able to see
> there's not going to be any more interesting commits.
>
> I'm willing to dig into this, but if someone familiar with the
> algorithm could give me some hints as to what I might be missing in the
> big picture, that would be helpful.

All the above is without commit-graph, I presume?  If so, then you
should give it a try, as it might bring immediate help in your
pathological repo.  With 5k commits in the same second (enforced via
'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:

  $ best-of-five -q git rev-list HEAD~..HEAD
  0.069

  $ git commit-graph write --reachable
  Computing commit graph generation numbers: 100% (5000/5000), done.

  $ best-of-five -q git rev-list HEAD~..HEAD
  0.004
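
For anyone who wants to try the comparison themselves, here is a rough
sketch of the setup described above. The function names, the scratch
directory, the fixed timestamp and the test identity are all made up for
illustration. It uses the shell's `time` keyword (so it is meant for
bash) where the numbers above came from best-of-five. It also sets
core.commitGraph explicitly on the assumption that older Git releases do
not read the commit-graph file by default.

```shell
# Sketch only: the names and values below are illustrative assumptions.

# Create a scratch repo with $1 empty commits that all share one
# timestamp, and print the repo path on stdout.
make_same_date_repo() (
    n=$1
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    git init -q . >&2 || exit 1
    git config user.name "Test"
    git config user.email "test@example.com"

    # Raw "<unix time> <tz offset>" date format. Pinning both dates
    # guarantees identical timestamps however slowly the loop runs.
    GIT_AUTHOR_DATE="1558140852 +0000"
    GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
    export GIT_AUTHOR_DATE GIT_COMMITTER_DATE

    i=1
    while [ "$i" -le "$n" ]; do
        git commit -q --allow-empty -m "$i" >&2 || exit 1
        i=$((i + 1))
    done
    printf '%s\n' "$dir"
)

# Time the same one-commit range without and with the commit-graph.
compare_rev_list() (
    cd "$1" || exit 1
    echo "without commit-graph:"
    time git -c core.commitGraph=false rev-list HEAD~..HEAD
    git commit-graph write --reachable
    echo "with commit-graph:"
    time git -c core.commitGraph=true rev-list HEAD~..HEAD
)
```

Running `compare_rev_list "$(make_same_date_repo 5000)"` should print the
same single commit twice, with the second timing expected to be
noticeably lower. Absolute numbers will of course differ from the ones
quoted above.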