On Fri, Sep 21, 2018 at 10:39:36AM -0700, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > When running a command like 'git rev-list --topo-order HEAD', > Git performed the following steps: > [...] > In the new algorithm, these three steps correspond to three > different commit walks. We run these walks simultaneously, A minor nit, but this commit message doesn't mention the most basic thing up front: that its main purpose is to introduce a new algorithm for topo-order. ;) It's obvious in the context of reviewing the series, but somebody reading "git log" later may want a bit more. Perhaps: revision.c: implement generation-based topo-order algorithm as a subject, and/or an introductory paragraph like: The current --topo-order algorithm requires walking all commits we are going to output up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as we go. Other than that, I find this to be a wonderfully explanatory commit message. :) > The walks are as follows: > > 1. EXPLORE: using the explore_queue priority queue (ordered by > maximizing the generation number), parse each reachable > commit until all commits in the queue have generation > number strictly lower than needed. During this walk, update > the UNINTERESTING flags as necessary. OK, this makes sense. If we know that everybody else in our queue is at generation X, then it is safe to output a commit at generation greater than X. I think this by itself would allow us to implement "show no parents before all of its children are shown", right? But --topo-order promises a bit more: "avoid showing commits no multiple lines of history intermixed". I guess also INFINITY generation numbers need more. For a real generation number, we know that "gen(A) == gen(B)" implies that there is no ancestry relationship between the two. But not so for INFINITY. > 2. INDEGREE: using the indegree_queue priority queue (ordered > by maximizing the generation number), add one to the in- > degree of each parent for each commit that is walked. Since > we walk in order of decreasing generation number, we know > that discovering an in-degree value of 0 means the value for > that commit was not initialized, so should be initialized to > two. (Recall that in-degree value "1" is what we use to say a > commit is ready for output.) As we iterate the parents of a > commit during this walk, ensure the EXPLORE walk has walked > beyond their generation numbers. I wondered how this would work for INFINITY. We can't know the order of a bunch of INFINITY nodes at all, so we never know when their in-degree values are "done". But if I understand the EXPLORE walk, we'd basically walk all of INFINITY down to something with a real generation number. Is that right? But after that, I'm not totally clear on why we need this INDEGREE walk. > 3. TOPO: using the topo_queue priority queue (ordered based on > the sort_order given, which could be commit-date, author- > date, or typical topo-order which treats the queue as a LIFO > stack), remove a commit from the queue and decrement the > in-degree of each parent. If a parent has an in-degree of > one, then we add it to the topo_queue. Before we decrement > the in-degree, however, ensure the INDEGREE walk has walked > beyond that generation number. OK, this makes sense to make --author-date-order, etc, work. Potentially those numbers might have no relationship at all with the graph structure, but we promise "no parent before its children are shown", so this is really just a tie-breaker after the topo-sort anyway. As long as steps 1 and 2 are correct and produce a complete set of commits for one "layer", this should be OK. I guess I'm not 100% convinced that we don't have a case where we haven't yet parsed or considered some commit that we know cannot have an ancestry relationship with commits we are outputting. But it may have an author-date-order relationship. (I'm not at all convinced that this _is_ a problem, and I suspect it isn't; I'm only suggesting I haven't fully grokked the proof). > --- > object.h | 4 +- > revision.c | 196 +++++++++++++++++++++++++++++++++++++++++++++++++++-- > revision.h | 2 + > 3 files changed, 194 insertions(+), 8 deletions(-) I'll pause here on evaluating the actual code. It looks sane from a cursory read, but there's no point in digging further until I'm sure I fully understand the algorithm. I think that needs a little more brain power from me, and hopefully discussion around my comments above will help trigger that. -Peff