On 10/25/2018 5:43 AM, Jeff King wrote:
On Thu, Oct 11, 2018 at 12:21:44PM -0400, Derrick Stolee wrote:
2. INDEGREE: using the indegree_queue priority queue (ordered
by maximizing the generation number), add one to the in-
degree of each parent for each commit that is walked. Since
we walk in order of decreasing generation number, we know
that discovering an in-degree value of 0 means the value for
that commit was not initialized, so should be initialized to
two. (Recall that in-degree value "1" is what we use to say a
commit is ready for output.) As we iterate the parents of a
commit during this walk, ensure the EXPLORE walk has walked
beyond their generation numbers.
I wondered how this would work for INFINITY. We can't know the order of
a bunch of INFINITY nodes at all, so we never know when their in-degree
values are "done". But if I understand the EXPLORE walk, we'd basically
walk all of INFINITY down to something with a real generation number. Is
that right?
But after that, I'm not totally clear on why we need this INDEGREE walk.
The INDEGREE walk is an important element for Kahn's algorithm. The final
output order is dictated by peeling commits of "indegree zero" to ensure all
children are output before their parents. (Note: since we use literal 0 to
mean "uninitialized", we peel commits when the indegree slab has value 1.)
This walk replaces the indegree logic from sort_in_topological_order(). That
method performs one walk that fills the indegree slab, then another walk
that peels the commits with indegree 0 and inserts them into a list.
I guess my big question here was: if we have generation numbers, do we
need Kahn's algorithm? That is, in a fully populated set of generation
numbers (i.e., no INFINITY), we could always just pick a commit with the
highest generation number to show.
So if we EXPLORE down to a real generation number in phase 1, why do we
need to care about INDEGREE anymore? Or am I wrong that we always have a
real generation number (i.e., not INFINITY) after EXPLORE? (And if so,
why is exploring to a real generation number a bad idea; presumably
it's due to a worst-case that goes deeper than we'd otherwise need to
here).
The issue is that we our final order (used by level 3) is unrelated to
generation number. Yes, if we prioritized by generation number then we
could output the commits in _some_ order that doesn't violate
topological constraints. However, we are asking for a different
priority, which is different than the generation number priority.
In the case of "--topo-order", we want to output the commits reachable
from the second parent of a merge before the commits reachable from the
first parent. However, in most cases the generation number of the first
parent is higher than the second parent (there are more things in the
merge chain than in a small topic that got merged). The INDEGREE is what
allows us to know when we can peel these commits while still respecting
the priority we want at the end.
Thanks,
-Stolee