On 10/12/2018 2:33 AM, Junio C Hamano wrote:
"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
* revs->limited implies we run limit_list() to walk the entire
reachable set. There are some short-cuts here, such as if we
perform a range query like 'git rev-list COMPARE..HEAD' and we
can stop limit_list() when all queued commits are uninteresting.
* revs->topo_order implies we run sort_in_topological_order(). See
the implementation of that method in commit.c. It implies that
the full set of commits to order is in the given commit_list.
These two methods imply that a 'git rev-list --topo-order HEAD'
command must walk the entire reachable set of commits _twice_ before
returning a single result.
With or without "--topo-order", running rev-list without any
negative commit means we must dig down to the roots that can be
reached from the positive commits we have.
If we use default order in 'git log', we don't walk all the way to the
root commits, and instead trust the commit-date. (This is different than
--date-order, which does guarantee parents after children.) In this
case, revs->limited is false.
I am to sure if having to run the "sort" of order N counts as "walk
the entire reachable set once" (in addition to the enumeration that
must be done to prepare that N commits, performed in limit_list()).
sort_in_topological_order() does actually _two_ walks (the in-degree
computation plus the walk that peels commits of in-degree zero), but
those walks are cheaper because we've already parsed the commits in
limit_list().
3. expand_topo_walk() provides get_revision_1() with a way to signal
walking beyond the latest commit. Currently, this calls
add_parents_to_list() exactly like the old logic.
"latest"? We dig down the history from newer to older, so at some
point we hit an old commit and need to find the parents to keep
walking towards even older parts of the history. Did you mean
"earliest" instead?
I mean "latest" in terms of the algorithm, so "the commit that was
returned by get_revision_1() most recently". This could use some
rewriting for clarity.
@@ -2454,7 +2455,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
if (revs->diffopt.objfind)
revs->simplify_history = 0;
- if (revs->topo_order)
+ if (revs->topo_order && !generation_numbers_enabled(the_repository))
revs->limited = 1;
Are we expecting that this is always a bool? Can there be new
commits for which generation numbers are not computed and stored
while all the old, stable and packed commits have generation
numbers?
For this algorithm to work, we only care that _some_ commits have
generation numbers. We expect that if a commit-graph file exists with
generation numbers, then the majority of commits have generation
numbers. The commits that were added or fetched since the commit-graph
was written will have generation number INFINITY, but the topo-order
algorithm will still work and be efficient in those cases. (This is also
why we have the "half graph" case in test_three_modes.)
@@ -2892,6 +2893,33 @@ static int mark_uninteresting(const struct object_id *oid,
return 0;
}
+struct topo_walk_info {};
+
+static void init_topo_walk(struct rev_info *revs)
+{
+ struct topo_walk_info *info;
+ revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
+ info = revs->topo_walk_info;
+ memset(info, 0, sizeof(struct topo_walk_info));
There is no member in the struct at this point. Are we sure this is
safe? Just being curious. I know xmalloc() gives us at least one
byte and info won't be NULL. I just do not know offhand if we have
a guarantee that memset() acts sensibly to fill the first 0 bytes.
This is a good question. It seems to work for me when I check out your
version of this commit (6c04ff30 "revision.c: begin refactoring
--topo-order logic") and run all tests.
+ limit_list(revs);
+ sort_in_topological_order(&revs->commits, revs->sort_order);
+}
+
+static struct commit *next_topo_commit(struct rev_info *revs)
+{
+ return pop_commit(&revs->commits);
+}
+
+static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
+{
+ if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
+ if (!revs->ignore_missing_links)
+ die("Failed to traverse parents of commit %s",
+ oid_to_hex(&commit->object.oid));
+ }
+}
+
int prepare_revision_walk(struct rev_info *revs)
{
int i;
@@ -2928,11 +2956,13 @@ int prepare_revision_walk(struct rev_info *revs)
commit_list_sort_by_date(&revs->commits);
if (revs->no_walk)
return 0;
- if (revs->limited)
+ if (revs->limited) {
if (limit_list(revs) < 0)
return -1;
- if (revs->topo_order)
- sort_in_topological_order(&revs->commits, revs->sort_order);
+ if (revs->topo_order)
+ sort_in_topological_order(&revs->commits, revs->sort_order);
+ } else if (revs->topo_order)
+ init_topo_walk(revs);
if (revs->line_level_traverse)
line_log_filter(revs);
if (revs->simplify_merges)
The diff is a bit hard to grok around here, but
- when limited *and* topo_order, we do the sort here, as we know we
already have called limit_list(), i.e. we behave identically as
the code before this patch in that case.
- when not limited but topo_order, then we do init_topo_walk();
currently we do limit_list() and sort_in_topological_order(),
which means we do the same as above.
As long as limit_list() and sort_in_topological_order() does not
look at revs->limited bit, this patch cannot cause any regression.
@@ -3257,6 +3287,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
if (revs->reflog_info)
commit = next_reflog_entry(revs->reflog_info);
+ else if (revs->topo_walk_info)
+ commit = next_topo_commit(revs);
else
commit = pop_commit(&revs->commits);
So this get_revision_1() always grabs the commit from next_topo_commit()
when topo-order is in effect.
And specifically, when the conditions for our new topo-walk algorithm
are in effect. If the commit-graph doesn't exist, the old logic will
still go through for "git log --topo-order".
Thanks for the careful look!
-Stolee