"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > When running 'git rev-list --topo-order' and its kin, the topo_order > setting in struct rev_info implies the limited setting. This means > that the following things happen during prepare_revision_walk(): > > * revs->limited implies we run limit_list() to walk the entire > reachable set. There are some short-cuts here, such as if we > perform a range query like 'git rev-list COMPARE..HEAD' and we > can stop limit_list() when all queued commits are uninteresting. And if revs->topo_order is set, then (with current implementation) we need limit_list() to run to generate commit_list with commits to be topologically sorted, which is done by setting revs->limited. In short, with current code revs->topo_order implies revs->limited. > > * revs->topo_order implies we run sort_in_topological_order(). See > the implementation of that method in commit.c. It implies that > the full set of commits to order is in the given commit_list. So the current code uses "generate list of commits, then sort it" approach... > > These two methods imply that a 'git rev-list --topo-order HEAD' > command must walk the entire reachable set of commits _twice_ before > returning a single result. > > If we have a commit-graph file with generation numbers computed, then > there is a better way. ...instead of generating commits in topological order as you go. > This patch introduces some necessary logic > redirection when we are in this situation. O.K., this should make main commit smaller. All right. > In v2.18.0, the commit-graph file contains zero-valued bytes in the > positions where the generation number is stored in v2.19.0 and later. > Thus, we use generation_numbers_enabled() to check if the commit-graph > is available and has non-zero generation numbers. > > When setting revs->limited only because revs->topo_order is true, > only do so if generation numbers are not available. There is no > reason to use the new logic as it will behave similarly when all > generation numbers are INFINITY or ZERO. O.K. we will be using new algorithm only when there actually are some generation numbers. > In prepare_revision_walk(), if we have revs->topo_order but not > revs->limited, then we trigger the new logic. It breaks the logic > into three pieces, to fit with the existing framework: So if revs->limited is set (but not because revs->topo_order is set), which means A..B queries, we will be still using the old algorithm. All right, though I wonder if it could be improved in the future (perhaps with the help of other graph labelling / indices than generation numbers, maybe a positive-cut index). Do you have an idea why there is no improvement with the new code in this case? > 1. init_topo_walk() fills a new struct topo_walk_info in the rev_info > struct. We use the presence of this struct as a signal to use the > new methods during our walk. In this patch, this method simply > calls limit_list() and sort_in_topological_order(). In the future, > this method will set up a new data structure to perform that logic > in-line. > > 2. next_topo_commit() provides get_revision_1() with the next topo- > ordered commit in the list. Currently, this simply pops the commit > from revs->commits. > > 3. expand_topo_walk() provides get_revision_1() with a way to signal > walking beyond the latest commit. Currently, this calls > add_parents_to_list() exactly like the old logic. So all three new functions should perform exactly like the old logic, isn't it? > While this commit presents method redirection for performing the > exact same logic as before, it allows the next commit to focus only > on the new logic. All right, it's logical. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > revision.c | 42 ++++++++++++++++++++++++++++++++++++++---- > revision.h | 4 ++++ > 2 files changed, 42 insertions(+), 4 deletions(-) > > diff --git a/revision.c b/revision.c > index e18bd530e4..2dcde8a8ac 100644 > --- a/revision.c > +++ b/revision.c > @@ -25,6 +25,7 @@ > #include "worktree.h" > #include "argv-array.h" > #include "commit-reach.h" > +#include "commit-graph.h" > > volatile show_early_output_fn_t show_early_output; > > @@ -2454,7 +2455,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s > if (revs->diffopt.objfind) > revs->simplify_history = 0; > > - if (revs->topo_order) > + if (revs->topo_order && !generation_numbers_enabled(the_repository)) > revs->limited = 1; All right, with --topo-order and existing generation numbers don't force the revs->limited code (i.e. explicit not wrapped use of limit_list()). So with --topo-order and A..B, we have revs->limited set, with --topo-order and no generation numbers we have revs->limited set. > > if (revs->prune_data.nr) { > @@ -2892,6 +2893,33 @@ static int mark_uninteresting(const struct object_id *oid, > return 0; > } > > +struct topo_walk_info {}; Nice trick with using NULL-ness of the pointer to the currently empty struct as a boolean flag denoting whether to use new generation number using algorithm for topological sorting. > + > +static void init_topo_walk(struct rev_info *revs) > +{ > + struct topo_walk_info *info; I guess this helper variables is here for next revisions, as we could have made without it... > + revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info)); > + info = revs->topo_walk_info; > + memset(info, 0, sizeof(struct topo_walk_info)); ...by using + memset(revs->topo_walk_info, 0, sizeof(struct topo_walk_info)); > + > + limit_list(revs); > + sort_in_topological_order(&revs->commits, revs->sort_order); This is not exactly identical to the old code, which has if (limit_list(revs) < 0) return -1; if (revs->topo_order) sort_in_topological_order(&revs->commits, revs->sort_order); We know that init_topo_walk() would be invoked, as the name implies, only when revs->topo_order is set, but do we know that limit_list() would not return an error? > +} > + > +static struct commit *next_topo_commit(struct rev_info *revs) > +{ > + return pop_commit(&revs->commits); > +} All right, identical to the old code. > + > +static void expand_topo_walk(struct rev_info *revs, struct commit *commit) > +{ > + if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) { > + if (!revs->ignore_missing_links) > + die("Failed to traverse parents of commit %s", > + oid_to_hex(&commit->object.oid)); > + } > +} All right, identical to the old code. While at it, should this message be marked up for translation, or is it something so low-level (and rare) that should be kept untranslated? But this would be better left for separate commit series, to not entangle this one with spurious changes. > + > int prepare_revision_walk(struct rev_info *revs) > { > int i; > @@ -2928,11 +2956,13 @@ int prepare_revision_walk(struct rev_info *revs) > commit_list_sort_by_date(&revs->commits); > if (revs->no_walk) > return 0; > - if (revs->limited) > + if (revs->limited) { > if (limit_list(revs) < 0) > return -1; > - if (revs->topo_order) > - sort_in_topological_order(&revs->commits, revs->sort_order); > + if (revs->topo_order) > + sort_in_topological_order(&revs->commits, revs->sort_order); > + } else if (revs->topo_order) > + init_topo_walk(revs); Previously when revs->topo_order was set, Git called sort_in_topological_order(), because revs->limited got always set to truthy value if revs->topo_order was true. Now running sort_in_topological_order() is done only if revs->limited is set (because of A..B); if it is not, init_topo_walk() is called. All right, identical to the old code, up to checking the return value of limit_list(), see previous comments. > if (revs->line_level_traverse) > line_log_filter(revs); > if (revs->simplify_merges) > @@ -3257,6 +3287,8 @@ static struct commit *get_revision_1(struct rev_info *revs) > > if (revs->reflog_info) > commit = next_reflog_entry(revs->reflog_info); > + else if (revs->topo_walk_info) > + commit = next_topo_commit(revs); > else > commit = pop_commit(&revs->commits); All right, identical to the old code. > @@ -3278,6 +3310,8 @@ static struct commit *get_revision_1(struct rev_info *revs) > > if (revs->reflog_info) > try_to_simplify_commit(revs, commit); > + else if (revs->topo_walk_info) > + expand_topo_walk(revs, commit); > else if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) { > if (!revs->ignore_missing_links) > die("Failed to traverse parents of commit %s", All right, identical to the old code. > diff --git a/revision.h b/revision.h > index 2b30ac270d..fd4154ff75 100644 > --- a/revision.h > +++ b/revision.h > @@ -56,6 +56,8 @@ struct rev_cmdline_info { > #define REVISION_WALK_NO_WALK_SORTED 1 > #define REVISION_WALK_NO_WALK_UNSORTED 2 > > +struct topo_walk_info; > + > struct rev_info { > /* Starting list */ > struct commit_list *commits; > @@ -245,6 +247,8 @@ struct rev_info { > const char *break_bar; > > struct revision_sources *sources; > + > + struct topo_walk_info *topo_walk_info; > }; > > int ref_excluded(struct string_list *, const char *path);