On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee <dstolee@xxxxxxxxxxxxx> wrote: > > The can_all_from_reach_with_flags() algorithm is currently quadratic in > the worst case, because it calls the reachable() method for every 'from' > without tracking which commits have already been walked or which can > already reach a commit in 'to'. > > Rewrite the algorithm to walk each commit a constant number of times. > > We also add some optimizations that should work for the main consumer of > this method: fetch negotitation (haves/wants). > > The first step includes using a depth-first-search (DFS) from each from > commit, sorted by ascending generation number. We do not walk beyond the > minimum generation number or the minimum commit date. This DFS is likely > to be faster than the existing reachable() method because we expect > previous ref values to be along the first-parent history. > > If we find a target commit, then we mark everything in the DFS stack as > a RESULT. This expands the set of targets for the other from commits. We > also mark the visited commits using 'assign_flag' to prevent re-walking > the same code. > > We still need to clear our flags at the end, which is why we will have a > total of three visits to each commit. > > Performance was measured on the Linux repository using > 'test-tool reach can_all_from_reach'. The input included rows seeded by > tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as > v3.[0-9]*. This mimics a (very large) fetch that says "I have all major > v3 releases and want all major v4 releases." The "large" case included > X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate > tags to the set, which does not greatly increase the number of objects > that are considered, but does increase the number of 'from' commits, > demonstrating the quadratic nature of the previous code. > > Small Case > ---------- > > Before: 1.45 s > After: 0.34 s > > Large Case > ---------- > > Before: 5.83 s > After: 0.37 s > > Note how the time increases between the two cases in the two versions. > The new code increases relative to the number of commits that need to be > walked, but not directly relative to the number of 'from' commits. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > commit-reach.c | 122 ++++++++++++++++++++++++++++++------------------- > commit-reach.h | 3 +- > upload-pack.c | 5 +- > 3 files changed, 82 insertions(+), 48 deletions(-) > > diff --git a/commit-reach.c b/commit-reach.c > index 992ad5cdc7..8e24455d9f 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct commit *commit, > return is_descendant_of(commit, list); > } > > -static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date) > +static int compare_commits_by_gen(const void *_a, const void *_b) > { > - struct prio_queue work = { compare_commits_by_commit_date }; > + const struct commit *a = (const struct commit *)_a; > + const struct commit *b = (const struct commit *)_b; > > - prio_queue_put(&work, from); > - while (work.nr) { > - struct commit_list *list; > - struct commit *commit = prio_queue_get(&work); > - > - if (commit->object.flags & with_flag) { > - from->object.flags |= assign_flag; > - break; > - } > - if (!commit->object.parsed) > - parse_object(&commit->object.oid); > - if (commit->object.flags & TMP_MARK) > - continue; > - commit->object.flags |= TMP_MARK; > - if (commit->date < min_commit_date) > - continue; > - for (list = commit->parents; list; list = list->next) { > - struct commit *parent = list->item; > - if (!(parent->object.flags & TMP_MARK)) > - prio_queue_put(&work, parent); > - } > - } > - from->object.flags |= TMP_MARK; > - clear_commit_marks(from, TMP_MARK); > - clear_prio_queue(&work); > - return (from->object.flags & assign_flag); > + if (a->generation < b->generation) > + return -1; > + if (a->generation > b->generation) > + return 1; > + return 0; > } > > int can_all_from_reach_with_flag(struct object_array *from, > int with_flag, int assign_flag, > - time_t min_commit_date) > + time_t min_commit_date, > + uint32_t min_generation) > { > + struct commit **list = NULL; > int i; > + int result = 1; > > + ALLOC_ARRAY(list, from->nr); > for (i = 0; i < from->nr; i++) { > - struct object *from_one = from->objects[i].item; > + list[i] = (struct commit *)from->objects[i].item; > > - if (from_one->flags & assign_flag) > - continue; > - from_one = deref_tag(from_one, "a from object", 0); > - if (!from_one || from_one->type != OBJ_COMMIT) { > - /* no way to tell if this is reachable by > - * looking at the ancestry chain alone, so > - * leave a note to ourselves not to worry about > - * this object anymore. > - */ > - from->objects[i].item->flags |= assign_flag; > - continue; > - } > - if (!reachable((struct commit *)from_one, with_flag, > - assign_flag, min_commit_date)) > + parse_commit(list[i]); > + > + if (list[i]->generation < min_generation) > return 0; > } > - return 1; > + > + QSORT(list, from->nr, compare_commits_by_gen); > + > + for (i = 0; i < from->nr; i++) { > + /* DFS from list[i] */ > + struct commit_list *stack = NULL; > + > + list[i]->object.flags |= assign_flag; > + commit_list_insert(list[i], &stack); > + > + while (stack) { > + struct commit_list *parent; > + > + if (stack->item->object.flags & with_flag) { > + pop_commit(&stack); > + continue; > + } > + > + for (parent = stack->item->parents; parent; parent = parent->next) { > + if (parent->item->object.flags & (with_flag | RESULT)) > + stack->item->object.flags |= RESULT; > + > + if (!(parent->item->object.flags & assign_flag)) { > + parent->item->object.flags |= assign_flag; > + > + parse_commit(parent->item); We can use parse_commit here as it would print a warning and keep going?