Re: [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee <dstolee@xxxxxxxxxxxxx> wrote:
>
> The can_all_from_reach_with_flags() algorithm is currently quadratic in
> the worst case, because it calls the reachable() method for every 'from'
> without tracking which commits have already been walked or which can
> already reach a commit in 'to'.
>
> Rewrite the algorithm to walk each commit a constant number of times.
>
> We also add some optimizations that should work for the main consumer of
> this method: fetch negotitation (haves/wants).
>
> The first step includes using a depth-first-search (DFS) from each from
> commit, sorted by ascending generation number. We do not walk beyond the
> minimum generation number or the minimum commit date. This DFS is likely
> to be faster than the existing reachable() method because we expect
> previous ref values to be along the first-parent history.
>
> If we find a target commit, then we mark everything in the DFS stack as
> a RESULT. This expands the set of targets for the other from commits. We
> also mark the visited commits using 'assign_flag' to prevent re-walking
> the same code.
>
> We still need to clear our flags at the end, which is why we will have a
> total of three visits to each commit.
>
> Performance was measured on the Linux repository using
> 'test-tool reach can_all_from_reach'. The input included rows seeded by
> tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
> v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
> v3 releases and want all major v4 releases." The "large" case included
> X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
> tags to the set, which does not greatly increase the number of objects
> that are considered, but does increase the number of 'from' commits,
> demonstrating the quadratic nature of the previous code.
>
> Small Case
> ----------
>
> Before: 1.45 s
>  After: 0.34 s
>
> Large Case
> ----------
>
> Before: 5.83 s
>  After: 0.37 s
>
> Note how the time increases between the two cases in the two versions.
> The new code increases relative to the number of commits that need to be
> walked, but not directly relative to the number of 'from' commits.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  commit-reach.c | 122 ++++++++++++++++++++++++++++++-------------------
>  commit-reach.h |   3 +-
>  upload-pack.c  |   5 +-
>  3 files changed, 82 insertions(+), 48 deletions(-)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index 992ad5cdc7..8e24455d9f 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
>         return is_descendant_of(commit, list);
>  }
>
> -static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date)
> +static int compare_commits_by_gen(const void *_a, const void *_b)
>  {
> -       struct prio_queue work = { compare_commits_by_commit_date };
> +       const struct commit *a = (const struct commit *)_a;
> +       const struct commit *b = (const struct commit *)_b;
>
> -       prio_queue_put(&work, from);
> -       while (work.nr) {
> -               struct commit_list *list;
> -               struct commit *commit = prio_queue_get(&work);
> -
> -               if (commit->object.flags & with_flag) {
> -                       from->object.flags |= assign_flag;
> -                       break;
> -               }
> -               if (!commit->object.parsed)
> -                       parse_object(&commit->object.oid);
> -               if (commit->object.flags & TMP_MARK)
> -                       continue;
> -               commit->object.flags |= TMP_MARK;
> -               if (commit->date < min_commit_date)
> -                       continue;
> -               for (list = commit->parents; list; list = list->next) {
> -                       struct commit *parent = list->item;
> -                       if (!(parent->object.flags & TMP_MARK))
> -                               prio_queue_put(&work, parent);
> -               }
> -       }
> -       from->object.flags |= TMP_MARK;
> -       clear_commit_marks(from, TMP_MARK);
> -       clear_prio_queue(&work);
> -       return (from->object.flags & assign_flag);
> +       if (a->generation < b->generation)
> +               return -1;
> +       if (a->generation > b->generation)
> +               return 1;
> +       return 0;
>  }
>
>  int can_all_from_reach_with_flag(struct object_array *from,
>                                  int with_flag, int assign_flag,
> -                                time_t min_commit_date)
> +                                time_t min_commit_date,
> +                                uint32_t min_generation)
>  {
> +       struct commit **list = NULL;
>         int i;
> +       int result = 1;
>
> +       ALLOC_ARRAY(list, from->nr);
>         for (i = 0; i < from->nr; i++) {
> -               struct object *from_one = from->objects[i].item;
> +               list[i] = (struct commit *)from->objects[i].item;
>
> -               if (from_one->flags & assign_flag)
> -                       continue;
> -               from_one = deref_tag(from_one, "a from object", 0);
> -               if (!from_one || from_one->type != OBJ_COMMIT) {
> -                       /* no way to tell if this is reachable by
> -                        * looking at the ancestry chain alone, so
> -                        * leave a note to ourselves not to worry about
> -                        * this object anymore.
> -                        */
> -                       from->objects[i].item->flags |= assign_flag;
> -                       continue;
> -               }
> -               if (!reachable((struct commit *)from_one, with_flag,
> -                              assign_flag, min_commit_date))
> +               parse_commit(list[i]);
> +
> +               if (list[i]->generation < min_generation)
>                         return 0;
>         }
> -       return 1;
> +
> +       QSORT(list, from->nr, compare_commits_by_gen);
> +
> +       for (i = 0; i < from->nr; i++) {
> +               /* DFS from list[i] */
> +               struct commit_list *stack = NULL;
> +
> +               list[i]->object.flags |= assign_flag;
> +               commit_list_insert(list[i], &stack);
> +
> +               while (stack) {
> +                       struct commit_list *parent;
> +
> +                       if (stack->item->object.flags & with_flag) {
> +                               pop_commit(&stack);
> +                               continue;
> +                       }
> +
> +                       for (parent = stack->item->parents; parent; parent = parent->next) {
> +                               if (parent->item->object.flags & (with_flag | RESULT))
> +                                       stack->item->object.flags |= RESULT;
> +
> +                               if (!(parent->item->object.flags & assign_flag)) {
> +                                       parent->item->object.flags |= assign_flag;
> +
> +                                       parse_commit(parent->item);

We can use parse_commit here as it would print a warning and keep going?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux