Re: [PATCH 3/3] fetch-pack: avoid quadratic behavior in rev_list_push

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 2, 2013 at 2:24 AM, Jeff King <peff@xxxxxxxx> wrote:
> When we call find_common to start finding common ancestors
> with the remote side of a fetch, the first thing we do is
> insert the tip of each ref into our rev_list linked list. We
> keep the list sorted the whole time with
> commit_list_insert_by_date, which means our insertion ends
> up doing O(n^2) timestamp comparisons.
>
> We could teach rev_list_push to use an unsorted list, and
> then sort it once after we have added each ref. However, in
> get_rev, we process the list by popping commits off the
> front and adding parents back in timestamp-sorted order. So
> that procedure would still operate on the large list.
>
> Instead, we can replace the linked list with a heap-based
> priority queue, which can do O(log n) insertion, making the
> whole insertion procedure O(n log n).
>
> As a result of switching to the prio_queue struct, we fix
> two minor bugs:
>
>   1. When we "pop" a commit in get_rev, and when we clear
>      the rev_list in find_common, we do not take care to
>      free the "struct commit_list", and just leak its
>      memory. With the prio_queue implementation, the memory
>      management is handled for us.
>
>   2. In get_rev, we look at the head commit of the list,
>      possibly push its parents onto the list, and then "pop"
>      the front of the list off, assuming it is the same
>      element that we just peeked at. This is typically going
>      to be the case, but would not be in the face of clock
>      skew: the parents are inserted by date, and could
>      potentailly be inserted at the head of the list if they

s/potentailly/potentially/

>      have a timestamp newer than their descendent. In this
>      case, we would accidentally pop the parent, and never
>      process it at all.
>
>      The new implementation pulls the commit off of the
>      queue as we examine it, and so does not suffer from
>      this problem.
>
> With this patch, a fetch of a single commit into a
> repository with 50,000 refs went from:
>
>   real    0m7.984s
>   user    0m7.852s
>   sys     0m0.120s
>
> to:
>
>   real    0m2.017s
>   user    0m1.884s
>   sys     0m0.124s
>
> Before this patch, a larger case with 370K refs still had
> not completed after tens of minutes; with this patch, it
> completes in about 12 seconds.
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]