Re: [PATCH] remote.c: avoid O(n^2) behavior in match_push_refs by using string_list

Brandon Casey <bcasey@xxxxxxxxxx> · Wed, 3 Jul 2013 13:05:09 -0700

On 7/3/2013 12:00 PM, Jeff King wrote:
> On Wed, Jul 03, 2013 at 11:40:12AM -0700, Junio C Hamano wrote:
> 
>> Brandon Casey <drafnel@xxxxxxxxx> writes:
>>
>>> Right.  For repos with few refs on either side, I don't think there
>>> will be any measurable difference.  When pushing a single ref to a
>>> repo with a very large number of refs, we will see a very small net
>>> loss for the time required to prepare the string list (which grows
>>> linearly with the number of remote refs).  After 2 or 3 refs, we
>>> should see a net gain.
>>>
>>> So we're really just improving our worst case performance here.
>>
>> ... by penalizing the common case by how much?  If it is not too
>> much, then this obviously would be a good change.
> 
> I don't think by much. If we have "m" local refs to push and "n" remote
> refs, right now we do O(m*n) work ("m" linear searches of the remote
> namespace). With Brandon's patch, we do O(n log n) to build the index,

Whoops, yes, n log n, not linear as I misspoke.

> plus O(m log n) for lookups.
> 
> So our break-even point is basically m = log n, and for m smaller than
> that, we do more work building the index. Your absolute biggest
> difference would be pushing a single ref to a repository with a very
> large number of refs.
> 
> Here are the timings before and after Brandon's patch for pushing a
> no-op single ref from a normal repo to one with 370K refs (the same
> pathological repo from the upload-pack tests). Times are
> best-of-five.
> 
>              before     after
>      real    0m1.087s   0m1.156s
>      user    0m1.344s   0m1.412s
>      sys     0m0.288s   0m0.284s
> 
> So it's measurable, but even on a pathological worst-case, we're talking
> about 6% slowdown.

That agrees with what I've observed.

> You could try to guess about when to build the index based on the size
> of "m" and "n", but I suspect you'd waste more time calculating whether
> to build the index than you would simply building it in most cases.

I agree, I don't think it's worth trying to guess when to build an index
and when to just perform linear searches.  If building the payload for
each element in the index was more expensive than just assigning to a
pointer, than it could be worth it, but we're not, so I don't think it
is worth it.

-Brandon

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html