Re: [PATCH v2] remote.c: avoid O(m*n) behavior in match_push_refs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 08, 2013 at 12:02:11AM -0700, Brandon Casey wrote:

> Here is the reroll with an updated commit message that hopefully
> provides a little more detail to justify this change.  I removed
> the use of the search index in the send_prune block since I think
> that pruning many refs is an uncommon operation and the overhead
> of creating the index will more commonly exceed the benefit of
> using it.

I don't know. I'd think that if you are using pruning, you might delete
a large chunk at one time (e.g., rearranging your ref hierarchy,
followed by "git push --mirror"). But that is just my gut feeling. I
haven't actually run into this slow-down in the real world (we typically
fetch from our giant repositories rather than push into them).

> This version now lazily builds the search index in the first loop,
> so there should be no impact when pushing using explicit refspecs.
> 
> e.g. pushing a change for review to Gerrit
> 
>    $ git push origin HEAD:refs/for/master
> 
> I suspect that this is the most common form of pushing and furthermore
> will become the default once push.default defaults to 'current'.

Nice.

> The remaining push cases can be distilled into the following:
> 
>   ref-count    impact
>   m >= log n   improved with this patch
>   m < log n    regressed with this patch roughly ~6-7%
> 
> So, I think what we have to consider is whether the improvement to
> something like 'git push --mirror' is worth the impact to an asymmetric
> push where the number of local refs is much smaller than the number of
> remote refs.  I'm not sure how common the latter really is though.
> Gerrit does produce repositories with many refs on the remote end in
> the refs/changes/ namespace, but do people commonly push to Gerrit
> using matching or pattern refspecs?  Not sure, but I'd tend to think
> that they don't.

To me it is not about what happens sometimes or not, but about having
runaway worst-case behavior that is unusable. The 6-7% increase (which
is the absolute worst-case measurement we could come up with; in the
real world you would usually transfer actual objects, and connect over
an actual network) is worth it, IMHO.

So I'd be in favor of applying this (possibly covering the send_prune
case, too). If somebody really wants to care about the 6-7%, they can
build on top of your patch with heuristics to avoid indexing in the
small cases.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]