Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)

Jeff King <peff@xxxxxxxx> · Mon, 21 May 2012 15:41:14 -0400

On Mon, May 21, 2012 at 12:15:13PM -0600, Martin Fick wrote:

> Of course, we use Gerrit, so features tend to be called 
> changes and each change may get many revisions (patchsets), 
> so all of these get refs, but I think that it might be wrong 
> to consider that out of the ordinary anymore.  After all, 
> should a version control system such as git not support 100K 
> revisions of features developed independently on separate 
> branches (within Gerrit or not)?  100K is not really that 
> many when you consider a large project.  Even without 
> Gerrit, if someone wanted to track that many features 
> (likely over a few years), they will probably use up tons of 
> refs.  

I think the more compelling line of argument is not "is this
reasonable?", but rather that git has been designed from the ground up
to be efficient, and these are not fundamental design issues with git at
all. They are just silly little spots where we used a quick-to-write
quadratic algorithm instead of something more complex with better
asymptotic behavior. And if we can fix these silly spots easily, then
there's no reason not to. It helps the small-N case a tiny bit, and it
makes the big-N case feasible.

So far, the only quadratic case I have seen that is not easy to fix is
replacing "struct commit_list" with a priority queue or similar.  But we
managed to hack around that long ago with fce87ae (Fix quadratic
performance in rewrite_one., 2008-07-12), and I don't think it's
generally a problem in practice.

Anyway, my point is that we don't even have to talk about "reasonable"
or "absurd". Git should be fast even on absurd cases, because 99% of the
work has already been done, and the last 1% is easy.

-Peff