Re: Merging limitations after directory renames -- interesting test repo

On Fri, Feb 18, 2011 at 03:27:36PM -0800, Linus Torvalds wrote:

> >  1. Did you bump up your merge.renamelimit? It's hard to see because it
> >     scrolls off the screen amidst the many conflicts, but the first
> >     message is:
> >
> >       warning: too many files (created: 425 deleted: 1093), skipping
> >       inexact rename detection
> >
> >     which you want to use. Try "git config merge.renamelimit
> >     10000". Which runs pretty snappily on my machine; I wonder if we
> >     should up the default limit.
> 
> Yeah, for the kernel, I have
> 
> 	[diff]
> 		renamelimit=0
> 
> to disable the limit entirely, because the default limit is very low
> indeed. Git is quite good at the rename detection.
> 
> However, the reason for the low default is not because it's not snappy
> enough - it's because it can end up using a lot of memory (and if
> you're low on memory, the swapping will mean that it goes from "quite
> snappy" to "slow as molasses" - but it still will not be CPU limited,
> it's just paging like crazy).

I think it can be both. There is an O(n^2) part to the algorithm. I did
some timings a few years ago that showed an n^2 increase in time as you
bumped the limit:

  http://article.gmane.org/gmane.comp.version-control.git/73519

That's staying within a reasonable memory size. I would not be surprised
if you can get much worse behavior by going into swap, but I didn't
measure peak memory use there.

Those tests led to:

  commit 50705915eae89eae490dff30fa370ed02e4d6e72
  Author: Jeff King <peff@xxxxxxxx>
  Date:   Wed Apr 30 13:24:43 2008 -0400

    bump rename limit defaults

    The current rename limit default of 100 was arbitrarily
    chosen. Testing[1] has shown that on modern hardware, a
    limit of 200 adds about a second of computation time, and a
    limit of 500 adds about 5 seconds of computation time.

    This patch bumps the default limit to 200 for viewing diffs,
    and to 500 for performing a merge. The limit for generating
    git-status templates is set independently; we bump it up to
    200 here, as well, to match the diff limit.

But perhaps it's time to revisit the test; it's been almost 3 years, and my
hardware at the time was probably 2 years out of date. :)

Here are the old and new times for various sizes of rename. Details
about the test are in the message referenced above.

   N   Old CPU Seconds   New CPU Seconds
  10              0.43              0.02
 100              0.44              0.20
 200              1.40              0.55
 400              4.87              1.90
 800             18.08              7.01
1000             27.82             10.83

So maybe bump the diff limit to 400 and the merge limit to 1000,
doubling both? That leaves us at around 2 seconds per commit for a log,
and 10 seconds tacked onto a merge. We could maybe even go higher with
the merge limit. If it's such a big merge, the conflict resolution is
probably going to take forever anyway, so 30 extra seconds if it makes
rename detection work is probably a good thing.
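
As a quick sanity check on the n^2 claim, here is a small sketch that computes how much the CPU time in the table above grows each time N doubles. A purely quadratic cost would give a factor of 4. The numbers are copied from the table; the names (old_times, new_times, doubling_ratios) are just mine for illustration.

```python
# CPU seconds from the table above, keyed by N (number of renames).
old_times = {10: 0.43, 100: 0.44, 200: 1.40, 400: 4.87, 800: 18.08, 1000: 27.82}
new_times = {10: 0.02, 100: 0.20, 200: 0.55, 400: 1.90, 800: 7.01, 1000: 10.83}


def doubling_ratios(times):
    """Return {n: time(2n) / time(n)} for every n whose double was also measured."""
    return {n: times[2 * n] / times[n] for n in sorted(times) if 2 * n in times}


if __name__ == "__main__":
    for label, times in (("old", old_times), ("new", new_times)):
        for n, ratio in doubling_ratios(times).items():
            print(f"{label}: N={n} -> N={2 * n}: x{ratio:.2f}")
```

Only 100, 200 and 400 have a measured double in the table. On both machines the ratio stays below 4 but climbs toward it as N grows, which is what you would expect if a fixed startup cost matters less and less next to the quadratic part.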

According to top, git only hit around 17M resident on the 1000-sized
one, so I don't think memory is a problem, at least for average repos
(and yes, I know top is an awful way to measure, but it's quick and it
would need to be orders of magnitude off for it to be a problem).

So I'm in favor of bumping the limits, or possibly even removing the
hard number limit and putting in a "try to do renames for this many
seconds" option. If we're going to have something like 30 second delays
on merge, though, we should perhaps write some eye candy to stderr after
2 seconds or so (like we do with "git checkout").
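
To make the "this many seconds" idea concrete, here is a rough sketch in Python rather than git's C. It is not how git implements rename detection; every name and default in it (detect_renames, budget_seconds, progress_after, min_score, the similarity callback) is hypothetical. It scores each deleted file against each created file, gives up when the time budget is spent, and only starts writing progress to stderr once the work has taken longer than a couple of seconds.

```python
import sys
import time


def detect_renames(deleted, created, similarity, budget_seconds=30.0,
                   progress_after=2.0, min_score=0.5,
                   clock=time.monotonic, err=sys.stderr):
    """Pair deleted paths with created paths until the time budget runs out.

    deleted and created map path -> content; similarity(a, b) returns a
    score in [0, 1]. Returns (renames, finished). This is an illustration
    only: it does not stop two sources from picking the same destination.
    """
    start = clock()
    renames = {}
    finished = True
    shown = False
    total = len(deleted)
    for done, (src, src_data) in enumerate(deleted.items()):
        elapsed = clock() - start
        if elapsed > budget_seconds:
            finished = False  # out of time: keep whatever we found so far
            break
        if elapsed > progress_after:
            err.write(f"\rDetecting renames: {done}/{total}")
            shown = True
        # The O(n^2) part: every deleted file is scored against every created file.
        best_score, best_dst = max(
            ((similarity(src_data, dst_data), dst)
             for dst, dst_data in created.items()),
            default=(0.0, None),
        )
        if best_dst is not None and best_score >= min_score:
            renames[src] = best_dst
    if shown:
        err.write("\n")
    return renames, finished
```

A caller could treat finished == False the same way the current code treats exceeding the numeric limit, that is, warn that inexact rename detection was incomplete.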

> So I do think we could try to lift the default a bit, but it might be
> even more important to just make the message much more noticeable and
> avoid scrolling past it. For example, setting a flag, and not printing
> out the message immediately, but instead print it out only if it turns
> into trouble at the end.

Yeah, I also think that would be useful. And if that information filters
up to the merge command, it can even give better advice (like how to
tweak the limit).
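
Something like the following is what I have in mind for the "set a flag, print at the end" approach. Again this is only a Python sketch, not git code: the RenameWarning class, its methods, and the exact wording of the hint are made up for illustration. Only the counts in the message and the merge.renamelimit setting come from the discussion above.

```python
import sys


class RenameWarning:
    """Remember that rename detection was skipped; report it only if it mattered."""

    def __init__(self):
        self.skipped = None  # (created, deleted, limit) once detection is skipped

    def note_skipped(self, created, deleted, limit):
        """Record the skip instead of printing a warning right away."""
        self.skipped = (created, deleted, limit)

    def report(self, conflicts, err=sys.stderr):
        """Call once at the end of the merge. Returns True if a warning was printed."""
        if self.skipped is None or conflicts == 0:
            return False  # nothing was skipped, or the merge was clean anyway
        created, deleted, limit = self.skipped
        err.write(
            "warning: inexact rename detection was skipped "
            f"(created: {created} deleted: {deleted}, limit: {limit})\n"
            "hint: some conflicts may be due to undetected renames; "
            "consider raising merge.renamelimit and retrying\n"
        )
        return True
```

The merge code would call note_skipped() at the point where the warning is printed today, and report() after the conflict list, so the message ends up as the last thing on the screen instead of the first.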

-Peff