Re: Merging limitations after directory renames -- interesting test repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 18, 2011 at 2:21 PM, Jeff King <peff@xxxxxxxx> wrote:
>
>  1. Did you bump up your merge.renamelimit? It's hard to see because it
>     scrolls off the screen amidst the many conflicts, but the first
>     message is:
>
>       warning: too many files (created: 425 deleted: 1093), skipping
>       inexact rename detection
>
>     which you want to use. Try "git config merge.renamelimit
>     10000". Which runs pretty snappily on my machine; I wonder if we
>     should up the default limit.

Yeah, for the kernel, I have

	[diff]
		renamelimit=0

to disable the limit entirely, because the default limit is very low
indeed. Git is quite good at the rename detection.

However, the reason for the low default is not because it's not snappy
enough - it's because it can end up using a lot of memory (and if
you're low on memory, the swapping will mean that it goes from "quite
snappy" to "slow as molasses" - but it still will not be CPU limited,
it's just paging like crazy).

That said, having the error message scroll by and not even be noticed
by somebody doing a merge is really not good. It's an important hint,
and if you miss that hint you can easily have a totally trivial merge
turn into a totally unacceptable one.

So I do think we could try to lift the default a bit, but it might be
even more important to just make the message much more noticeable and
avoid scrolling past it. For example, setting a flag, and not printing
out the message immediately, but instead print it out only if it turns
into trouble at the end.

I dunno.

>  2. Which version of git are you using? Commit 83c9031 produces some
>     funky results on merges. It's not in any released version of git,
>     but it is in master.

Ouch. I thought that commit wasn't _supposed_ to make any difference.
If it does, then that sounds like a bad bug. Junio?

> Hmm. That is probably because it was substantially rewritten:

Damn. We suck.

Lookie here, here's the problem (well, at least part of it):

  [torvalds@i5 etherpad]$ git diff --summary -M10 master...origin/pg |
grep rebuildjar
   rename {trunk/etherpad => etherpad}/bin/rebuildjar.sh (89%)

That looks fine. But how about doing it the other way around:

  [torvalds@i5 etherpad]$ git diff --summary -M10 origin/pg...master |
grep rebuildjar
   copy trunk/infrastructure/bin/comp.sh => etherpad/bin/rebuildjar.sh (13%)
   delete mode 100755 trunk/etherpad/bin/rebuildjar.sh

Yeah, not such a good match. And it's total crap too: that rename is
much worse than the real one. So we've actually done something
actively wrong. Because we have:

   .../etherpad => master:etherpad}/bin/rebuildjar.sh |   92
+++++++++++++++++++-
   1 files changed, 89 insertions(+), 3 deletions(-)

and

   .../comp.sh => master:etherpad/bin/rebuildjar.sh   |  296
+++++++++-----------
   1 files changed, 129 insertions(+), 167 deletions(-)

so that choice of comp.sh was total and utter crap, and the 13%
similarity comes from that.

So we did something wrong in picking that comp.sh file. And I bet it's
because on a very superficial level it looks better: despite the diff
being *obviously* bigger for the crazy comp.sh choice, I think it
decided that they had more lines in common.

We've had stupid bugs in the "diffcore_count_changes()" logic before.
It's just that they're _usually_ hidden by overwhelming common code.

In fact, the attached patch improves things a bit. At a huge added CPU
cost in rename detection. Then I get

  [torvalds@i5 etherpad]$ git diff --summary -M10 origin/pg...master |
grep rebuildjar
   rename {trunk/etherpad => etherpad}/bin/rebuildjar.sh (32%)

which is at least the sane choice for rename detection. It won't fix
really the merge-from-hell issue, though, since we won't consider "32%
similar" to be a rename.

We might want to have a way to force rename detection for certain
patterns for cases like this.

                                    Linus
 diffcore-rename.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..19957cd 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * and the final score computation below would not have a
 	 * divide-by-zero issue.
 	 */
-	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
+	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
 	if (!src->cnt_data && diff_populate_filespec(src, 0))

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]