Re: Merging limitations after directory renames -- interesting test repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 18, 2011 at 3:27 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> We've had stupid bugs in the "diffcore_count_changes()" logic before.
> It's just that they're _usually_ hidden by overwhelming common code.
>
> In fact, the attached patch improves things a bit.

There's a secondary problem too, which is illustrated by this:

  [torvalds@i5 etherpad]$ git diff --summary -M30 $BASE.. | grep
m0024_statistics_table.js
   copy {trunk/etherpad =>
etherpad}/src/etherpad/db_migrations/m0024_statistics_table.js (100%)
   rename trunk/etherpad/src/etherpad/db_migrations/m0024_statistics_table.js
=> etherpad/src/etherpad/db_migrations/m0040_create_plugin_tables.js
(52%)

which also ends up terminally confusing the merge. It sees that dual
source of m0024_statistics_table.js and just gets really really
confused.

And the bug is that we didn't even ask for copy detection! This just
confuses merging more.

Attached is the ugliest patch ever. I'm in no way implying this should
ever be accepted, with

 - that crazy-ugly static variable to pass in the copy state to the
'for_each_hash()' callback

 - that really ugly "if (detect_rename == DIFF_DETECT_COPY)" with
broken indentation just to get rid of the copy-checking phase.

So please consider the attached patch just a "look, guys, this is
wrong, and here's the ugliest hack you've ever seen to fix it".

Anyway, with this, I can at least do

   git merge -Xrename-threshold=30 origin/pg

and while it fails miserably, the failures are now no longer "totally
obviously a git bug". Now it has real rename-rename conflicts like

CONFLICT (rename/rename): Rename
"trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml"
in branch "HEAD"
                          rename
"trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml.in"
in "origin/pg"

which really _is_ a conflict that needs user input.

Now, I didn't check that they are *all* of this valid kind, but most
of them really are. The directories are:
  (HEAD): etherpad/src/themes/default/templates
 (origin/pg): etherpad/src/templates

so you'd need to fix that up.

Whatever. It's still a nasty merge, but at least git seems to do a
much better job. That "-Xrename-threshold=30" thing is a total hack,
but it's a valid way to say "ok, git didn't find some renames I want
it to, so let's see if I can force it to do a less critical search".

                          Linus
 diffcore-rename.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..daa85e4 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * and the final score computation below would not have a
 	 * divide-by-zero issue.
 	 */
-	if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
+	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
 	if (!src->cnt_data && diff_populate_filespec(src, 0))
@@ -246,6 +246,8 @@ struct file_similarity {
 	struct file_similarity *next;
 };
 
+static int find_copies_too;
+
 static int find_identical_files(struct file_similarity *src,
 				struct file_similarity *dst)
 {
@@ -277,6 +279,8 @@ static int find_identical_files(struct file_similarity *src,
 			}
 			/* Give higher scores to sources that haven't been used already */
 			score = !source->rename_used;
+			if (source->rename_used && !find_copies_too)
+				continue;
 			score += basename_same(source, target);
 			if (score > best_score) {
 				best = p;
@@ -377,11 +381,12 @@ static void insert_file_table(struct hash_table *table, int src_dst, int index,
  * and then during the second round we try to match
  * cache-dirty entries as well.
  */
-static int find_exact_renames(void)
+static int find_exact_renames(int copies)
 {
 	int i;
 	struct hash_table file_table;
 
+	find_copies_too = copies;
 	init_hash(&file_table);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
@@ -467,7 +472,7 @@ void diffcore_rename(struct diff_options *options)
 	 * We really want to cull the candidates list early
 	 * with cheap tests in order to avoid doing deltas.
 	 */
-	rename_count = find_exact_renames();
+	rename_count = find_exact_renames(detect_rename == DIFF_DETECT_COPY);
 
 	/* Did we only want exact renames? */
 	if (minimum_score == MAX_SCORE)
@@ -551,6 +556,7 @@ void diffcore_rename(struct diff_options *options)
 		rename_count++;
 	}
 
+	if (detect_rename == DIFF_DETECT_COPY)
 	for (i = 0; i < dst_cnt * NUM_CANDIDATE_PER_DST; i++) {
 		struct diff_rename_dst *dst;
 

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]