On Fri, Feb 18, 2011 at 3:27 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > We've had stupid bugs in the "diffcore_count_changes()" logic before. > It's just that they're _usually_ hidden by overwhelming common code. > > In fact, the attached patch improves things a bit. There's a secondary problem too, which is illustrated by this: [torvalds@i5 etherpad]$ git diff --summary -M30 $BASE.. | grep m0024_statistics_table.js copy {trunk/etherpad => etherpad}/src/etherpad/db_migrations/m0024_statistics_table.js (100%) rename trunk/etherpad/src/etherpad/db_migrations/m0024_statistics_table.js => etherpad/src/etherpad/db_migrations/m0040_create_plugin_tables.js (52%) which also ends up terminally confusing the merge. It sees that dual source of m0024_statistics_table.js and just gets really really confused. And the bug is that we didn't even ask for copy detection! This just confuses merging more. Attached is the ugliest patch ever. I'm in no way implying this should ever be accepted, with - that crazy-ugly static variable to pass in the copy state to the 'for_each_hash()' callback - that really ugly "if (detect_rename == DIFF_DETECT_COPY)" with broken indentation just to get rid of the copy-checking phase. So please consider the attached patch just a "look, guys, this is wrong, and here's the ugliest hack you've ever seen to fix it". Anyway, with this, I can at least do git merge -Xrename-threshold=30 origin/pg and while it fails miserably, the failures are now no longer "totally obviously a git bug". Now it has real rename-rename conflicts like CONFLICT (rename/rename): Rename "trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml" in branch "HEAD" rename "trunk/etherpad/src/static/crossdomain.xml"->"etherpad/src/static/crossdomain.xml.in" in "origin/pg" which really _is_ a conflict that needs user input. Now, I didn't check that they are *all* of this valid kind, but most of them really are. The directories are: (HEAD): etherpad/src/themes/default/templates (origin/pg): etherpad/src/templates so you'd need to fix that up. Whatever. It's still a nasty merge, but at least git seems to do a much better job. That "-Xrename-threshold=30" thing is a total hack, but it's a valid way to say "ok, git didn't find some renames I want it to, so let's see if I can force it to do a less critical search". Linus
diffcore-rename.c | 12 +++++++++--- 1 files changed, 9 insertions(+), 3 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index df41be5..daa85e4 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -170,7 +170,7 @@ static int estimate_similarity(struct diff_filespec *src, * and the final score computation below would not have a * divide-by-zero issue. */ - if (base_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE) + if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE) return 0; if (!src->cnt_data && diff_populate_filespec(src, 0)) @@ -246,6 +246,8 @@ struct file_similarity { struct file_similarity *next; }; +static int find_copies_too; + static int find_identical_files(struct file_similarity *src, struct file_similarity *dst) { @@ -277,6 +279,8 @@ static int find_identical_files(struct file_similarity *src, } /* Give higher scores to sources that haven't been used already */ score = !source->rename_used; + if (source->rename_used && !find_copies_too) + continue; score += basename_same(source, target); if (score > best_score) { best = p; @@ -377,11 +381,12 @@ static void insert_file_table(struct hash_table *table, int src_dst, int index, * and then during the second round we try to match * cache-dirty entries as well. */ -static int find_exact_renames(void) +static int find_exact_renames(int copies) { int i; struct hash_table file_table; + find_copies_too = copies; init_hash(&file_table); for (i = 0; i < rename_src_nr; i++) insert_file_table(&file_table, -1, i, rename_src[i].one); @@ -467,7 +472,7 @@ void diffcore_rename(struct diff_options *options) * We really want to cull the candidates list early * with cheap tests in order to avoid doing deltas. */ - rename_count = find_exact_renames(); + rename_count = find_exact_renames(detect_rename == DIFF_DETECT_COPY); /* Did we only want exact renames? */ if (minimum_score == MAX_SCORE) @@ -551,6 +556,7 @@ void diffcore_rename(struct diff_options *options) rename_count++; } + if (detect_rename == DIFF_DETECT_COPY) for (i = 0; i < dst_cnt * NUM_CANDIDATE_PER_DST; i++) { struct diff_rename_dst *dst;