Re: Fix a pathological case in git detecting proper renames

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, 29 Nov 2007, Jeff King wrote:
> 
> I think it will get worse, because you are simultaneously calculating
> all of the similarity scores bit by bit rather than doing a loop. Though
> perhaps you mean at the end you will end up with a list of src/dst pairs
> sorted by score, and you can loop over that.

Well, after thinking about this a bit, I think there's a solution that may 
work well with the current thing too: instead of looping just *once* over 
the list of rename pairs, loop twice - and simply refuse to do copies on 
the first loop.

This trivial patch does that, and turns Kumar's test-case into a perfect 
rename list.

It's not pretty, it's not smart, but it seems to work. There's something 
to be said for keeping it simple and stupid.

And it should not be nearly as expensive as it may _look_. Yes, the loop 
is "(i = 0; i < num_create * num_src; i++)", but the important part is 
that the whole array is sorted by rename score, and we have a

	if (mx[i].score < minimum_score)
		break;

in it, so uthe loop actually would tend to terminate rather quickly.

Anyway, Kumar, the thing to take away from this is:

 - git really doesn't even *care* about the whole "rename detection" 
   internally, and any commits you have done with renames are totally 
   independent of the heuristics we then use to *show* the renames.

 - the rename detection really is for just two reasons: (a) keep humans 
   happy, and keep the diffs small and (b) help automatic merging across 
   renames. So getting renames right is certainly good, but it's more of a 
   "politeness" issue than a "correctness" issue, although the merge 
   portion of it does matter a lot sometimes.

 - the important thing here is that you can commit your changes and not 
   worry about them being somehow "corrupted" by lack of rename detection, 
   even if you commit them with a version of git that doesn't do rename 
   detection the way you expected it. The rename detection is an 
   "after-the-fact" thing, not something that actually gets saved in the 
   repository, which is why we can change the heuristics _after_ seeing 
   examples, and the examples magically correct themselves!

 - try out the two patches I've posted, and see if they work for you. They 
   pass the test-suite, and the output for your example commit looks sane, 
   but hey, if you have other test-cases, try them out.

Here's Kumar's pretty diffstat with both my patches:

	 Makefile                                         |    6 +++---
	 board/{cds => freescale}/common/cadmus.c         |    0
	 board/{cds => freescale}/common/cadmus.h         |    0
	 board/{cds => freescale}/common/eeprom.c         |    0
	 board/{cds => freescale}/common/eeprom.h         |    0
	 board/{cds => freescale}/common/ft_board.c       |    0
	 board/{cds => freescale}/common/via.c            |    0
	 board/{cds => freescale}/common/via.h            |    0
	 board/{cds => freescale}/mpc8541cds/Makefile     |    0
	 board/{cds => freescale}/mpc8541cds/config.mk    |    0
	 board/{cds => freescale}/mpc8541cds/init.S       |    0
	 board/{cds => freescale}/mpc8541cds/mpc8541cds.c |    0
	 board/{cds => freescale}/mpc8541cds/u-boot.lds   |    4 ++--
	 board/{cds => freescale}/mpc8548cds/Makefile     |    0
	 board/{cds => freescale}/mpc8548cds/config.mk    |    0
	 board/{cds => freescale}/mpc8548cds/init.S       |    0
	 board/{cds => freescale}/mpc8548cds/mpc8548cds.c |    0
	 board/{cds => freescale}/mpc8548cds/u-boot.lds   |    4 ++--
	 board/{cds => freescale}/mpc8555cds/Makefile     |    0
	 board/{cds => freescale}/mpc8555cds/config.mk    |    0
	 board/{cds => freescale}/mpc8555cds/init.S       |    0
	 board/{cds => freescale}/mpc8555cds/mpc8555cds.c |    0
	 board/{cds => freescale}/mpc8555cds/u-boot.lds   |    4 ++--
	 23 files changed, 9 insertions(+), 9 deletions(-)

and here it is before:

	 Makefile                                           |    6 +-
	 board/cds/mpc8548cds/Makefile                      |   60 -----
	 board/cds/mpc8555cds/Makefile                      |   60 -----
	 board/cds/mpc8555cds/init.S                        |  255 --------------------
	 board/cds/mpc8555cds/u-boot.lds                    |  150 ------------
	 board/{cds => freescale}/common/cadmus.c           |    0
	 board/{cds => freescale}/common/cadmus.h           |    0
	 board/{cds => freescale}/common/eeprom.c           |    0
	 board/{cds => freescale}/common/eeprom.h           |    0
	 board/{cds => freescale}/common/ft_board.c         |    0
	 board/{cds => freescale}/common/via.c              |    0
	 board/{cds => freescale}/common/via.h              |    0
	 board/{cds => freescale}/mpc8541cds/Makefile       |    0
	 board/{cds => freescale}/mpc8541cds/config.mk      |    0
	 board/{cds => freescale}/mpc8541cds/init.S         |    0
	 board/{cds => freescale}/mpc8541cds/mpc8541cds.c   |    0
	 board/{cds => freescale}/mpc8541cds/u-boot.lds     |    4 +-
	 .../mpc8541cds => freescale/mpc8548cds}/Makefile   |    0
	 board/{cds => freescale}/mpc8548cds/config.mk      |    0
	 board/{cds => freescale}/mpc8548cds/init.S         |    0
	 board/{cds => freescale}/mpc8548cds/mpc8548cds.c   |    0
	 board/{cds => freescale}/mpc8548cds/u-boot.lds     |    4 +-
	 .../mpc8541cds => freescale/mpc8555cds}/Makefile   |    0
	 board/{cds => freescale}/mpc8555cds/config.mk      |    0
	 .../mpc8541cds => freescale/mpc8555cds}/init.S     |    0
	 board/{cds => freescale}/mpc8555cds/mpc8555cds.c   |    0
	 .../mpc8541cds => freescale/mpc8555cds}/u-boot.lds |    4 +-
	 27 files changed, 9 insertions(+), 534 deletions(-)

so it certainly makes the diffs prettier.

		Linus

---
 diffcore-rename.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index f64294e..3d37725 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -497,6 +497,19 @@ void diffcore_rename(struct diff_options *options)
 	qsort(mx, num_create * num_src, sizeof(*mx), score_compare);
 	for (i = 0; i < num_create * num_src; i++) {
 		struct diff_rename_dst *dst = &rename_dst[mx[i].dst];
+		struct diff_filespec *src;
+		if (dst->pair)
+			continue; /* already done, either exact or fuzzy. */
+		if (mx[i].score < minimum_score)
+			break; /* there is no more usable pair. */
+		src = rename_src[mx[i].src].one;
+		if (src->rename_used)
+			continue;
+		record_rename_pair(mx[i].dst, mx[i].src, mx[i].score);
+		rename_count++;
+	}
+	for (i = 0; i < num_create * num_src; i++) {
+		struct diff_rename_dst *dst = &rename_dst[mx[i].dst];
 		if (dst->pair)
 			continue; /* already done, either exact or fuzzy. */
 		if (mx[i].score < minimum_score)
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux