Wrong damage counting in diffcore_count_changes?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ok, so I had somebody actually ask me about '--dirstat', and as a result I 
ended up looking at how well the numbers it calculates really reflect the 
damage done to a file.

And to my horror, it doesn't necessarily reflect the damage well at all!

Now, dirstat just takes the same damage numbers that git uses to estimate 
similarity for renames, so if dirstat gets odd numbers, then that implies 
that file similarity will also be somewhat odd.

So looking at _why_ the dirstat numbers were odd, I came up with this 
patch that seems to make much sense. What used to happen is that 
diffcore_count_changes() simply ignored any hashes in the destination that 
didn't match hashes in the source. EXCEPT if the source hash didn't exist 
at all, in which case it would count _one_ destination hash that happened 
to have the "next" hash value. 

This changes it so that

 - whenever it bypasses a destination hash (because it doesn't match a 
   source), it counts the bytes associated with that as "literal added"

 - at the end (once we have used up all the source hashes), we do the same 
   thing with the remaining destination hashes.

 - when hashes do match, and we use the difference in counts as a value, 
   we also use up that destination hash entry (the 'd++'

This _seems_ to make --dirstat output more sensible, and I'd hope that 
that in turn should mean that file rename detection should also be more 
sensible. But I haven't actually verified it in any way. Maybe I just 
screwed up file rename detection entirely.

Did I miss something?

		Linus
---
 diffcore-delta.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/diffcore-delta.c b/diffcore-delta.c
index e670f85..7cf431d 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -201,10 +201,15 @@ int diffcore_count_changes(struct diff_filespec *src,
 		while (d->cnt) {
 			if (d->hashval >= s->hashval)
 				break;
+			la += d->cnt;
 			d++;
 		}
 		src_cnt = s->cnt;
-		dst_cnt = d->hashval == s->hashval ? d->cnt : 0;
+		dst_cnt = 0;
+		if (d->cnt && d->hashval == s->hashval) {
+			dst_cnt = d->cnt;
+			d++;
+		}
 		if (src_cnt < dst_cnt) {
 			la += dst_cnt - src_cnt;
 			sc += src_cnt;
@@ -213,6 +218,10 @@ int diffcore_count_changes(struct diff_filespec *src,
 			sc += dst_cnt;
 		s++;
 	}
+	while (d->cnt) {
+		la += d->cnt;
+		d++;
+	}
 
 	if (!src_count_p)
 		free(src_count);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]