Ok, so I had somebody actually ask me about '--dirstat', and as a result I ended up looking at how well the numbers it calculates really reflect the damage done to a file. And to my horror, it doesn't necessarily reflect the damage well at all! Now, dirstat just takes the same damage numbers that git uses to estimate similarity for renames, so if dirstat gets odd numbers, then that implies that file similarity will also be somewhat odd. So looking at _why_ the dirstat numbers were odd, I came up with this patch that seems to make much sense. What used to happen is that diffcore_count_changes() simply ignored any hashes in the destination that didn't match hashes in the source. EXCEPT if the source hash didn't exist at all, in which case it would count _one_ destination hash that happened to have the "next" hash value. This changes it so that - whenever it bypasses a destination hash (because it doesn't match a source), it counts the bytes associated with that as "literal added" - at the end (once we have used up all the source hashes), we do the same thing with the remaining destination hashes. - when hashes do match, and we use the difference in counts as a value, we also use up that destination hash entry (the 'd++' This _seems_ to make --dirstat output more sensible, and I'd hope that that in turn should mean that file rename detection should also be more sensible. But I haven't actually verified it in any way. Maybe I just screwed up file rename detection entirely. Did I miss something? Linus --- diffcore-delta.c | 11 ++++++++++- 1 files changed, 10 insertions(+), 1 deletions(-) diff --git a/diffcore-delta.c b/diffcore-delta.c index e670f85..7cf431d 100644 --- a/diffcore-delta.c +++ b/diffcore-delta.c @@ -201,10 +201,15 @@ int diffcore_count_changes(struct diff_filespec *src, while (d->cnt) { if (d->hashval >= s->hashval) break; + la += d->cnt; d++; } src_cnt = s->cnt; - dst_cnt = d->hashval == s->hashval ? d->cnt : 0; + dst_cnt = 0; + if (d->cnt && d->hashval == s->hashval) { + dst_cnt = d->cnt; + d++; + } if (src_cnt < dst_cnt) { la += dst_cnt - src_cnt; sc += src_cnt; @@ -213,6 +218,10 @@ int diffcore_count_changes(struct diff_filespec *src, sc += dst_cnt; s++; } + while (d->cnt) { + la += d->cnt; + d++; + } if (!src_count_p) free(src_count); -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html