On Thu, 20 Dec 2007, Linus Torvalds wrote:
>
> Both answers are *correct*, though. The particular choice of "insert at
> line 489, after line 488" is a bit odd, but is because we don't actually
> search to exactly the beginning of where the differences started, we
> search in blocks of 1kB and then we go forward to the next newline.

This slightly more involved diff does a better job at this particular
issue.

Whether the complexity is worth it or not, I dunno, but it changes the
"remove common lines at the end" to do an exact job, which for this
particular test-case means that the end result of adding a thousand lines
of 'y' will look like

	[torvalds@woody ~]$ git diff -U0 a b | grep @@
	@@ -0,0 +1,1000 @@

instead - ie it will say that they were added at the very beginning of the
file rather than added at some arbitrary point in the middle.

Whether this is really worth it, I dunno.

Also, I'm kind of debating with myself whether it would make most sense to
only do this kind of optimization when (pick arbitrary cut-off here)
something like more than half of the file is identical at the end. If we
don't have a noticeable fraction of the file being the same, it may not
make sense to really bother with this, since it really is meant for just
things like ChangeLog files etc that have data added at the beginning.

That would make this whole optimization a lot more targeted to the case
where it really matters and really helps.

I also do have an inkling of a really evil way to make xdiff handle the
case of having multiple lines of context right too, and basically just
move all of this logic into xdiff itself rather than have this
interface-level hack, but I'll have to let that idea brew for a while yet.

		Linus

---
 xdiff-interface.c |   38 ++++++++++++++++++++++++++++++--------
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 9ee877c..54a53d2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -109,21 +109,43 @@ int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
  */
 static void trim_common_tail(mmfile_t *a, mmfile_t *b, long ctx)
 {
-	const int blk = 1024;
+	int blk = 1024;
 	long trimmed = 0, recovered = 0;
 	char *ap = a->ptr + a->size;
 	char *bp = b->ptr + b->size;
 	long smaller = (a->size < b->size) ? a->size : b->size;
 
-	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
-		trimmed += blk;
-		ap -= blk;
-		bp -= blk;
-	}
+	if (ctx)
+		return;
+
+	do {
+		while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
+			trimmed += blk;
+			ap -= blk;
+			bp -= blk;
+		}
+		blk /= 2;
+	} while (blk);
+
+	/* Did we trim one of them all away? */
+	if (trimmed == smaller) {
+		char *bigger;
+		if (a->size == b->size)
+			return;
+		bigger = a->ptr;
+		if (a->size > b->size)
+			bigger = b->ptr;
+
+		/* Did the other one end in a newline? */
+		if (bigger[trimmed-1] == '\n')
+			goto done;
+	}
 
-	while (recovered < trimmed && 0 <= ctx)
+	/* Find the next newline */
+	while (recovered < trimmed)
 		if (ap[recovered++] == '\n')
-			ctx--;
+			break;
+done:
 	a->size -= (trimmed - recovered);
 	b->size -= (trimmed - recovered);
 }
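
For anyone who wants to play with the idea outside of git, here is a minimal standalone sketch of the exact tail search: compare 1kB blocks from the end, then keep halving the block size until single bytes have been compared. The names common_tail_len() and trimmable_tail_len() are made up for this sketch and do not exist in git. The line-boundary check is a simplified variant written for illustration, not a copy of what the patch does, and it does not handle the ctx argument at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git code): number of bytes the two buffers
 * have in common at their ends, found with shrinking block sizes.
 */
static size_t common_tail_len(const char *a, size_t a_len,
			      const char *b, size_t b_len)
{
	size_t smaller = a_len < b_len ? a_len : b_len;
	size_t trimmed = 0;
	size_t blk;

	assert(a && b);

	/* Start with 1kB blocks, then halve the block size down to 1 byte. */
	for (blk = 1024; blk; blk /= 2) {
		while (blk + trimmed <= smaller &&
		       !memcmp(a + a_len - trimmed - blk,
			       b + b_len - trimmed - blk, blk))
			trimmed += blk;
	}
	return trimmed;
}

/*
 * Hypothetical helper (not git code): number of trailing bytes that can
 * be dropped from both buffers so that what is left ends on a line
 * boundary. Assumption of this sketch: a buffer that is trimmed away
 * completely counts as being on a line boundary.
 */
static size_t trimmable_tail_len(const char *a, size_t a_len,
				 const char *b, size_t b_len)
{
	size_t tail = common_tail_len(a, a_len, b, b_len);
	size_t a_keep = a_len - tail;
	size_t b_keep = b_len - tail;
	const char *rest = a + a_keep;	/* start of the common tail in a */
	size_t recovered = 0;

	/* Both kept parts already end at a line boundary: drop it all. */
	if ((!a_keep || a[a_keep - 1] == '\n') &&
	    (!b_keep || b[b_keep - 1] == '\n'))
		return tail;

	/* Otherwise give back bytes up to and including the next newline. */
	while (recovered < tail)
		if (rest[recovered++] == '\n')
			break;
	return tail - recovered;
}
```

With a = "x\nx\nx\n" and b = "y\ny\nx\nx\nx\n", the whole of a is common tail and the kept part of b ends in a newline, so all six bytes can go, and the remaining difference is "lines added at the very beginning". With "foo\nbar\n" against "boo\nbar\n" the exact tail starts in the middle of the first line, so only "bar\n" is dropped.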