Re: Linux 2.6.24-rc6

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 21 Dec 2007 09:45:16 -0800 (PST)

On Thu, 20 Dec 2007, Linus Torvalds wrote:
> 
> Both answers are *correct*, though. The particular choice of "insert at 
> line 489, after line 488" is a bit odd, but is because we don't actually 
> search to exactly the beginning of where the differences started, we 
> search in blocks of 1kB and then we go forward to the next newline.

This slightly more involved diff does a better job at this particular 
issue. Whether the complexity is worth it or not, I dunno, but it changes 
the "remove common lines at the end" to do an exact job, which for this 
particular test-case means that the end result of adding a thousand lines 
of 'y' will look like

	[torvalds@woody ~]$ git diff -U0 a b | grep @@
	@@ -0,0 +1,1000 @@

instead - ie it will say that they were added at the very beginning of the 
file rather than added at some arbitrary point in the middle.

Whether this is really worth it, I dunno.

Also, I'm kind of debating with myself whether it would make most sense to 
only do this kind of optimization when (pick arbitrary cut-off here) 
something like more than half of the file is identical at the end. If we 
don't have a noticeable fraction of the file being the same, it may not 
make sense to really bother with this, since it really is meant for just 
things like ChangeLog files etc that have data added at the beginning.

That would make this whole optimization a lot more targeted to the case 
where it really matters and really helps.

I also do have an incling of a really evil way to make xdiff handle the 
case of having multiple lines of context right too, and basically just 
move all of this logic into xdiff itself rather than have this 
interface-level hack, but I'll have to let that idea brew for a while yet. 

			Linus

---
 xdiff-interface.c |   38 ++++++++++++++++++++++++++++++--------
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 9ee877c..54a53d2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -109,21 +109,43 @@ int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
  */
 static void trim_common_tail(mmfile_t *a, mmfile_t *b, long ctx)
 {
-	const int blk = 1024;
+	int blk = 1024;
 	long trimmed = 0, recovered = 0;
 	char *ap = a->ptr + a->size;
 	char *bp = b->ptr + b->size;
 	long smaller = (a->size < b->size) ? a->size : b->size;
 
-	while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
-		trimmed += blk;
-		ap -= blk;
-		bp -= blk;
-	}
+	if (ctx)
+		return;
+
+	do {
+		while (blk + trimmed <= smaller && !memcmp(ap - blk, bp - blk, blk)) {
+			trimmed += blk;
+			ap -= blk;
+			bp -= blk;
+		}
+		blk /= 2;
+	} while (blk);
+
+	/* Did we trim one of them all away? */
+	if (trimmed == smaller) {
+		char *bigger;
+		if (a->size == b->size)
+			return;
+		bigger = a->ptr;
+		if (a->size > b->size)
+			bigger = b->ptr;
+
+		/* Did the other one end in a newline? */
+		if (bigger[trimmed-1] == '\n')
+			goto done;
+	}		
 
-	while (recovered < trimmed && 0 <= ctx)
+	/* Find the next newline */
+	while (recovered < trimmed)
 		if (ap[recovered++] == '\n')
-			ctx--;
+			break;
+done:
 	a->size -= (trimmed - recovered);
 	b->size -= (trimmed - recovered);
 }
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html