[PATCH 3/4] xdiff/xhistogram: rely on xdl_trim_ends()

Tay Ray Chuan <rctay89@xxxxxxxxx> · Mon, 1 Aug 2011 12:20:09 +0800

Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend
instead. These are set by xdl_trim_ends() and likewise tell us where the
first unmatched line from the start and from the end occurs.

Signed-off-by: Tay Ray Chuan <rctay89@xxxxxxxxx>
---

On Wed, Jul 13, 2011 at 3:56 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>[snip]
>> +     reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
>
> What this does is logically not specific to histogram algorithm but can be
> applied to other backends, no? And I vaguely recall that Linus did try
> something like this once, found some issues with it when context is set to
> non zero, and stopped doing it (sorry, I do not have any more details).
>
> I am not suggesting you to remove this call or hoist the call to one level
> up to xdl_do_diff(), but I do have to wonder how much of the performance
> improvement you reported is due to this common head/tail reduction.

In a way, this patch is a response to Junio's email. It just made sense
to use existing functionality in xdiff (xdl_trim_ends(), xdf->dstart and
xdf->dend) rather than write a new function (reduce_common_start_end()).
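
For illustration only (not part of the patch): histogram_diff() takes a
1-based starting line and a line count, while dstart and dend are
treated here as 0-based, inclusive indices of the first and last
unmatched records. A minimal sketch of that conversion follows; the
struct and helper names are hypothetical.

```c
/* Hypothetical holder for a 1-based starting line and a line count. */
struct line_range {
	long line;
	long count;
};

/*
 * Convert an inclusive, 0-based [dstart, dend] span into a 1-based
 * (line, count) pair. When nothing is left unmatched, dend is
 * dstart - 1 and the count comes out as 0.
 */
static struct line_range to_line_range(long dstart, long dend)
{
	struct line_range r;

	r.line = dstart + 1;
	r.count = dend - dstart + 1;
	return r;
}
```

With dstart = 2 and dend = 3, this gives line 3 and count 2, i.e. the
third and fourth lines.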

I believe Junio was referring to this patch by Linus:

  https://lkml.org/lkml/2007/12/20/692

That is a more aggressive optimization: it cuts out content so that it
doesn't get hashed in the first place. The optimization used here
(reduce_common_start_end()/xdl_trim_ends()) works on the already-hashed
records and simply reduces the "area" to which the algorithm is applied.
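
To make the distinction concrete, here is a rough sketch of trimming on
already-hashed records. It is not the xdiff code: struct rec, struct
span and trim_ends() are hypothetical stand-ins, and it assumes two
records count as equal whenever their hashes are equal.

```c
/* Hypothetical stand-in for a hashed line record. */
struct rec {
	unsigned long ha;
};

/* First and last unmatched record, as 0-based inclusive indices. */
struct span {
	long dstart;
	long dend;
};

/*
 * Skip the common head, then the common tail, of two hashed record
 * arrays. The tail scan is limited to what the head scan left over,
 * so the two trimmed regions never overlap.
 */
static void trim_ends(const struct rec *a, long na,
		      const struct rec *b, long nb,
		      struct span *sa, struct span *sb)
{
	long lim = na < nb ? na : nb;
	long i, j;

	for (i = 0; i < lim && a[i].ha == b[i].ha; i++)
		; /* common head */
	sa->dstart = sb->dstart = i;

	lim -= i;
	for (j = 0; j < lim && a[na - 1 - j].ha == b[nb - 1 - j].ha; j++)
		; /* common tail */
	sa->dend = na - j - 1;
	sb->dend = nb - j - 1;
}
```

Every line is still hashed before this runs; only the region handed to
the diff algorithm shrinks.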

(Actually, I do have a working patch that does content trimming in a way
that is safe for any context length. But it's not specific to histogram,
so I'll hold on to it until this series gets merged, lest it hold up the
series.)

 xdiff/xhistogram.c |   31 ++++---------------------------
 1 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 9cb69ea..804e19b 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -102,7 +102,7 @@ static int cmp_recs(xpparam_t const *xpp,
 	(cmp_recs(xpp, REC(env, s1, l1), REC(env, s2, l2)))
 
 #define CMP(i, s1, l1, s2, l2) \
-	(CMP_ENV(i->xpp, i->env, s1, l1, s2, l2))
+	(cmp_recs(i->xpp, REC(i->env, s1, l1), REC(i->env, s2, l2)))
 
 #define TABLE_HASH(index, side, line) \
 	XDL_HASHLONG((REC(index->env, side, line))->ha, index->table_bits)
@@ -248,23 +248,6 @@ static int find_lcs(struct histindex *index, struct region *lcs,
 	return index->has_common && index->max_chain_length < index->cnt;
 }
 
-static void reduce_common_start_end(xpparam_t const *xpp, xdfenv_t *env,
-	int *line1, int *count1, int *line2, int *count2)
-{
-	if (*count1 <= 1 || *count2 <= 1)
-		return;
-	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, *line1, 2, *line2)) {
-		(*line1)++;
-		(*count1)--;
-		(*line2)++;
-		(*count2)--;
-	}
-	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, LINE_END_PTR(1), 2, LINE_END_PTR(2))) {
-		(*count1)--;
-		(*count2)--;
-	}
-}
-
 static int fall_back_to_classic_diff(struct histindex *index,
 		int line1, int count1, int line2, int count2)
 {
@@ -370,16 +353,10 @@ cleanup:
 int xdl_do_histogram_diff(mmfile_t *file1, mmfile_t *file2,
 	xpparam_t const *xpp, xdfenv_t *env)
 {
-	int line1, line2, count1, count2;
-
 	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
 		return -1;
 
-	line1 = line2 = 1;
-	count1 = env->xdf1.nrec;
-	count2 = env->xdf2.nrec;
-
-	reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
-
-	return histogram_diff(xpp, env, line1, count1, line2, count2);
+	return histogram_diff(xpp, env,
+		env->xdf1.dstart + 1, env->xdf1.dend - env->xdf1.dstart + 1,
+		env->xdf2.dstart + 1, env->xdf2.dend - env->xdf2.dstart + 1);
 }
-- 
1.7.3.4.730.g67af1.dirty
