Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend,
which are set by xdl_trim_ends() and similarly tell us where the first
unmatched lines from the start and from the end occur.

Signed-off-by: Tay Ray Chuan <rctay89@xxxxxxxxx>
---

On Wed, Jul 13, 2011 at 3:56 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>[snip]
>> +	reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
>
> What this does is logically not specific to histogram algorithm but can be
> applied to other backends, no? And I vaguely recall that Linus did try
> something like this once, found some issues with it when context is set to
> non zero, and stopped doing it (sorry, I do not have any more details).
>
> I am not suggesting you to remove this call or hoist the call to one level
> up to xdl_do_diff(), but I do have to wonder how much of the performance
> improvement you reported is due to this common head/tail reduction.

In a way, this patch is a response to Junio's email. It just made sense to
use existing functionality in xdiff (xdl_trim_ends(), xdf->dstart and
xdf->dend) rather than write a new function (reduce_common_start_end()).

I believe Junio was referring to this patch by Linus:

  https://lkml.org/lkml/2007/12/20/692

That is an optimization on a more aggressive level - cutting out content so
that it doesn't get hashed in the first place. The optimization used here
(reduce_common_start_end()/xdl_trim_ends()) depends on the hashed result
and simply reduces the "area" to which the algorithm is applied.

(Actually, I do have a working patch that does content trimming that is
context-length safe. But it's not specific to histogram, so I'll keep it
with me until this series gets merged, lest it hold up the series.)
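For readers unfamiliar with dstart and dend, here is a rough, self-contained
sketch of the idea. It is not part of this patch and is not xdiff's code: the
names trimmed_range, trim_ends(), range_line() and range_count() are made up
for illustration, and lines are compared with strcmp() instead of by record
hash. It only shows how a 0-based (dstart, dend) pair could map to the
1-based (line, count) arguments that histogram_diff() takes.

```c
#include <string.h>

/* Hypothetical stand-in for the dstart/dend pair of one file. */
struct trimmed_range {
	long dstart;	/* 0-based index of the first line not matched from the start */
	long dend;	/* 0-based index of the last line not matched from the end */
};

/*
 * Skip the common head and the common tail of two line arrays.
 * The tail scan is limited to what the head scan left over, so a
 * line is never counted as both head and tail.
 */
static void trim_ends(const char **a, long na, const char **b, long nb,
		      struct trimmed_range *ra, struct trimmed_range *rb)
{
	long lim = na < nb ? na : nb;
	long head = 0, tail = 0;

	while (head < lim && !strcmp(a[head], b[head]))
		head++;

	lim -= head;
	while (tail < lim && !strcmp(a[na - 1 - tail], b[nb - 1 - tail]))
		tail++;

	ra->dstart = rb->dstart = head;
	ra->dend = na - tail - 1;
	rb->dend = nb - tail - 1;
}

/* 1-based number of the first line left after trimming. */
static long range_line(const struct trimmed_range *r)
{
	return r->dstart + 1;
}

/* Number of lines left after trimming; 0 if nothing is left. */
static long range_count(const struct trimmed_range *r)
{
	return r->dend - r->dstart + 1;
}
```

In this sketch, a side can end up with a count of 0 (for example, when the
other file only has lines inserted in the middle), so whatever consumes the
(line, count) pair needs to cope with an empty range.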
 xdiff/xhistogram.c |   31 ++++---------------------------
 1 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/xdiff/xhistogram.c b/xdiff/xhistogram.c
index 9cb69ea..804e19b 100644
--- a/xdiff/xhistogram.c
+++ b/xdiff/xhistogram.c
@@ -102,7 +102,7 @@ static int cmp_recs(xpparam_t const *xpp,
 	(cmp_recs(xpp, REC(env, s1, l1), REC(env, s2, l2)))
 
 #define CMP(i, s1, l1, s2, l2) \
-	(CMP_ENV(i->xpp, i->env, s1, l1, s2, l2))
+	(cmp_recs(i->xpp, REC(i->env, s1, l1), REC(i->env, s2, l2)))
 
 #define TABLE_HASH(index, side, line) \
 	XDL_HASHLONG((REC(index->env, side, line))->ha, index->table_bits)
@@ -248,23 +248,6 @@ static int find_lcs(struct histindex *index, struct region *lcs,
 	return index->has_common && index->max_chain_length < index->cnt;
 }
 
-static void reduce_common_start_end(xpparam_t const *xpp, xdfenv_t *env,
-	int *line1, int *count1, int *line2, int *count2)
-{
-	if (*count1 <= 1 || *count2 <= 1)
-		return;
-	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, *line1, 2, *line2)) {
-		(*line1)++;
-		(*count1)--;
-		(*line2)++;
-		(*count2)--;
-	}
-	while (*count1 > 1 && *count2 > 1 && CMP_ENV(xpp, env, 1, LINE_END_PTR(1), 2, LINE_END_PTR(2))) {
-		(*count1)--;
-		(*count2)--;
-	}
-}
-
 static int fall_back_to_classic_diff(struct histindex *index,
 		int line1, int count1, int line2, int count2)
 {
@@ -370,16 +353,10 @@ cleanup:
 int xdl_do_histogram_diff(mmfile_t *file1, mmfile_t *file2,
 	xpparam_t const *xpp, xdfenv_t *env)
 {
-	int line1, line2, count1, count2;
-
 	if (xdl_prepare_env(file1, file2, xpp, env) < 0)
 		return -1;
 
-	line1 = line2 = 1;
-	count1 = env->xdf1.nrec;
-	count2 = env->xdf2.nrec;
-
-	reduce_common_start_end(xpp, env, &line1, &count1, &line2, &count2);
-
-	return histogram_diff(xpp, env, line1, count1, line2, count2);
+	return histogram_diff(xpp, env,
+		env->xdf1.dstart + 1, env->xdf1.dend - env->xdf1.dstart + 1,
+		env->xdf2.dstart + 1, env->xdf2.dend - env->xdf2.dstart + 1);
 }
-- 
1.7.3.4.730.g67af1.dirty