Re: Bug: git log --numstat counts wrong

Tay Ray Chuan <rctay89@xxxxxxxxx> · Fri, 23 Sep 2011 14:30:40 +0800

On Fri, Sep 23, 2011 at 12:15 AM, René Scharfe
<rene.scharfe@xxxxxxxxxxxxxx> wrote:
> The patch below reverts a part of 27af01d5523 that's not explained in its
> commit message and doesn't seem to contribute to the intended speedup.  It
> seems to restore the original diff output.  I don't know how it's actually
> doing that, though, as I haven't dug into the code at all.
>
> [snip]
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 5a33d1a..e419f4f 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -383,7 +383,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>  * might be potentially discarded if they happear in a run of discardable.
>  */
>  static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
> -       long i, nm, nreff;
> +       long i, nm, nreff, mlim;
>        xrecord_t **recs;
>        xdlclass_t *rcrec;
>        char *dis, *dis1, *dis2;
> @@ -396,16 +396,20 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>        dis1 = dis;
>        dis2 = dis1 + xdf1->nrec + 1;
>
> +       if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
> +               mlim = XDL_MAX_EQLIMIT;
>        for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
>                rcrec = cf->rcrecs[(*recs)->ha];
>                nm = rcrec ? rcrec->len2 : 0;
> -               dis1[i] = (nm == 0) ? 0: 1;
> +               dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
>        }
>
> +       if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> +               mlim = XDL_MAX_EQLIMIT;
>        for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
>                rcrec = cf->rcrecs[(*recs)->ha];
>                nm = rcrec ? rcrec->len1 : 0;
> -               dis2[i] = (nm == 0) ? 0: 1;
> +               dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
>        }
>
>        for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
>
>
>

Thanks for the patch, René.

Sorry for not explaining that part of the change.

My understanding of mlim is that it "caps" how deep the for loop at
around line 387 goes through a hash bucket/record chaing to find a
matching record from side A in side B (and vice-versa in a later
loop), probably to prevent running time from becoming too long.

But with 27af01d, this is no longer a concern. We can get an *exact*,
pre-computed count of matching records in the other side, so we don't
have go through the hash bucket. Thus mlim is no longer needed.

So re-introducing mlim doesn't seem right, even though it may fix this
"bug" (ie restore the old behaviour).

-- 
Cheers,
Ray Chuan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html