Re: [RFC/PATCH 0/2] Enhance performance of blame -C -C

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 15 Jul 2008 15:25:29 -0700

Alexander Gavrilov <angavrilov@xxxxxxxxx> writes:

> This pair of patches aims at increasing performance of copy detection in
> blame by avoiding unnecessary comparisons. Note that since I'm new to
> this code, I might have misunderstood something.
>
> There are two cases that I aim to fix:
>
> 1) Copy detection is done by comparing all outstanding chunks of the
> target file to all blobs in the parent. After that, chunks with suitable
> matches are split, and the comparison is repeated until there are no
> new matches. The trouble is, chunks that didn't match the first time,
> and weren't split, are compared against the same set of blobs again and
> again. I add a flag to track that.
>
>   On my test case it decreased blame -C -C time from over 10min to
>   ~6min; 4min with -C80.
>
> 2) Chunks are split only if the match scores above a certain
> threshold. I understand that a split of an entry cannot score more than
> the entry itself. Thus, it is pointless to even try doing costly
> comparisons for small entries.
>
>   (Time goes down to 4min; 2min with -C80)
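
As a rough illustration of the two ideas above, here is a toy model in
Python. It is not the blame code: the names Chunk, scanned, score,
best_match and detect_copies are made up for this sketch, strings stand
in for line ranges, and the score is a plain count of alphanumeric
characters, assumed to be monotone (a piece never scores more than the
chunk it was cut from).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Chunk:
    text: str
    scanned: bool = False  # hypothetical flag: compared once, nothing found


def score(text):
    """Toy score: number of alphanumeric characters in the text."""
    return sum(ch.isalnum() for ch in text)


def best_match(text, blobs, stats):
    """Longest common block against each blob; keep the highest-scoring one."""
    best = (0, 0, 0)  # (score, start, size)
    for blob in blobs:
        stats["comparisons"] += 1  # this is the costly step being counted
        sm = SequenceMatcher(None, text, blob, autojunk=False)
        m = sm.find_longest_match(0, len(text), 0, len(blob))
        s = score(text[m.a:m.a + m.size])
        if s > best[0]:
            best = (s, m.a, m.size)
    return best


def detect_copies(target, blobs, min_score, use_flag=True, use_prune=True):
    """Split chunks on good matches until a pass finds nothing new."""
    stats = {"comparisons": 0}
    pending, copied = [Chunk(target)], []
    progress = True
    while progress:
        progress = False
        next_pending = []
        for chunk in pending:
            # Idea 1: a chunk that failed once is not compared again.
            # Idea 2: a chunk scoring below the threshold cannot contain
            #         a piece that scores above it, so skip it outright.
            if (use_flag and chunk.scanned) or \
               (use_prune and score(chunk.text) < min_score):
                next_pending.append(chunk)
                continue
            s, start, size = best_match(chunk.text, blobs, stats)
            if s >= min_score:
                copied.append(chunk.text[start:start + size])
                for piece in (chunk.text[:start], chunk.text[start + size:]):
                    if piece:
                        next_pending.append(Chunk(piece))  # new, unscanned
                progress = True
            else:
                chunk.scanned = True
                next_pending.append(chunk)
        pending = next_pending
    return copied, stats["comparisons"]


target = "alpha beta gamma | zz | delta epsilon zeta | q"
blobs = ["xx alpha beta gamma yy", "delta epsilon zeta!!"]
naive_copied, naive_n = detect_copies(target, blobs, 8, False, False)
fast_copied, fast_n = detect_copies(target, blobs, 8, True, True)
print("comparisons without/with the two checks:", naive_n, fast_n)
```

Both runs should find the same copied pieces; only the number of
comparisons differs, which is the point of the two patches as I read
them.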

Ideas for both patches sound very sane.  Will take a deeper look later.

Thanks.