Re: [PATCH v6 6/6] blame: use a fingerprint heuristic to match ignored lines

Michael Platings <michael@xxxxxxxxx> · Sun, 14 Apr 2019 10:41:26 +0100

>  - I wonder if the hash used here can replace what is used in
>    diffcore-delta.c as an improvement (or obviously vice versa), as
>    using two (or more) ad-hoc fingerprinting function without having
>    a clear reason why we need two instead of a unified one feels
>    like a bad idea.

Hi Junio,
If I understand correctly, the algorithm in diffcore-delta.c is
intended to match files that contain identical lines (or 64-byte
chunks). The fingerprinting that Barret & I are talking about is
intended to match lines that contain identical byte pairs.
With significant refactoring, you could make the diffcore-delta
algorithm apply in both cases but I think the end result would be
longer and more complicated than keeping the two separate.
Unlike hashing a line, hashing a byte pair is trivial. Unlike hashing
lines, all except the first and last bytes are included in two
"hashes" - "hello" is hashed to "he", "el", "ll", "lo".
So based on my limited understanding of diffcore-delta.c I think the
two are algorithms are sufficiently different in intent and in
implementation that it's appropriate to keep them separate.

Regarding the "old heuristic" I think there may still be a use case
for that but I'll expand on that later.

Thanks,
-Michael