> - I wonder if the hash used here can replace what is used in
>   diffcore-delta.c as an improvement (or obviously vice versa), as
>   using two (or more) ad-hoc fingerprinting function without having
>   a clear reason why we need two instead of a unified one feels
>   like a bad idea.

Hi Junio,

If I understand correctly, the algorithm in diffcore-delta.c is
intended to match files that contain identical lines (or 64-byte
chunks). The fingerprinting that Barret and I are talking about is
intended to match lines that contain identical byte pairs.

With significant refactoring, you could make the diffcore-delta
algorithm apply in both cases, but I think the end result would be
longer and more complicated than keeping the two separate.

Hashing a byte pair is trivial, unlike hashing a line. And unlike
line hashing, where each byte belongs to exactly one hash, every
byte except the first and last is included in two "hashes": "hello"
is hashed to "he", "el", "ll", "lo".

So, based on my limited understanding of diffcore-delta.c, I think
the two algorithms are sufficiently different in intent and in
implementation that it's appropriate to keep them separate.

Regarding the "old heuristic", I think there may still be a use case
for it, but I'll expand on that later.

Thanks,
-Michael
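
For illustration only, here is a minimal sketch of the byte-pair idea
described above. It is not the code from the patch series: the names
pair_hash(), fingerprint_line() and shared_pairs() are hypothetical,
and packing the two bytes into 16 bits is just one assumed way to
"hash" a pair. It shows that "hello" yields the four overlapping pairs
"he", "el", "ll", "lo", and how two lines could be compared by
counting the pairs they share.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the "hash" of a byte pair is the two bytes packed into 16 bits. */
static uint16_t pair_hash(unsigned char a, unsigned char b)
{
	return (uint16_t)((a << 8) | b);
}

/*
 * Write the byte-pair hashes of line[0..len) into out and return how
 * many were written: len - 1, or 0 for lines shorter than two bytes.
 * The caller must provide room for len - 1 entries in out.
 */
static size_t fingerprint_line(const char *line, size_t len, uint16_t *out)
{
	size_t i;

	assert(line != NULL && out != NULL);
	if (len < 2)
		return 0;
	/* Each interior byte appears in two consecutive pairs. */
	for (i = 0; i + 1 < len; i++)
		out[i] = pair_hash((unsigned char)line[i],
				   (unsigned char)line[i + 1]);
	return len - 1;
}

/*
 * Count the byte pairs two NUL-terminated lines have in common,
 * treating each fingerprint as a multiset (a pair that occurs twice
 * in both lines counts twice).
 */
static size_t shared_pairs(const char *a, const char *b)
{
	static unsigned int counts[1 << 16]; /* one slot per possible pair */
	size_t alen = strlen(a), blen = strlen(b);
	size_t i, shared = 0;

	memset(counts, 0, sizeof(counts));
	for (i = 0; i + 1 < alen; i++)
		counts[pair_hash((unsigned char)a[i], (unsigned char)a[i + 1])]++;
	for (i = 0; i + 1 < blen; i++) {
		uint16_t h = pair_hash((unsigned char)b[i], (unsigned char)b[i + 1]);
		if (counts[h]) {
			counts[h]--;
			shared++;
		}
	}
	return shared;
}
```

With these definitions, shared_pairs("hello", "help") would count the
two pairs "he" and "el". Nothing here needs chunk boundaries or a real
hash function, which is the main reason the line-level case looks so
different from the file-level matching in diffcore-delta.c.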