Jeff King <peff@xxxxxxxx> writes: > We find rename candidates by computing a fingerprint hash of > each file, and then comparing those fingerprints. There are > inherently O(n^2) comparisons, so it pays in CPU time to > hoist the (rather expensive) computation of the fingerprint > out of that loop (or to cache it once we have computed it once). > > Previously, we didn't keep the filespec information around > because then we had the potential to consume a great deal of > memory. However, instead of keeping all of the filespec > data, we can instead just keep the fingerprint. > > This patch implements and uses diff_free_filespec_data_large > to accomplish that goal. We also have to change > estimate_similarity not to needlessly repopulate the > filespec data when we already have the hash. > > Practical tests showed 4.5x speedup for a 10% memory usage > increase. > > Signed-off-by: Jeff King <peff@xxxxxxxx> Very nice. > The implementation is a little less nice than I would like, but I was > trying to be non-invasive. Specifically: > > - the name diff_free_filespec_data_large is horrible, but this is based > on the fact that diff_free_filespec_data actually does too much (it > frees the data _and_ some other auxiliary data). And renaming that > would entail changing many callsites. True. But we can rename it to diff_file_filespec_blob() and that would perfectly well describe what it does. Will do so when applying if it is Ok to you. > - It seems that a better place to call diffcore_populate_filespec > (rather than in estimate_similarity) would actually be in > diffcore_count_changes (when we _know_ that we need to populate the > contents data). > > - The hash_chars() should arguably be tied into > diffcore_populate_filespec, which should have more of a "what > information do you want" interface. I.e., the "size_only" parameter > could be extended to a bitfield to say "populate this, and I need the > delta fingerprint, size, actual contents, etc". Then callers could > just use "populate" before looking at the filespec and it would > lazily load whatever they needed. Both good points, but I agree with you that it is wise to do that with a follow-up patch. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html