"Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > This series depends on en/merge-ort-perf and makes full use of exact > renames; see commit messages for details. > > Thanks to Stolee and Junio for reviewing v1. > > Changes since v1: > > * Update rename_src_nr when updating rename_src > * Introduce want_copies in the first patch and use it in a few more places > * Move a comment below a few exit-early if-checks. > > Elijah Newren (2): > diffcore-rename: no point trying to find a match better than exact > diffcore-rename: filter rename_src list when possible > > diffcore-rename.c | 69 +++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 61 insertions(+), 8 deletions(-) Thanks, these look bettrer. With these changes, I guess there are only two things I find myself somewhat embarrassing in the rename machinery that is still there since I invented it. - We still need to go full matrix while finding the "best" pairing. I cannot think of a way to avoid it (that is what makes it embarrassing) but wish there were some way to. In an early attempt, I tried to retire rename_src[j], once rename_dst[i] has been found to be a "good enough" match for it, from the pool of rename src candidates to find a good match for rename_dst[k] for i < k, but naive implementation of it would not work well for obvious reasons---rename_src[j] may match a lot better with rename_dst[k] than rename_dst[i] but we do not know that until we try to estimate similarity with rename_dst[k]. - The .cnt_data member was designed to be a concise summary of the blob characteristics so that two .cnt_data can be "compared" fairly cheaply to see how "similar" two blobs are [*], but (1) it is rather big to be called a "concise summary", and (2) it was not chosen after real performance measurement, and we've been using it for the past 15 years without revisiting its design. 

Side note: In a very early prototype, the approach to assessing
similarity between two blobs was very different---there was no attempt
to compute a "concise summary" for each blob; we just attempted to
create a delta (as in the pack data) between the src and dst blobs and
measured how small a delta we could use to transform src into dst.
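
The two approaches can be contrasted with a small Python sketch. Both
are stand-ins under loud assumptions: summary() is a per-line byte
count, not the actual .cnt_data layout, and delta_size() uses Python's
difflib to count literal bytes rather than the pack delta code. All
function names and sample blobs are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def summary(blob: bytes) -> Counter:
    """Per-blob summary: bytes contributed by each distinct line."""
    counts = Counter()
    for line in blob.splitlines(keepends=True):
        counts[line] += len(line)
    return counts


def summary_similarity(a: Counter, b: Counter) -> float:
    """Bytes common to both summaries, relative to the larger blob."""
    common = sum((a & b).values())
    larger = max(sum(a.values()), sum(b.values()))
    return common / larger if larger else 1.0


def delta_size(src: bytes, dst: bytes) -> int:
    """Stand-in for a delta: literal bytes needed to build dst from
    src, ignoring the cost of the copy instructions."""
    a = src.splitlines(keepends=True)
    b = dst.splitlines(keepends=True)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    literal = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            literal += sum(len(line) for line in b[j1:j2])
    return literal


SRC = b"a\nb\nc\nd\n"
NEAR = b"a\nb\nX\nd\n"   # one line changed
FAR = b"p\nq\nr\ns\n"    # nothing in common

print(summary_similarity(summary(SRC), summary(NEAR)))
print(delta_size(SRC, NEAR), delta_size(SRC, FAR))
```

The practical difference in this sketch is that summary() runs once per
blob and its result can be reused for every comparison, whereas
delta_size() has to look at both blobs again for every pair.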