Hi Junio, On Tue, 8 Oct 2019, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes: > > >> I didn't quite understand this part, though. > >> > >> The default creation factor is 60 (roughly speaking, it wants 60% of > >> the lines to match between two patches, otherwise it considers the > >> patches to be unrelated). > >> > >> Would the updated creation factor used which is 95 (roughly > >> speaking) want 95% of the lines to match between two patches? > >> > >> That would make the matching logic even pickier and reject more > >> paring, so I must be reading the statement wrong X-<. > > > > No, I must have written the opposite of what I tried to say, is all. > > So, cfactor of 60 means at most 60% is allowed to differ and the > two patches are still considered to be related, while 95 means only > 5% needs to be common? That would make more sense to me. Okay, I not only wrote the opposite of what I wanted to say, I also misremembered. When `range-diff` tries to determine matching pairs of patches, it builds an `(m+n)x(m+n)` cost matrix, where `m` is the number of patches in the first commit range and `n` is the number of patches in the second one. Why not `m x n`? Well, that's the obvious matrix, and that's what it starts with, essentially assigning the number of lines of the diff between the diffs as "cost". But then `git range-diff` extends the cost matrix to allow for _all_ of the `m` patches to be considered deleted, and _all_ of the `n` patches to be added. As cost, it cannot use a "diff of diffs" because there is no second diff. So it uses the number of lines of the one diff it has, multiplied by the creation factor interpreted as a percentage. The naive creation factor would be 100%, which is (almost) as if we assumed an empty diff for the missing diff. But that would make the range-diff too eager to dismiss rewrites, as experience obviously showed (not my experience, but Thomas Rast's, who came up with `tbdiff` after all): the diff of diffs includes a diff header, for example. The interpretation I offered (although I inverted what I wanted to say) is similar in spirit to that metric (which is not actually a metric, I believe, because I expect it to violate the triangle inequality) is obviously inaccurate: the number of lines of the diff of diffs does not say anything about the number of matching lines, quite to the contrary, it correlates somewhat to the number of non-matching lines. So a better interpretation would have been: The default creation factor is 60 (roughly speaking, it wants at most 60% of the diffs' lines to differ, otherwise it considers them not to be a match. This is still inaccurate, but at least it gets the idea of the range-diff across. Of course, I will never be able to amend the commit message in GitGitGadget anyway, as I have merged that PR already. Ciao, Dscho