Re: move detection doesn't take filename into account

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 01 Jul 2014 10:08:15 -0700

Elliot Wolk <elliot.wolk@xxxxxxxxx> writes:

> On 07/01/2014 10:57 AM, Junio C Hamano wrote:
>> Robin Rosenberg <robin.rosenberg@xxxxxxxxxx> writes:
>>
>>> I think it does, but based on filename suffix. E.g. here is a rename of
>>> three empty files with a suffix.
>>>
>>>   3 files changed, 0 insertions(+), 0 deletions(-)
>>>   rename 1.a => 2.a (100%)
>>>   rename 1.b => 2.b (100%)
>>>   rename 1.c => 2.c (100%)
>> This is nothing more than chance.
>>
>> We tie-break rename source candidates that have the same content
>> similarity score to a rename destination using "name similarity",
>> which is implemented in diffcore-rename.c::basename_same(). It
>> scores 1 if `basename $src` and `basename $dst` are the same and 0
>> otherwise, i.e. a rename from 1.a to a/1.a is judged to be better
>> than one from 1.a to a/2.a, but otherwise nothing favors a rename
>> from 1.a to 2.a over one from 1.a to 2.b.
>
> Thanks for the info!
> Then I suppose my bug report is a request to have name similarity
> use a different statistical matching algorithm instead.

[administrivia: please do not top-post on this list]

I haven't thought it through, but my gut feeling is that we could
change the name similarity score to be the length of the matching
tail of the two paths (e.g. a rename from 1.a to a/2.a, which shares
two bytes at the tail, is a better match than one to a/2.b, which
shares no tail at all, and one to a/1.a, which shares three bytes at
the tail, is an even better match).

Oh, and rename basename_same() to something else; currently it is
only used as the "name similarity", and after such a change it will
still be the "name similarity" but will no longer be asking "are the
basenames the same?".

Hint, hint...