Re: move detection doesn't take filename into account

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 09 Jul 2014 08:51:07 -0700

Jeff King <peff@xxxxxxxx> writes:

> On Tue, Jul 01, 2014 at 10:08:15AM -0700, Junio C Hamano wrote:
>
>> I didn't think it through but my gut feeling is that we could change
>> the name similarity score to be the length of the tail part that
>> matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
>> is a better match than to a/2.b that does not share any tail, and to
>> a/1.a that shares the three bytes at the tail is an even better
>> match).
>
> The delta heuristics in pack-objects use pack_name_hash, which claims:
>
>         /*
>          * This effectively just creates a sortable number from the
>          * last sixteen non-whitespace characters. Last characters
>          * count "most", so things that end in ".c" sort together.
>          */
>
> which might be another option (and seems like a superset of the basename
> check, short of basenames that are longer than 16 characters).

Perhaps.

However, I am not sure that the code to compute the similarity score
is as tolerant of false positives (i.e. dissimilar names that happen
to hash together and get clumped into the same bin or into nearby
bins) as the existing callers of pack_name_hash() are.
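
To make the two options concrete, here is a rough sketch, not a patch.
tail_match_len() is a hypothetical helper for the "length of the
matching tail" score suggested earlier in the thread; it does not exist
in Git.  name_hash_sketch() is modeled on pack_name_hash() as I recall
it, so treat it as an approximation (the unsigned char cast is added
here so that plain isspace() is safe outside Git's own ctype macros).

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not in Git): count how many bytes at the
 * tails of the two names are identical.
 */
static size_t tail_match_len(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b), n = 0;

	while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
		n++;
	return n;
}

/*
 * Approximation of pack_name_hash(): whitespace is skipped, and each
 * new character is added at the top of the word while older ones are
 * shifted down, so the last characters count "most".
 */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

With these definitions, tail_match_len("1.a", x) is 2, 0 and 3 for x
being "a/2.a", "a/2.b" and "a/1.a" respectively, which is the ordering
described above.  On the other hand, name_hash_sketch("ab") and
name_hash_sketch("ea") come out equal even though the two names share
no tail bytes at all, which is the kind of false positive I am worried
about.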
