Jeff King <peff@xxxxxxxx> writes:

> On Tue, Jul 01, 2014 at 10:08:15AM -0700, Junio C Hamano wrote:
>
>> I didn't think it through but my gut feeling is that we could change
>> the name similarity score to be the length of the tail part that
>> matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
>> is a better match than to a/2.b that does not share any tail, and to
>> a/1.a that shares the three bytes at the tail is an even better
>> match).
>
> The delta heuristics in pack-objects use pack_name_hash, which claims:
>
>   /*
>    * This effectively just creates a sortable number from the
>    * last sixteen non-whitespace characters. Last characters
>    * count "most", so things that end in ".c" sort together.
>    */
>
> which might be another option (and seems like a superset of the basename
> check, short of basenames that are longer than 16 characters).

Perhaps.  I am however not sure if the code to compute the similarity
score is as OK with false positives, i.e. dissimilar names that
happen to hash together getting clumped in the same bin or in close
bins, as the existing callers of pack_name_hash() are.
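
For illustration only, the tail-length idea from the quoted message could be sketched roughly as below. The function name tail_match_len() is hypothetical and is not an existing git function; the sketch only counts how many trailing bytes two paths share, and says nothing about how such a score would be weighed against content similarity in the rename detection code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return the number of
 * bytes that the two names have in common at their tails.
 * A larger value would mean "more similar names" under the
 * scoring idea discussed in this thread.
 */
static size_t tail_match_len(const char *a, const char *b)
{
	size_t la = strlen(a);
	size_t lb = strlen(b);
	size_t n = 0;

	/* Walk backwards from the end of both strings while bytes agree. */
	while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
		n++;
	return n;
}
```

With the names used in the example above, comparing 1.a against a/2.b gives 0, against a/2.a gives 2, and against a/1.a gives 3, which is the ordering described there. Unlike a hash, an exact comparison like this cannot score two names as similar unless they really do share a tail.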