Jeff King <peff@xxxxxxxx> writes:

> I think the hash here does not collide in that way. It really is just
> the last sixteen characters shoved into a uint32_t.

All bytes overlap with their adjacent bytes, because the hash is shifted by only 2 bits, not 8 bits, when a new byte is brought in. We can say that the topmost two bits of the result must have come from the last character, but other than those, each bit position can be set or cleared by more than one input byte, so two names that a human would not consider "similar" can be given the same hash, no?

That works for the delta code, because it only needs similar things to be grouped together; it does not mind if dissimilar things are also mixed into a group, as the end result is primarily determined by the similarity of the actual contents, not the pathnames.

What is under discussion here is the other way around: we know two paths have contents that are equally similar to a third one, and we want to break the tie using how similar their pathnames are to that third one.
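
To make the overlap concrete, here is a minimal sketch in C. It assumes the update step is `hash = (hash >> 2) + (c << 24)` with whitespace skipped, which matches the 2-bit shift described above but is not quoted from this thread. The function is a stand-in and the sample names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Sketch of a "last sixteen characters" name hash (assumed form).
 * Each new byte enters at bits 24..31 while the old hash moves down
 * by only 2 bits, so neighbouring bytes share bit positions.
 */
uint32_t name_hash(const char *name)
{
	uint32_t hash = 0;
	unsigned char c;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;	/* assumption: whitespace is ignored */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

Under that assumption, a two-character name hashes to (first + 4 * second) << 22, so "ha" (104 + 4 * 97 = 492) and "db" (100 + 4 * 98 = 492) both give 0x7b000000 even though they share no character. The same holds for "ha.c" and "db.c", since appending an identical suffix keeps the two hashes equal as long as no bits have been shifted out.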