Re: move detection doesnt take filename into account

Jeff King <peff@xxxxxxxx> writes:

> I think the hash here does not collide in that way. It really is just
> the last sixteen characters shoved into a uint32_t.

Each byte overlaps with its neighbours because the hash is shifted
by only 2 bits, not 8 bits, when a new byte is brought in.  We can
say that the topmost two bits of the result must have come from the
last character, but other than these, more than one input byte
contributes to each bit position, so two names that a human would
not consider "similar" can be given the same hash, no?
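
To illustrate the overlap, here is a minimal sketch of a hash of the
kind described above: shift the accumulated value down by 2 bits and
add the new byte at the top.  The name name_hash_sketch() is made up
for this example, and the body is an approximation written for
illustration, not a copy of the code under discussion.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: each new byte is added at bits 24 and up,
 * while everything accumulated so far moves down by only 2 bits,
 * so adjacent bytes share most of their bit positions.
 */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t c, hash = 0;

	while ((c = (unsigned char)*name++) != 0)
		hash = (hash >> 2) + (c << 24);
	return hash;
}
```

With this sketch, "ea" and "ab" hash to the same value, because
4 * 'a' + 'e' and 4 * 'b' + 'a' are both 489; likewise "foo.c" and
"foo2b" collide, while "foo.c" and "foo.h" do not.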

That is useful for the delta code because it only needs similar
things to be grouped together; it does not mind if things that are
not similar are also mixed into a group, as the end result is
primarily determined by similarity of the actual contents, not
pathnames.

What is under discussion here is the other way around; we know two
paths have contents that are equally similar to the third one, and
want to tie-break these two using how similar their pathnames are
to the third one.
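
A rough sketch of such a tie-break, assuming "similar pathname" is
measured by the number of trailing bytes two paths share.  Both
common_suffix_len() and pick_by_name() are hypothetical helpers made
up for this example; other measures of pathname similarity are
equally possible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of trailing bytes two paths share. */
static size_t common_suffix_len(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b), n = 0;

	while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
		n++;
	return n;
}

/*
 * Hypothetical tie-breaker: the caller has already established that
 * cand1 and cand2 have contents equally similar to target.  Return
 * the candidate whose pathname shares the longer suffix with the
 * target's pathname; cand1 wins when the suffix lengths are equal.
 */
static const char *pick_by_name(const char *target,
				const char *cand1, const char *cand2)
{
	if (common_suffix_len(target, cand2) > common_suffix_len(target, cand1))
		return cand2;
	return cand1;
}
```

For a target of "src/foo.c", this picks "lib/foo.c" over "lib/bar.c",
since the former shares the trailing "/foo.c" with the target.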