Re: heads-up: git-index-pack in "next" is broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Wed, 18 Oct 2006, Davide Libenzi wrote:
>
> Speaking in general, seen at the hash function level, of course an interface 
> should not give different result for different word sizes or word endianess. 
> Considering the diff algorithm as interface, as I said, the output was 
> unaffected by the 64 bits word size. It was just very slow.

Well, even the output may actually be affected, in the case of _real_ hash 
collisions (as opposed to just the hash _list_ collision that XDL_HASHLONG 
caused).

So I actually think it would be better to have "uint32_t" as the hash 
value - because that would mean that all diffs (or, in the case of the 
block-algorithm, the deltas) are guaranteed to give the same results 
regardless of architecture.

Right now, we actually generate a 64-bit hash value (BUT: for short lines, 
it's likely only _interesting_ in the low bits, so the high bits tend to 
have a very high likelihood of being zero). So hash collisions are 
different: on a 32-bit architecture, two lines may have the same hash, 
while on a 64-bit one, they are different.

And together with some of the limiters we have (eg XDL_MAX_EQLIMIT) hash 
collisions can sometimes affect the output.

Admittedly, in _practice_ this is really unlikely to affect anything 
(you'd get a valid diff in either case, they'd just possibly be subtly 
different, and the input data must be _really_ strange to even see that 
case), but I do think that the hash algorithm can matter.

NOTE! I'm not talking about XDL_HASHLONG(), I'm talking about the 
xdl_hash_record() hash, which returns differently-sized hash results on 
32-bit and 64-bit. And there are cases where we _only_ compare the hashes, 
and don't actually double-check the contents.

So I think that in _practice_ you can't see differences between a 32-bit 
version and a 64-bit one, but the possibility is there. Using "uint32_t" 
instead of "unsigned long" to keep track of hashes would avoid that 
theoretical problem (and might actually make for better performance on 
64-bit archtiectures, if only because of denser data structures and thus 
better cache behaviour).

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]