On Wed, 18 Oct 2006, Davide Libenzi wrote:
>
> Speaking in general, seen at the hash function level, of course an interface
> should not give different result for different word sizes or word endianess.
> Considering the diff algorithm as interface, as I said, the output was
> unaffected by the 64 bits word size. It was just very slow.

Well, even the output may actually be affected, in the case of _real_ hash
collisions (as opposed to just the hash _list_ collision that XDL_HASHLONG
caused).

So I actually think it would be better to have "uint32_t" as the hash
value - because that would mean that all diffs (or, in the case of the
block-algorithm, the deltas) are guaranteed to give the same results
regardless of architecture.

Right now, we actually generate a 64-bit hash value (BUT: for short lines,
it's likely only _interesting_ in the low bits, so the high bits tend to
have a very high likelihood of being zero). So hash collisions are
different: on a 32-bit architecture, two lines may have the same hash,
while on a 64-bit one, they are different.

And together with some of the limiters we have (eg XDL_MAX_EQLIMIT) hash
collisions can sometimes affect the output. Admittedly, in _practice_ this
is really unlikely to affect anything (you'd get a valid diff in either
case, they'd just possibly be subtly different, and the input data must be
_really_ strange to even see that case), but I do think that the hash
algorithm can matter.

NOTE! I'm not talking about XDL_HASHLONG(), I'm talking about the
xdl_hash_record() hash, which returns differently-sized hash results on
32-bit and 64-bit. And there are cases where we _only_ compare the hashes,
and don't actually double-check the contents.

So I think that in _practice_ you can't see differences between a 32-bit
version and a 64-bit one, but the possibility is there.
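
To make the width issue concrete, here is a minimal sketch. It is an
illustration only: it assumes a simple multiply-by-33-and-xor line hash as a
stand-in and does not claim to reproduce xdl_hash_record(). The names
line_hash_wide(), line_hash_32() and check_truncation() are hypothetical.
The 64-bit accumulator models "unsigned long" on a typical 64-bit target, the
32-bit one models either a 32-bit target or the proposed "uint32_t".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in hash with a 64-bit accumulator, modelling
 * "unsigned long" on an architecture where it is 64 bits wide. */
static uint64_t line_hash_wide(const char *s, size_t len)
{
	uint64_t ha = 5381;
	size_t i;

	for (i = 0; i < len && s[i] != '\n'; i++) {
		ha += ha << 5;			/* ha = ha * 33 */
		ha ^= (unsigned char)s[i];
	}
	return ha;
}

/* The same hash with a fixed 32-bit accumulator. The result no longer
 * depends on how wide "unsigned long" happens to be. */
static uint32_t line_hash_32(const char *s, size_t len)
{
	uint32_t ha = 5381;
	size_t i;

	for (i = 0; i < len && s[i] != '\n'; i++) {
		ha += ha << 5;			/* wraps modulo 2^32 */
		ha ^= (unsigned char)s[i];
	}
	return ha;
}

/* For this particular hash, add, left shift and xor all commute with
 * truncation, so the 32-bit value is the low half of the wide one. */
static void check_truncation(const char *s, size_t len)
{
	assert(line_hash_32(s, len) == (uint32_t)line_hash_wide(s, len));
}
```

With this stand-in, a two-character line such as "ab" leaves the upper 32
bits of the wide hash at zero, while an eight-character line such as
"int x=1;" does not. Two lines whose wide hashes differ only in those upper
bits compare equal under the 32-bit hash and unequal under the 64-bit one,
which is the kind of architecture-dependent collision described above.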

Using "uint32_t" instead of "unsigned long" to keep track of hashes would
avoid that theoretical problem (and might actually make for better
performance on 64-bit architectures, if only because of denser data
structures and thus better cache behaviour).

		Linus