On Thu, Dec 9, 2021 at 1:57 PM David Howells <dhowells@xxxxxxxxxx> wrote: > > What I'm trying to get at is that the hash needs to be consistent, no matter > the endianness of the cpu, for any particular input blob. Yeah, if that's the case, then you should probably make that "unsigned int *data" argument probably just be "void *" and then: > a = *data++; <<<<<<< > HASH_MIX(x, y, a); > } > return fold_hash(x, y); > } > > The marked line should probably use something like le/be32_to_cpu(). Yes, it should be using a '__le32 *' inside that function and you should use l32_to_cpu(). Obviously, BE would work too, but cause unnecessary work on common hardware. But as mentioned for the other patches, you should then also be a lot more careful about always using the end result as an 'unsigned int' (or maybe 'u32') too, and when comparing hashes for binary search or other, you should always do th4e compare in some stable format. Because doing return (long)hash_a - (long)hash_b; and looking at the sign doesn't actually result in a stable ordering on 32-bit architectures. You don't get a transitive ordering (ie a < b and b < c doesn't imply a < c). And presumably if the hashes are meaningful across machines, then hash comparisons should also be meaningful across machines. So when comparing hashes, you need to compare them either in a truly bigger signed type (and make sure that doesn't get truncated) - kind of like how a lot of 'memcmp()' functions do 'unsigned char' subtractions in an 'int' - or you need to compare them _as_ 'unsigned int'. Otherwise the comparisons will be all kinds of messed up. Linus