Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes: > + while ((c = (uint8_t) *name++) != 0) { > + if (isspace(c)) > + continue; > + if (c == '/') { > + base = (base >> 6) ^ hash; > + hash = 0; > + } else { > + uint8_t nybble_swapped = (c >> 4) + ((c & 15) << 4); > + hash = (hash >> 2) + (nybble_swapped << 24); > + } > } > + return (base >> 6) ^ hash; > } Nice. The diff relative to the --full-name-hash version is a bit hard to grok, but compared to the current hash function, there are two and a half changes that matter: (0) it is more careful with bytes with the MSB set (i.e. non-ASCII pathnames). (1) it hashes each path component separetely and rotates the whole thing only at a directory boundary. I'd imagine that this would make a big difference for languages that force overly long filenames at each level. (2) it gives more weight to lower bits by swapping nybbles of each byte. I wonder if we do even better if we reverse all 8 bits instead of swapping nybbles (if we were to do so, it might be more efficient to shift in from the right instead of left end of the base and hash accumulators in the loop and then swap the whole resulting word at the end). Thanks for a fun read.