Linus Torvalds <torvalds@xxxxxxxx> writes:

> Umm. Why do you rehash at all?
>
> Just take the size of the "src" file as the initial hash size.

The code uses a hash of close to 16 bits, and I had a 65k-entry flat array as the hashtable. That is the one you commented on as "4-times as many page misses". Interestingly enough, that kind of flat array representation seems to be too sparse and gives very bad performance behaviour. The improvement I mentioned in the message you are replying to is the result of making it into a smaller (starting at (1<<9) or something like that) linear-overflowing hash.

I need to think about the latter suggestion a bit more.
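To make the "smaller linear-overflowing hash" concrete, here is a rough sketch of what such a table could look like. It is not the actual code: it reads "linear-overflowing" as plain linear probing, and every name in it (struct slot, table_put, and so on), the "pos" payload, the doubling growth and the 1/2 load threshold are assumptions made only for illustration. Only the (1<<9) starting size comes from the text above.

```c
#include <assert.h>
#include <stdlib.h>

#define INITIAL_BITS 9	/* start at 1<<9 slots, as mentioned above */

struct slot {
	unsigned int key;	/* hash value of a source chunk */
	unsigned int pos;	/* hypothetical payload: offset in the source */
	int used;
};

struct table {
	struct slot *slots;
	unsigned int size;	/* always a power of two */
	unsigned int count;
};

static int table_init(struct table *t)
{
	t->size = 1u << INITIAL_BITS;
	t->count = 0;
	t->slots = calloc(t->size, sizeof(*t->slots));
	return t->slots ? 0 : -1;
}

/* Linear probing: on a collision, overflow into the next slot, wrapping. */
static struct slot *table_probe(struct slot *slots, unsigned int size,
				unsigned int key)
{
	unsigned int i = key & (size - 1);

	assert((size & (size - 1)) == 0);
	while (slots[i].used && slots[i].key != key)
		i = (i + 1) & (size - 1);
	return &slots[i];
}

/* Double the table and reinsert every used slot. */
static int table_grow(struct table *t)
{
	unsigned int new_size = t->size * 2, i;
	struct slot *new_slots = calloc(new_size, sizeof(*new_slots));

	if (!new_slots)
		return -1;
	for (i = 0; i < t->size; i++)
		if (t->slots[i].used)
			*table_probe(new_slots, new_size, t->slots[i].key) =
				t->slots[i];
	free(t->slots);
	t->slots = new_slots;
	t->size = new_size;
	return 0;
}

static int table_put(struct table *t, unsigned int key, unsigned int pos)
{
	struct slot *s;

	/* assumed threshold: keep the table at most half full */
	if ((t->count + 1) * 2 > t->size && table_grow(t) < 0)
		return -1;
	s = table_probe(t->slots, t->size, key);
	if (!s->used) {
		s->used = 1;
		s->key = key;
		t->count++;
	}
	s->pos = pos;
	return 0;
}

static const struct slot *table_get(const struct table *t, unsigned int key)
{
	const struct slot *s = table_probe(t->slots, t->size, key);

	return s->used ? s : NULL;
}
```

The point of starting small is that a short source only ever touches a few densely packed pages, whereas a 65k-entry flat array spreads the same entries thinly across all of its pages.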