On Thu, 25 Oct 2007, Daniel Barkalow wrote: > > Creating a list of the pointers doesn't work correctly with the grow > implementation, because growing the hash may turn a collision into a > non-collision, at which point items other than the first cannot be found > (since they're listed inside a bucket that's now wrong for them). AFAIK, > resizing a hash table requires being able to figure out what happened with > collisions. Nope. The hash algorithm is much smarter than that. I *always* uses a full 32-bit hash, and no amount of resizing is ever going to change that. The index into the hash-table is in fact entirely unused. This has several good properties: - it means that hash-table resizing is a non-event - it means that you always have the full 32-bit hash, and a collision in the hash size never causes unnecessary work apart from the fact that the code walks the hash table a bit more. - because the hash is embedded in the table itself, it has relatively good cache behaviour when compared to something that needs to actually follow the pointer to validate the full data. So assuming that the full 32-bit hash is good enough to effectively never have any collisions (or, assuming you don't even *care* about the collisions, which is the case when you're just generating content fingerprints for lines when comparing the data in two files), you never end up with unnecessarily following pointers to cachelines that you are not interested in. The last point at least somewhat mitigates the (inevitably) bad cache behaviour that hash tables tend to have. It's not like it's going to be wonderful in the cache, but at least it's less horrid than the more common implementation that needs to follow the pointer to validate each hash entry that may or may not be a collision. but the important part is #1, which is what allows the code to be a generic hash algorithm that resizes the hash table without even understanding or caring what is behind the pointer. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html