Re: [PATCH 5/6] Do linear-time/space rename logic for exact renames

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, 25 Oct 2007, Daniel Barkalow wrote:
> 
> Creating a list of the pointers doesn't work correctly with the grow 
> implementation, because growing the hash may turn a collision into a 
> non-collision, at which point items other than the first cannot be found 
> (since they're listed inside a bucket that's now wrong for them). AFAIK, 
> resizing a hash table requires being able to figure out what happened with 
> collisions.

Nope. 

The hash algorithm is much smarter than that.

I *always* uses a full 32-bit hash, and no amount of resizing is ever 
going to change that. The index into the hash-table is in fact entirely 
unused.

This has several good properties:

 - it means that hash-table resizing is a non-event

 - it means that you always have the full 32-bit hash, and a collision in 
   the hash size never causes unnecessary work apart from the fact that 
   the code walks the hash table a bit more.

 - because the hash is embedded in the table itself, it has relatively 
   good cache behaviour when compared to something that needs to actually 
   follow the pointer to validate the full data. So assuming that the full 
   32-bit hash is good enough to effectively never have any collisions 
   (or, assuming you don't even *care* about the collisions, which is the 
   case when you're just generating content fingerprints for lines when 
   comparing the data in two files), you never end up with unnecessarily 
   following pointers to cachelines that you are not interested in.

The last point at least somewhat mitigates the (inevitably) bad cache 
behaviour that hash tables tend to have. It's not like it's going to be 
wonderful in the cache, but at least it's less horrid than the more common 
implementation that needs to follow the pointer to validate each hash 
entry that may or may not be a collision.

but the important part is #1, which is what allows the code to be a 
generic hash algorithm that resizes the hash table without even 
understanding or caring what is behind the pointer.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux