Re: [PATCH] fix diff-delta bad memory access

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 10 May 2006, Linus Torvalds wrote:

> 
> 
> Btw, Nico, that rabin hash code is _extremely_ confusing.

Possible.

> The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
> things even _more_ confusing, the hash calculation code is actually offset 
> by one, so it will have computed the hash with
> 
> 	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
> 
> where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.
> 
> So, if I read that correctly, the "entry->ptr" actually points not to the 
> beginning of the data that was hashed, or even the end, but literally to 
> the last byte of the data that was hashed in that window.
> 
> Isn't that just _really_ confusing?
> 
> Or is there some sense to this?

Yes, ptr points to the last byte of the window for given hash value.

This is so because in the matching loop the window is scrolled byte by 
byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which 
already have been processed (most probably added as literal bytes in the 
delta buffer) plus one new byte.  So it makes sense to start forward 
byte matching only from that last byte to find the longest source area 
to match, especially since all the other bytes in the window are likely 
to be identical already and comparing them repeatedly for entries with 
the same hash would be wasteful in most cases.

Further down, once the best offset with the longest match in the source 
buffer has been found, backward matching is performed to remove as much 
literal bytes that were added to the delta output as possible, which is 
most likely to catch the whole Robin window that hasn't been compared 
previously, but in that case the window content is compared only once.

And the reason why the reference hash is computed with an offset of 1 to 
RABIN_WINDOW inclusive in create_delta_index() is to allow for quick 
initialization of the Rabin window _outside_ of the main loop in 
create_delta().  There is a comment to that effect near the top of 
create_delta_index but probably a small reminder should be added in the 
index loop as well.


Nicolas
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]