Re: [PATCH] fix diff-delta bad memory access

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



And of course s/robin/rabin/ (I can't type RABIN without having my 
fingers decide on ROBIN by themselves).

On Wed, 10 May 2006, Nicolas Pitre wrote:

> On Wed, 10 May 2006, Linus Torvalds wrote:
> 
> > 
> > 
> > Btw, Nico, that rabin hash code is _extremely_ confusing.
> 
> Possible.
> 
> > The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
> > things even _more_ confusing, the hash calculation code is actually offset 
> > by one, so it will have computed the hash with
> > 
> > 	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
> > 
> > where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.
> > 
> > So, if I read that correctly, the "entry->ptr" actually points not to the 
> > beginning of the data that was hashed, or even the end, but literally to 
> > the last byte of the data that was hashed in that window.
> > 
> > Isn't that just _really_ confusing?
> > 
> > Or is there some sense to this?
> 
> Yes, ptr points to the last byte of the window for given hash value.
> 
> This is so because in the matching loop the window is scrolled byte by 
> byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which 
> already have been processed (most probably added as literal bytes in the 
> delta buffer) plus one new byte.  So it makes sense to start forward 
> byte matching only from that last byte to find the longest source area 
> to match, especially since all the other bytes in the window are likely 
> to be identical already and comparing them repeatedly for entries with 
> the same hash would be wasteful in most cases.
> 
> Further down, once the best offset with the longest match in the source 
> buffer has been found, backward matching is performed to remove as much 
> literal bytes that were added to the delta output as possible, which is 
> most likely to catch the whole Robin window that hasn't been compared 
> previously, but in that case the window content is compared only once.
> 
> And the reason why the reference hash is computed with an offset of 1 to 
> RABIN_WINDOW inclusive in create_delta_index() is to allow for quick 
> initialization of the Rabin window _outside_ of the main loop in 
> create_delta().  There is a comment to that effect near the top of 
> create_delta_index but probably a small reminder should be added in the 
> index loop as well.
> 
> 
> Nicolas
> -
> : send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Nicolas
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]