And of course s/robin/rabin/ (I can't type RABIN without having my fingers decide on ROBIN by themselves). On Wed, 10 May 2006, Nicolas Pitre wrote: > On Wed, 10 May 2006, Linus Torvalds wrote: > > > > > > > Btw, Nico, that rabin hash code is _extremely_ confusing. > > Possible. > > > The hash entry pointers point to "data + RABIN_WINDOW", and then to make > > things even _more_ confusing, the hash calculation code is actually offset > > by one, so it will have computed the hash with > > > > val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT]; > > > > where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1. > > > > So, if I read that correctly, the "entry->ptr" actually points not to the > > beginning of the data that was hashed, or even the end, but literally to > > the last byte of the data that was hashed in that window. > > > > Isn't that just _really_ confusing? > > > > Or is there some sense to this? > > Yes, ptr points to the last byte of the window for given hash value. > > This is so because in the matching loop the window is scrolled byte by > byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which > already have been processed (most probably added as literal bytes in the > delta buffer) plus one new byte. So it makes sense to start forward > byte matching only from that last byte to find the longest source area > to match, especially since all the other bytes in the window are likely > to be identical already and comparing them repeatedly for entries with > the same hash would be wasteful in most cases. > > Further down, once the best offset with the longest match in the source > buffer has been found, backward matching is performed to remove as much > literal bytes that were added to the delta output as possible, which is > most likely to catch the whole Robin window that hasn't been compared > previously, but in that case the window content is compared only once. > > And the reason why the reference hash is computed with an offset of 1 to > RABIN_WINDOW inclusive in create_delta_index() is to allow for quick > initialization of the Rabin window _outside_ of the main loop in > create_delta(). There is a comment to that effect near the top of > create_delta_index but probably a small reminder should be added in the > index loop as well. > > > Nicolas > - > : send the line "unsubscribe git" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > Nicolas - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html