On Wed, 10 May 2006, Linus Torvalds wrote: > > > Btw, Nico, that rabin hash code is _extremely_ confusing. Possible. > The hash entry pointers point to "data + RABIN_WINDOW", and then to make > things even _more_ confusing, the hash calculation code is actually offset > by one, so it will have computed the hash with > > val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT]; > > where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1. > > So, if I read that correctly, the "entry->ptr" actually points not to the > beginning of the data that was hashed, or even the end, but literally to > the last byte of the data that was hashed in that window. > > Isn't that just _really_ confusing? > > Or is there some sense to this? Yes, ptr points to the last byte of the window for given hash value. This is so because in the matching loop the window is scrolled byte by byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which already have been processed (most probably added as literal bytes in the delta buffer) plus one new byte. So it makes sense to start forward byte matching only from that last byte to find the longest source area to match, especially since all the other bytes in the window are likely to be identical already and comparing them repeatedly for entries with the same hash would be wasteful in most cases. Further down, once the best offset with the longest match in the source buffer has been found, backward matching is performed to remove as much literal bytes that were added to the delta output as possible, which is most likely to catch the whole Robin window that hasn't been compared previously, but in that case the window content is compared only once. And the reason why the reference hash is computed with an offset of 1 to RABIN_WINDOW inclusive in create_delta_index() is to allow for quick initialization of the Rabin window _outside_ of the main loop in create_delta(). There is a comment to that effect near the top of create_delta_index but probably a small reminder should be added in the index loop as well. Nicolas - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html