On Tue, Oct 02, 2007 at 01:08:20AM -0400, Jeff King wrote:

> One approach which I haven't tried but might be promising is to
> actually keep each list sorted, and then do a "merge" of the two
> lists, comparing as you go. We don't really need to do arbitrary
> lookups in the hash; we just need to compare two hash tables at a
> time. My approach was to be simple, but have O(HASH_SIZE)
> comparisons (where HASH_SIZE is on the order of 2^17), and that's
> clearly just too big. But with a list merge, it should be O(n),
> where n is the actual number of lines in the files (or binary
> chunks for the binary case).

BTW, I don't want to steal credit for this idea...it comes from
thinking about what David Kastrup said earlier in the thread, though
I think he was proposing sorting just inside buckets.

-Peff
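
For illustration, a rough sketch of the merge idea described above
(not code from git or from the thread; the struct and function names
are made up). Each file's hashed lines are kept in an array sorted by
hash, and the two arrays are walked in lockstep, so the comparison
cost is proportional to the number of lines rather than to the size
of the hash table:

    #include <stddef.h>

    /* Hypothetical entry: one hashed line (or binary chunk) and its size. */
    struct hashed_line {
            unsigned int hash;
            size_t size;
    };

    /*
     * Walk two arrays, each sorted by hash, in lockstep and accumulate
     * the size of entries whose hashes match.  Each entry is visited at
     * most once, so this is O(na + nb) instead of O(HASH_SIZE).
     */
    static size_t count_common(const struct hashed_line *a, size_t na,
                               const struct hashed_line *b, size_t nb)
    {
            size_t i = 0, j = 0, common = 0;

            while (i < na && j < nb) {
                    if (a[i].hash < b[j].hash)
                            i++;
                    else if (a[i].hash > b[j].hash)
                            j++;
                    else {
                            /* Matching hashes: credit the smaller size as shared. */
                            common += a[i].size < b[j].size ? a[i].size : b[j].size;
                            i++;
                            j++;
                    }
            }
            return common;
    }

Because both arrays are sorted, duplicate hashes (repeated identical
lines) pair up one-to-one as the indices advance, which is what a
similarity count wants anyway.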