On Mon, Oct 01, 2007 at 10:01:16PM -0700, Junio C Hamano wrote:

> > Just to update, I tried using a non-colliding hash for this (at the
> > expense of much memory), and I wasn't able to get things much faster
> > (and certainly not worth the explosion in memory), short of reducing
> > the size of the hash (which is going to reduce the quality of the
> > output). So I am giving up for the time being, but if others are
> > interested in trying to speed things up, I would be happy to discuss
> > ideas.
>
> Bummer. You are giving up at the same place I gave up the last
> time. I was somehow hoping that other people are more clever
> and determined than I was ;-).
>
> Thanks for trying.

What was so discouraging is that I literally simplified the process to

  for (i = 0; i < HASH_SIZE; i++)
          if (src[i] < dst[i])
                  ...

and it spent all of its time on that one conditional.

One approach which I haven't tried but might be promising is to actually
keep each list sorted, and then do a "merge" of the two lists, comparing
as you go. We don't really need to do arbitrary lookups in the hash; we
just need to compare two hash tables at a time.

My approach had the virtue of being simple, but it takes O(HASH_SIZE)
comparisons (where HASH_SIZE is on the order of 2^17), and that's
clearly just too big. With a list merge, it should be O(n), where n is
the actual number of lines in the files (or binary chunks in the binary
case).

-Peff
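
P.S. In case it helps the discussion, here is a rough, untested sketch
of what the merge-style comparison could look like. The "struct entry"
and "count_common" names are made up for illustration, not anything in
git; it just assumes each side has been flattened into a list of
(hash, count) pairs sorted by hash.

  #include <stddef.h>

  /* hypothetical: one hashed line/chunk and how often it occurs */
  struct entry {
          unsigned int hash;
          unsigned int count;
  };

  /*
   * Walk two sorted entry lists in lockstep (a plain list merge) and
   * accumulate how much the two sides have in common. Entries present
   * on only one side contribute nothing; entries present on both
   * contribute the smaller of the two counts.
   */
  static unsigned long count_common(const struct entry *src, size_t nsrc,
                                    const struct entry *dst, size_t ndst)
  {
          size_t i = 0, j = 0;
          unsigned long common = 0;

          while (i < nsrc && j < ndst) {
                  if (src[i].hash < dst[j].hash)
                          i++;
                  else if (src[i].hash > dst[j].hash)
                          j++;
                  else {
                          common += src[i].count < dst[j].count ?
                                    src[i].count : dst[j].count;
                          i++;
                          j++;
                  }
          }
          return common;
  }

The loop does work proportional to nsrc + ndst rather than HASH_SIZE,
which is where the O(n) claim above comes from.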