On Mon, Oct 01, 2007 at 10:01:16PM -0700, Junio C Hamano wrote:

> > Just to update, I tried using a non-colliding hash for this (at the
> > expense of much memory), and I wasn't able to get things much faster
> > (and certainly not worth the explosion in memory), short of reducing
> > the size of the hash (which is going to reduce the quality of the
> > output). So I am giving up for the time being, but if others are
> > interested in trying to speed things up, I would be happy to discuss
> > ideas.
>
> Bummer. You are giving up at the same place I gave up the last
> time. I was somehow hoping that other people are more clever
> and determined than I was ;-).
>
> Thanks for trying.

What was so discouraging is that I literally simplified the process to

  for (i = 0; i < HASH_SIZE; i++)
          if (src[i] < dst[i])
                  ...

and it spent all of its time on that one conditional.

One approach which I haven't tried but might be promising is to actually
keep each list sorted, and then do a "merge" of the two lists, comparing
as you go. We don't really need to do arbitrary lookups in the hash; we
just need to compare two hash tables at a time.

My approach had the virtue of being simple, but it takes O(HASH_SIZE)
comparisons (where HASH_SIZE is on the order of 2^17), and that's
clearly just too big. With a list merge, it should be O(n), where n is
the actual number of lines in the files (or binary chunks in the binary
case).

-Peff
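
P.S. In case it helps the discussion, here is a rough, untested sketch
of what the merge-style comparison could look like. The "struct entry"
and "count_common" names are made up for illustration, not anything in
git; it just assumes each side has been flattened into a list of
(hash, count) pairs sorted by hash.

  #include <stddef.h>

  /* hypothetical: one hashed line/chunk and how often it occurs */
  struct entry {
          unsigned int hash;
          unsigned int count;
  };

  /*
   * Walk two sorted entry lists in lockstep (a plain list merge) and
   * accumulate how much the two sides have in common. Entries present
   * on only one side contribute nothing; entries present on both
   * contribute the smaller of the two counts.
   */
  static unsigned long count_common(const struct entry *src, size_t nsrc,
                                    const struct entry *dst, size_t ndst)
  {
          size_t i = 0, j = 0;
          unsigned long common = 0;

          while (i < nsrc && j < ndst) {
                  if (src[i].hash < dst[j].hash)
                          i++;
                  else if (src[i].hash > dst[j].hash)
                          j++;
                  else {
                          common += src[i].count < dst[j].count ?
                                    src[i].count : dst[j].count;
                          i++;
                          j++;
                  }
          }
          return common;
  }

The loop does work proportional to nsrc + ndst rather than HASH_SIZE,
which is where the O(n) claim above comes from.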