Re: diffcore-rename performance mode

Jeff King <peff@xxxxxxxx> · Tue, 18 Sep 2007 04:54:13 -0400

On Tue, Sep 18, 2007 at 01:49:50AM -0700, Junio C Hamano wrote:

> > However, keeping around _just_ the
> > cnt_data caused only about 100M of extra memory consumption (and gave
> > the same performance boost).
> 
> That would be an interesting and relatively low-hanging optimization.

OK, I will work up a patch. Is it worth making it configurable? Since it
is a space-time tradeoff, it might actually hurt performance if you are
tight on memory. However, I have only looked at the numbers for my
massive data set. I can produce memory usage numbers for the kernel,
too.

> I think it was just a hash table with linear overflow (if your
> spot is occupied by somebody else, you look for the next
> available vacant spot -- works only if you do not ever delete
> items from the table) but sorry, I do not recall the rationale
> for picking that data structure.  I vaguely recall I did some
> measurement between that and the usual "an array that is indexed
> with a hash value that holds heads of linked lists" and pointer
> chasing appeared quite cache-unfriendly to the point that it
> actually degraded performance, but did not try very hard to
> optimize it.

I thought we were holding counts of hashes, in which case there _is_ no
overflow. We only care whether you hit the hash fingerprint or not. But
perhaps I am mistaken; I will have to look more closely at the code.
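
To make the question concrete, here is a minimal sketch of the kind of
table described in the quoted text: a hash table with linear overflow
that holds a count per fingerprint and never deletes. It is written in
Python for brevity and is not the actual git code; the class name
SpanCountTable, its methods, and the load-factor threshold are all
assumptions made up for illustration. It shows that "overflow" can still
occur in a table of counts: two different fingerprints may map to the
same slot, and the second one moves on to the next vacant spot.

```python
class SpanCountTable:
    """Hypothetical open-addressing table: fingerprint -> count.

    Entries are never deleted, so linear probing stays valid.
    """

    def __init__(self, size_log2=4):
        self.size = 1 << size_log2
        # Each slot is None (vacant) or a [fingerprint, count] pair.
        self.slots = [None] * self.size
        self.used = 0

    def _probe(self, fingerprint):
        # Start at the home slot; step forward until we find either
        # this fingerprint or a vacant slot.
        i = fingerprint % self.size
        while self.slots[i] is not None and self.slots[i][0] != fingerprint:
            i = (i + 1) % self.size
        return i

    def _grow(self):
        # Double the table and reinsert every live entry.
        old = [s for s in self.slots if s is not None]
        self.size *= 2
        self.slots = [None] * self.size
        for fp, cnt in old:
            self.slots[self._probe(fp)] = [fp, cnt]

    def add(self, fingerprint, count=1):
        # Keep the table at most half full so probing always terminates.
        # (Assumed threshold; this may grow slightly earlier than needed.)
        if (self.used + 1) * 2 > self.size:
            self._grow()
        i = self._probe(fingerprint)
        if self.slots[i] is None:
            self.slots[i] = [fingerprint, 0]
            self.used += 1
        self.slots[i][1] += count

    def get(self, fingerprint):
        slot = self.slots[self._probe(fingerprint)]
        return slot[1] if slot is not None else 0


table = SpanCountTable(size_log2=2)   # 4 slots
table.add(1)
table.add(5)                          # 5 % 4 == 1: collides with 1, moves on
table.add(1)
```

All entries live in one flat array, so a lookup touches neighbouring
slots rather than following pointers, which is the cache behaviour the
quoted text contrasts with an array of linked-list heads.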

-Peff