On Tue, Sep 18, 2007 at 01:49:50AM -0700, Junio C Hamano wrote:

> > However, keeping around _just_ the
> > cnt_data caused only about 100M of extra memory consumption (and gave
> > the same performance boost).
>
> That would be an interesting and relatively low-hanging optimization.

OK, I will work up a patch. Is it worth making it configurable? Since it
is a space-time tradeoff, if you are tight on memory, it might actually
hurt performance. However, I have only looked at the numbers for my
massive data set...I can produce memory usage numbers for the kernel,
too.

> I think it was just a hash table with linear overflow (if your
> spot is occupied by somebody else, you look for the next
> available vacant spot -- works only if you do not ever delete
> items from the table) but sorry, I do not recall the rationale
> for picking that data structure. I vaguely recall I did some
> measurement between that and the usual "an array that is indexed
> with a hash value that holds heads of linked lists" and pointer
> chasing appeared quite cache-unfriendly to the point that it
> actually degraded performance, but did not try very hard to
> optimize it.

I thought we were holding counts of hashes, in which case there _is_ no
overflow. We only care if you hit the hash fingerprint or not. But
perhaps I am mistaken...I will have to look more closely at the code.

-Peff
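The "linear overflow" table Junio describes (a flat array of entries where a collision moves you to the next vacant spot, with no deletions) can be sketched roughly as below. This is a minimal illustration under stated assumptions: the class and method names are hypothetical and not taken from git's diffcore code, entries are (fingerprint, count) pairs, and the table has a fixed power-of-two size and never grows.

```python
# Hypothetical sketch, not git's actual code: counts of hash fingerprints
# kept in one flat list, resolving bucket collisions by linear probing.
class FingerprintCounts:
    def __init__(self, log2_size=3):
        self.size = 1 << log2_size        # power of two, so masking picks the bucket
        self.slots = [None] * self.size   # None marks a vacant spot
        self.used = 0

    def _probe(self, hashval):
        """Yield slot indices from the home bucket onward, wrapping once."""
        mask = self.size - 1
        home = hashval & mask
        for i in range(self.size):
            yield (home + i) & mask

    def add(self, hashval, cnt=1):
        for idx in self._probe(hashval):
            slot = self.slots[idx]
            if slot is None:
                # Vacant spot: claim it for this fingerprint.
                self.slots[idx] = [hashval, cnt]
                self.used += 1
                return
            if slot[0] == hashval:
                # Same fingerprint seen before: just bump its count.
                slot[1] += cnt
                return
            # Occupied by a different fingerprint: look at the next spot.
        raise RuntimeError("table full")

    def count(self, hashval):
        for idx in self._probe(hashval):
            slot = self.slots[idx]
            if slot is None:
                return 0  # reached a vacant spot: fingerprint never added
            if slot[0] == hashval:
                return slot[1]
        return 0


t = FingerprintCounts(log2_size=3)   # 8 slots
t.add(0x11, 2)                       # home bucket 1
t.add(0x19, 5)                       # also home bucket 1, lands in slot 2
t.add(0x11)
print(t.count(0x11), t.count(0x19), t.count(0x21), t.used)  # 3 5 0 2
```

Two distinct fingerprints (0x11 and 0x19 here) can share a home bucket, and probing keeps both counts intact in adjacent slots. Lookups stop at the first vacant spot. That is why such a table works only when items are never deleted: removing an entry would leave a gap that cuts off the probe chain for anything stored past it.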