Jeff King <peff@xxxxxxxx> writes: > There seems to be a serious performance problem in diffcore-rename. > There is infrastructure to cache the "cnt_data" member of each filespec, > but it never gets used because we immediately free the filespec data > after use. Oops. > > With this patch: > ... > My 20-minute diff becomes a 2-minute diff. The downside is that the > memory usage is much increased (for obvious reasons, it should increase > by the dataset size, since we are keeping pointers to the data around -- > in my case, around 1G extra). Yes, these early freeing of filespec_data() were introducd later specifically to address the memory usage issue. > However, keeping around _just_ the > cnt_data caused only about 100M of extra memory consumption (and gave > the same performance boost). That would be an interesting and relatively low-hanging optimization. > The spanhash data structure is a bit confusing. At first, it looked like > we were doing a linear search for a matching hash, but it's not quite, > since we seem to start at some magic spot based on the hashval we're > looking up. I think it was just a hash table with linear overflow (if your spot is occupied by somebody else, you look for the next available vacant spot -- works only if you do not ever delete items from the table) but sorry, I do not recall the rationale for picking that data structure. I vaguely recall I did some measurement between that and the usual "an array that is indexed with a hash value that holds heads of linked lists" and pointer chasing appeared quite cache-unfriendly to the point that it actually degraded performance, but did not try very hard to optimize it. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html