David Kastrup <dak@xxxxxxx> writes: > Jeff King <peff@xxxxxxxx> writes: > >> On Wed, Sep 26, 2007 at 01:05:59PM -0700, Junio C Hamano wrote: >> >>> * jk/diff-rename (Tue Sep 25 15:29:42 2007 -0400) 1 commit >>> + diffcore-rename: cache file deltas >>> >>> Parked in 'next' for now but is 'master' material. >> >> My tests after this patch show that spanhash_find is responsible for >> a large portion of the processing time in large renames, so I am going >> to look into speeding that up. > > In itself, it does not look like there is all too much room for > optimization. One can remove the temporary pointer "optimization" and > see whether this makes strength reduction possible for the compiler. > Making this an endless loop wrapped around a loop on bucket might also > help the compiler in that effect. > > But there is really not all too much leeway, and it might be better > spent in the caller. For example, the search will take something like > r/(1-r) iterations on average where r is the fill ratio of the hash > array. So one would not want to, say, let r grow above 0.75 or > something like that. Ok, here is some suggestion: Here is the inner loop for this stuff: for (i = 0; i < ssz; i++) { struct spanhash *s = &(src_count->data[i]); struct spanhash *d; unsigned dst_cnt, src_cnt; if (!s->cnt) continue; src_cnt = s->cnt; d = spanhash_find(dst_count, s->hashval); dst_cnt = d ? d->cnt : 0; if (src_cnt < dst_cnt) { la += dst_cnt - src_cnt; sc += src_cnt; } else sc += dst_cnt; } Now here is how one could optimize the data structures: The hash structures are with linear probing, and we try to find any hash matches from source to destination. If we sort all hashes indexed to a given first hash bucket by their full hash value, then one could basically use passes similar to list merges for figuring the 1:1 relations. That cuts down the O(l n) cost (where n is the number of elements and l their average run length) to O(n). Of course, making l close to 1 by keeping the hash utilization reasonably low is much simpler. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html