I answered this post already, but the answer seems to have disappeared into a black hole. What a nuisance. Nicolas Pitre <nico@xxxxxxx> writes: > On Sat, 8 Sep 2007, David Kastrup wrote: > >> In normal use cases, the performance wins are not overly impressive: >> we get something like 5-10% due to the slightly better locality of >> memory accesses using the packed structure. > > The gain is probably counterbalanced by the fact that you're copying > the whole index when packing it, which is unfortunate. The alternative would be packing in-place. That's actually a rather contorted operation, and the gains from packing would then have to be claimed by "realloc". I think that realloc is more likely to leave a fragmented address space (giving the reclaimed memory to the heap rather than back to the system), but I'd have to check glibc for details at least on GNU/Linux. Another option avoiding the realloc would be not to use linked lists at all but just collect all entries sequentually in "packed" form, and sort them into order afterwards. That's an O(n lg n) operation with a large n. Even if one has a sorting algorithm with good memory locality, I doubt that the locality would compensate for the lg n factor, even when taking into account that we save ourselves the copying. And that is not even taken the possibility of having to cull some buckets into account, another O(n) operation (which amounts to copying everything once, too). > Also, could you provide actual test results backing your performance > claim? 5-10% is still not negligible. dak@lola:/usr/local/tmp/git$ for i in .git/objects/[0-9a-f][0-9a-f]/[0-9a-f]*;do echo $i;done|sed 's+.*ts/\(..\)/+\1+' > /tmp/objlist dak@lola:/usr/local/tmp/git$ time ./git-pack-objects </tmp/objlist --stdout|dd of=/dev/null 4099+2 records in 4100+1 records out 2099205 bytes (2.1 MB) copied, 4.01225 seconds, 523 kB/s real 0m4.043s user 0m3.836s sys 0m0.164s dak@lola:/usr/local/tmp/git$ time git-pack-objects </tmp/objlist --stdout|dd of=/dev/null 4099+2 records in 4100+1 records out 2099205 bytes (2.1 MB) copied, 4.34936 seconds, 483 kB/s real 0m4.381s user 0m4.056s sys 0m0.176s dak@lola:/usr/local/tmp/git$ Of course, on a warm file system. >> - struct index_entry *entry, **hash; >> + struct unpacked_index_entry *entry, **hash; >> + struct index_entry *aentry, **ahash; > > What does the "a" stand for? array (rather than linked list). Should I rather rename entry and hash to uentry and uhash (for unpacked)? >> + mem = index+1; > [...] >> + for (i=0; i<hsize; i++) { > [...] >> + for (entry=hash[i]; entry; entry=entry->next) > > Minor style nit: please add spaces around "+", "=", "<", etc. for > consistency. Will do. -- David Kastrup, Kriemhildstr. 15, 44793 Bochum - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html