On 23/03/18 02:46, Jeff King wrote: > On Fri, Mar 23, 2018 at 01:28:12AM +0000, Ramsay Jones wrote: > >>> Of the used heap after your patches: >>> >>> - ~40% of that is from packlist_alloc() >>> - ~17% goes to "struct object" >>> - ~10% for the object.c hash table to store all the "struct object" >>> - ~7% goes to the delta cache >>> - ~7% goes to the pack revindex (actually, there's a duplicate 7% >>> there, too; I think our peak is when we're sorting the revindex >>> and have to keep two copies in memory at once) >> >> which begs the question, how much slower would it be if we >> replaced the radix-sort with an in-place sort (e.g. heapsort). >> >> I hacked up the patch below, just for fun. I don't have any >> large repos (or enough disk space) to do any meaningful perf >> tests, but I did at least compile it and it passes the test-suite. >> (That is no guarantee that I haven't introduced bugs, of course!) > > It might have been easier to just revert 8b8dfd5132 (pack-revindex: > radix-sort the revindex, 2013-07-11). It even includes some performance > numbers. :) But not as much fun! :) > In short, no, I don't think we want to go back to a comparison-sort. The > radix sort back then was around 4 times faster for linux.git. And that > was when there were half as many objects in the repository, so the radix > sort should continue to improve as the repo size grows. Yes, I expected radix-sort to be faster. > The absolute time savings aren't huge for something as bulky as a > repack, so it's less exciting in this context. But it's also not that > much memory (7% of the peak here, but as I showed elsewhere, if we can > stop holding all of the "struct object" memory once we're done with it, > then this revindex stuff doesn't even factor into the peak). I didn't see this post until afterwards. So, if it isn't even a factor in the peak memory use, then it's clear this specific space/time trade-off also isn't an issue! ;-) Thanks. ATB, Ramsay Jones