On Mon, 10 Dec 2007, Jon Smirl wrote:

> On 12/10/07, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > I just deleted the section looking for identical hashes.
> >
> > +       while (sub_size && list[0]->hash &&
> > +              list[0]->hash == list[-1]->hash) {
> > +               list++;
> > +               sub_size--;
> > +       }
> >
> > Doing that allows the long chains to be split over the cores.
> >
> > My last 5% of objects is taking over 50% of the total CPU time in the
> > repack. I think these objects are the ones from that 103,817 entry
> > chain. It is also causing the explosion in RAM consumption.
> >
> > At the end I can only do 20 objects per clock second on four cores. It
> > takes 30 clock minutes (120 CPU minutes) to do the last 3% of objects.
>
> It's all in create_delta...

Here you're mixing two different hashes that have no relation whatsoever
to each other.  The hash in create_delta corresponds to chunks of data
in a reference buffer that we try to match in a source buffer.  The hash
in the code above has to do with the file names the corresponding
objects are coming from.

And again, both hash uses are deterministic, i.e. they will be the same
when repacking with -f regardless of whether the source pack is the
2.1GB or the 300MB one, so they may not explain the huge performance and
memory usage discrepancy you see between those two packs.

The code that does get influenced by the source pack, though, is all
concentrated in sha1_file.c.


Nicolas
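
To make the distinction between the two hashes concrete, here is a minimal
sketch.  It is not git's code: toy_name_hash, toy_chunk_hash and TOY_WINDOW
are hypothetical names, and the hash functions themselves are simple
stand-ins chosen for illustration.  The only point is that one value is
derived from a path and the other from a window of buffer bytes, and that
neither takes the source pack as an input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window size for the chunk hash; an assumption, not git's. */
#define TOY_WINDOW 16

/*
 * Hypothetical name hash: computed from the path an object came from.
 * Two objects with the same path get the same value, whatever their
 * contents are.
 */
uint32_t toy_name_hash(const char *path)
{
	uint32_t hash = 5381;

	assert(path != NULL);
	while (*path)
		hash = hash * 33 + (unsigned char)*path++;
	return hash;
}

/*
 * Hypothetical chunk hash: computed from TOY_WINDOW bytes of a buffer
 * starting at 'offset'.  The caller must ensure that
 * offset + TOY_WINDOW does not exceed the buffer length.
 */
uint32_t toy_chunk_hash(const unsigned char *buf, size_t offset)
{
	uint32_t hash = 0;
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < TOY_WINDOW; i++)
		hash = hash * 31 + buf[offset + i];
	return hash;
}
```

Calling toy_name_hash on the same path twice always yields the same value,
and toy_chunk_hash only sees the 16 bytes at the given offset.  Neither
function has any parameter describing which pack the data was read from,
which is the sense in which both are deterministic across source packs.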