On Tue, May 01, 2012 at 11:47:26AM -0700, Junio C Hamano wrote:

> > While keeping the size comparison commented out, you could try to
> > replace this line with:
> >
> >     return b < a ? -1 : (b > a);
> >
> > If this doesn't improve things then it would be clear that this avenue
> > should be abandoned.
>
> Very interesting. The difference between the two should only matter if
> there are many blobs with exactly the same size, and most of them delta
> horribly with each other. Does the problematic repository exhibit such
> a characteristic?

No. Here are the objects with the same sizes:

  $ git rev-list --objects --all |
    cut -d' ' -f1 |
    git cat-file --batch-check |
    cut -d' ' -f2,3 |
    sort | uniq -c | sort -rn | head
  19722 tree 2222
  14068 tree 4393
  11418 tree 2156
   9994 tree 4676
   9479 tree 2189
   7944 tree 2255
   6454 commit 251
   6437 tree 4611
   5328 tree 4439
   4586 commit 254

So it's mostly trees and commits (the first repeated blob size is on
line 332 of the output). The commits aren't all that big even without
deltafication, but the trees are. They should be sorted by name_hash,
but within a single name, there are going to be a lot of repetitions (I
think each of those size clusters is just a repetition of the same "po"
directory getting lots of tiny modifications).

So we are triggering that part of the sort quite a bit. But by your
reasoning here:

> The original tie-breaks based on the address (the earlier object we read
> in the original input comes earlier in the output) and yours make the
> objects later we read (which in turn are from older parts of the history)
> come early, but adjacency between two objects of the same type and the
> same size would not change (if A and B were next to each other in this
> order, your updated sorter will give B and then A still next to each
> other), so I suspect not much would change in the candidate selection.

I don't think it makes a big difference (and indeed, switching it and
repacking the phpmyadmin repository yields the same-size pack, although
a lot more CPU time is spent).

-Peff
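
For anyone who wants to convince themselves of the adjacency argument
quoted above, here is a minimal sketch in C. It is not git's actual
pack-objects code: the struct, its fields (with "seq" standing in for the
object's address), the function names, and the sample values are all made
up for illustration. It sorts the same entries with each tie-break and
checks that every run of equal keys comes out in the same place, only
reversed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pack entry; not git's struct object_entry. */
struct entry {
	unsigned hash;      /* stand-in for name_hash */
	unsigned long size; /* object size */
	int seq;            /* input order, standing in for the address */
};

/* Compare on the real keys only: larger hash first, then larger size. */
static int cmp_key(const struct entry *a, const struct entry *b)
{
	if (a->hash != b->hash)
		return a->hash > b->hash ? -1 : 1;
	if (a->size != b->size)
		return a->size > b->size ? -1 : 1;
	return 0;
}

/* Tie-break "a < b ? -1 : (a > b)": earlier input sorts first. */
static int cmp_forward(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;
	int c = cmp_key(a, b);
	return c ? c : (a->seq < b->seq ? -1 : (a->seq > b->seq));
}

/* Tie-break "b < a ? -1 : (b > a)": later input sorts first. */
static int cmp_reverse(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;
	int c = cmp_key(a, b);
	return c ? c : (b->seq < a->seq ? -1 : (b->seq > a->seq));
}

/*
 * Return 1 if every run of equal keys in fwd[] occupies the same slots
 * in rev[] but in reversed order, 0 otherwise.
 */
static int ties_only_reversed(const struct entry *fwd,
			      const struct entry *rev, size_t n)
{
	size_t start = 0;
	while (start < n) {
		size_t end = start + 1;
		while (end < n && !cmp_key(&fwd[start], &fwd[end]))
			end++;
		for (size_t i = start; i < end; i++)
			if (fwd[i].seq != rev[end - 1 - (i - start)].seq)
				return 0;
		start = end;
	}
	return 1;
}

/* Sort made-up sample data both ways and compare the results. */
int demo_tie_break(void)
{
	struct entry fwd[] = {
		{ 1, 2222, 0 }, { 1, 2222, 1 }, { 1, 4393, 2 },
		{ 1, 2222, 3 }, { 2, 251, 4 },  { 2, 251, 5 },
	};
	struct entry rev[sizeof(fwd) / sizeof(fwd[0])];
	size_t n = sizeof(fwd) / sizeof(fwd[0]);

	memcpy(rev, fwd, sizeof(fwd));
	qsort(fwd, n, sizeof(fwd[0]), cmp_forward);
	qsort(rev, n, sizeof(rev[0]), cmp_reverse);
	return ties_only_reversed(fwd, rev, n);
}
```

Calling demo_tie_break() from a small main() should return 1 for this
sample: neighbors inside each same-size cluster stay neighbors, which
fits with the repack producing a same-size pack. The sketch says
nothing about the extra CPU time.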