Jeff King <peff@xxxxxxxx> wrote:
>On Sat, Jan 26, 2013 at 10:32:42PM -0800, Junio C Hamano wrote:
>
>> Both makes sense to me.
>>
>> I also wonder if we would be helped by another "repack" mode that
>> coalesces small packs into a single one with minimum overhead, and
>> run that often from "gc --auto", so that we do not end up having to
>> have 50 packfiles.
>>
>> When we have 2 or more small and young packs, we could:
>>
>>  - iterate over idx files for these packs to enumerate the objects
>>    to be packed, replacing read_object_list_from_stdin() step;
>>
>>  - always choose to copy the data we have in these existing packs,
>>    instead of doing a full prepare_pack(); and
>>
>>  - use the order the objects appear in the original packs, bypassing
>>    compute_write_order().
>
>I'm not sure. If I understand you correctly, it would basically just be
>concatenating packs without trying to do delta compression between the
>objects which are ending up in the same pack. So it would save us from
>having to do (up to) 50 binary searches to find an object in a pack,
>but would not actually save us much space.
>
>I would be interested to see the timing on how quick it is compared to
>a real repack, as the I/O that happens during a repack is non-trivial
>(although if you are leaving aside the big "main" pack, then it is
>probably not bad).
>
>But how do these somewhat mediocre concatenated packs get turned into
>real packs? Pack-objects does not consider deltas between objects in
>the same pack. And when would you decide to make a real pack? How do
>you know you have 50 young and small packs, and not 50 mediocre
>coalesced packs?

If we are reconsidering repacking strategies, I would like to propose an
approach that might be a more general improvement and would help in more
situations: roll together any packs which are close in size, say within
50% of each other. With this strategy the pack files end up spread out
exponentially by size. A rough sketch of the idea is at the end of this
mail.

I implemented this strategy on top of the current gc script using keep
files, and it works fairly well:

https://gerrit-review.googlesource.com/#/c/35215/3/contrib/git-exproll.sh

This saves some time, but mostly it saves I/O when repacking regularly.
I suspect that if this strategy were used in core git, further
optimizations could be made to also reduce the repack time, but I don't
know enough about repacking to say.

We run it nightly on our servers, both the write servers and the
read-only mirrors. We currently use a ratio of 5 to drastically reduce
rollovers of the large pack files.

-Martin
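
Here is a rough, untested sketch of the shape of the idea. It is not
what git-exproll.sh literally does; the exact ratio test and the .keep
handling below are just illustrative assumptions. The point is that a
pack which is much larger than everything smaller than it gets a .keep
file and is left alone, while packs of similar size are rolled together
by an ordinary "git repack -a -d":

#!/bin/sh
# Illustrative sketch only: protect "settled" packs with .keep files,
# then let repack roll up the remaining, similarly sized packs.

RATIO=5		# keep a pack once it is RATIO times larger than the
		# next smaller one (the ratio we currently run with)

(
	cd "$(git rev-parse --git-dir)/objects/pack" || exit 1
	prev=0
	for pack in $(ls -rS *.pack 2>/dev/null)	# smallest first
	do
		size=$(wc -c <"$pack")
		base=${pack%.pack}
		if [ "$prev" -gt 0 ] && [ "$size" -ge $((prev * RATIO)) ]
		then
			touch "$base.keep"	# much larger: leave it alone
		else
			rm -f "$base.keep"	# close in size: roll it up
		fi
		prev=$size
	done
)

# Packs marked with a .keep file are skipped by repack; everything
# else gets rolled into one new pack.
git repack -a -d

Because git repack already honours .keep files, the whole thing can sit
on top of the existing gc/repack machinery without touching core git,
which is how the contrib script works today.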