Here's a very lightly modified version of v3 of mine and Peff's series to add a
new 'git repack --geometric' mode. Almost nothing has changed since last time,
with the exception of:

  - Packs listed over standard input to 'git pack-objects --stdin-packs' are
    sorted in descending mtime order (and objects are strung together in pack
    order as before) so that objects are laid out roughly newest-to-oldest in
    the resulting pack.

  - Swapped the order of two paragraphs in patch 5 to make the perf results
    clearer.

  - Mention '--unpacked' specifically in the documentation for 'git repack
    --geometric'.

  - Typo fixes.

Range-diff is below. It would be good to start merging this down since we have
a release candidate coming up soon, and I'd rather focus future reviewer efforts
on the multi-pack reverse index and bitmaps series instead of this one.

Jeff King (4):
  p5303: add missing &&-chains
  p5303: measure time to repack with keep
  builtin/pack-objects.c: rewrite honor-pack-keep logic
  packfile: add kept-pack cache for find_kept_pack_entry()

Taylor Blau (4):
  packfile: introduce 'find_kept_pack_entry()'
  revision: learn '--no-kept-objects'
  builtin/pack-objects.c: add '--stdin-packs' option
  builtin/repack.c: add '--geometric' option

 Documentation/git-pack-objects.txt |  10 +
 Documentation/git-repack.txt       |  23 ++
 builtin/pack-objects.c             | 333 ++++++++++++++++++++++++-----
 builtin/repack.c                   | 187 +++++++++++++++-
 object-store.h                     |   5 +
 packfile.c                         |  67 ++++++
 packfile.h                         |   5 +
 revision.c                         |  15 ++
 revision.h                         |   4 +
 t/perf/p5303-many-packs.sh         |  36 +++-
 t/t5300-pack-object.sh             |  97 +++++++++
 t/t6114-keep-packs.sh              |  69 ++++++
 t/t7703-repack-geometric.sh        | 137 ++++++++++++
 13 files changed, 926 insertions(+), 62 deletions(-)
 create mode 100755 t/t6114-keep-packs.sh
 create mode 100755 t/t7703-repack-geometric.sh

Range-diff against v3:
1:  aa94edf39b = 1:  bb674e5119 packfile: introduce 'find_kept_pack_entry()'
2:  82f6b45463 = 2:  c85a915597 revision: learn '--no-kept-objects'
3:  033e4e3f67 ! 3:  649cf9020b builtin/pack-objects.c: add '--stdin-packs' option
    @@ builtin/pack-objects.c: static int git_pack_config(const char *k, const char *v,
     +	struct packed_git *a = ((const struct string_list_item*)_a)->util;
     +	struct packed_git *b = ((const struct string_list_item*)_b)->util;
     +
    ++	/*
    ++	 * order packs by descending mtime so that objects are laid out
    ++	 * roughly as newest-to-oldest
    ++	 */
     +	if (a->mtime < b->mtime)
    -+		return -1;
    -+	else if (b->mtime < a->mtime)
     +		return 1;
    ++	else if (b->mtime < a->mtime)
    ++		return -1;
     +	else
     +		return 0;
     +}
4:  f9a5faf773 = 4:  6de9f0c52b p5303: add missing &&-chains
5:  181c104a03 ! 5:  94e4f3ee3a p5303: measure time to repack with keep
    @@ Metadata
      ## Commit message ##
         p5303: measure time to repack with keep
     
    -    Add two new tests to measure repack performance. Both test split the
    +    Add two new tests to measure repack performance. Both tests split the
         repository into synthetic "pushes", and then leave the remaining objects
         in a big base pack.
     
    @@ Commit message
           5303.17: repack (1000)                     216.87(490.79+14.57)
           5303.18: repack with kept (1000)           665.63(938.87+15.76)
     
    -    Likewise, the scaling is pretty extreme on --stdin-packs:
    -
    -      5303.7: repack with --stdin-packs (1)       0.01(0.01+0.00)
    -      5303.13: repack with --stdin-packs (50)     3.53(12.07+0.24)
    -      5303.19: repack with --stdin-packs (1000)   195.83(371.82+8.10)
    -
         That's because the code paths around handling .keep files are known to
         scale badly; they look in every single pack file to find each object.
         Our solution to that was to notice that most repos don't have keep
    @@ Commit message
         single .keep, that part of pack-objects slows down again (even if we
         have fewer objects total to look at).
     
    +    Likewise, the scaling is pretty extreme on --stdin-packs (but each
    +    subsequent test is also being asked to do more work):
    +
    +      5303.7: repack with --stdin-packs (1)       0.01(0.01+0.00)
    +      5303.13: repack with --stdin-packs (50)     3.53(12.07+0.24)
    +      5303.19: repack with --stdin-packs (1000)   195.83(371.82+8.10)
    +
         Signed-off-by: Jeff King <peff@xxxxxxxx>
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
     
6:  67af143fd1 = 6:  a116587fb2 builtin/pack-objects.c: rewrite honor-pack-keep logic
7:  e9e04b95e7 = 7:  db9f07ec1a packfile: add kept-pack cache for find_kept_pack_entry()
8:  bd492ec142 ! 8:  51f57d5da2 builtin/repack.c: add '--geometric' option
    @@ Documentation/git-repack.txt: depth is 4095.
     +packs determined to need to be combined in order to restore a geometric
     +progression.
     ++
    -+Loose objects are implicitly included in this "roll-up", without respect
    -+to their reachability. This is subject to change in the future. This
    -+option (implying a drastically different repack mode) is not guarenteed
    -+to work with all other combinations of option to `git repack`).
    ++When `--unpacked` is specified, loose objects are implicitly included in
    ++this "roll-up", without respect to their reachability. This is subject
    ++to change in the future. This option (implying a drastically different
    ++repack mode) is not guaranteed to work with all other combinations of
    ++option to `git repack`).
     +
      Configuration
      -------------
-- 
2.30.0.667.g81c0cbc6fd
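
For anyone who wants to poke at the new ordering from patch 3 outside of
git, here is a rough standalone sketch of a descending-mtime comparator.
It is not the code from the series: 'struct demo_pack',
'demo_pack_mtime_cmp()' and 'demo_sort_packs()' are hypothetical names
made up for illustration, and it uses plain qsort() over an array rather
than a string_list of 'struct packed_git'. Only the sign convention
(newer pack sorts first) is meant to match the range-diff above.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical stand-in for git's 'struct packed_git'; only the
 * mtime field matters for this sketch.
 */
struct demo_pack {
	const char *name;
	time_t mtime;
};

/*
 * qsort() comparator: order packs by descending mtime, so that the
 * newest pack comes first. Returning 1 when 'a' is older pushes it
 * after 'b'.
 */
int demo_pack_mtime_cmp(const void *_a, const void *_b)
{
	const struct demo_pack *a = _a;
	const struct demo_pack *b = _b;

	if (a->mtime < b->mtime)
		return 1;
	else if (b->mtime < a->mtime)
		return -1;
	else
		return 0;
}

/* Sort 'nr' packs in place, newest first. */
void demo_sort_packs(struct demo_pack *packs, size_t nr)
{
	assert(packs || !nr);
	if (nr)
		qsort(packs, nr, sizeof(*packs), demo_pack_mtime_cmp);
}
```

Calling demo_sort_packs() on packs with mtimes 100, 300 and 200 leaves
them ordered 300, 200, 100. Note that qsort() is not stable, so packs
with equal mtimes may end up in either relative order in this sketch.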