From: Derrick Stolee <stolee@xxxxxxxxx>

Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities
for comparing the different repacking strategies against each other.

In my copy of the Git repository, the new tests in p5313 show these
results:

Test                                       this tree
-------------------------------------------------------------
5313.10: repack                            27.88(150.23+2.70)
5313.11: repack size                       228.2M
5313.12: repack with --path-walk           134.59(148.77+0.81)
5313.13: repack size with --path-walk      209.7M

Note that the 'git pack-objects --path-walk' feature is not integrated
with threads, which is why the wall-clock time is higher here even
though the CPU time is similar. Look forward to a future change that
will introduce threading to improve the time performance of this
feature with equivalent space performance.

The microsoft/fluentui repo [1] had some interesting aspects in the
previous tests in p5313, so here are its repack results:

Test                                       this tree
-------------------------------------------------------------
5313.10: repack                            91.76(680.94+2.48)
5313.11: repack size                       439.1M
5313.12: repack with --path-walk           110.35(130.46+0.74)
5313.13: repack size with --path-walk      155.3M

[1] https://github.com/microsoft/fluentui

Here, we see a significant size improvement from a full repack using
this strategy. The name-hash collisions in this repo cause the space
problems. Those collisions also cause the repack command to spend a lot
of cycles trying to find delta bases among files that are not actually
very similar, so the lack of threading with the --path-walk feature is
less pronounced in the process time.
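
To illustrate the kind of name-hash collision described above, here is a
minimal standalone sketch in C. It assumes the default name-hash behaves
like pack_name_hash() in pack-objects.h (shift the running hash right by
two bits, then add each non-whitespace character into the top byte).
The names name_hash_sketch(), hash_distance() and print_name_hash(), as
well as the example paths mentioned below, are hypothetical and exist
only for this illustration; they are not part of this patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the default name-hash (assumption: modeled on
 * pack_name_hash() in Git's pack-objects.h). Later characters weigh
 * the most, so the hash is dominated by the end of the path.
 */
uint32_t name_hash_sketch(const char *name)
{
	uint32_t hash = 0;
	unsigned char c;

	assert(name != NULL);
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue; /* whitespace does not contribute */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}

/* Absolute difference between the sketch hashes of two paths. */
uint32_t hash_distance(const char *a, const char *b)
{
	uint32_t ha = name_hash_sketch(a);
	uint32_t hb = name_hash_sketch(b);

	return ha > hb ? ha - hb : hb - ha;
}

/* Print one "hash  path" line, for eyeballing which paths cluster. */
void print_name_hash(const char *path)
{
	printf("%08x  %s\n", (unsigned)name_hash_sketch(path), path);
}
```

With this sketch, two ASCII paths that share their last 15 characters,
such as "packages/react/CHANGELOG.json" and
"packages/utilities/CHANGELOG.json", end up with hash values that are
equal or differ by at most 3, so a hash-ordered delta search would
consider them neighbors even if their contents are unrelated. A path
such as "packages/react/README.md" hashes far away from both.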

For the Linux kernel repository, we have these stats:

Test                                       this tree
---------------------------------------------------------------
5313.10: repack                            553.61(1929.41+30.31)
5313.11: repack size                       2.5G
5313.12: repack with --path-walk           1777.63(2044.16+7.47)
5313.13: repack size with --path-walk      2.5G

This demonstrates that the --path-walk feature does not always present
measurable improvements, especially in cases where the name-hash has
very few collisions.

Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
---
 Documentation/git-repack.txt | 17 ++++++++++++++++-
 builtin/repack.c             |  9 ++++++++-
 t/perf/p5313-pack-objects.sh | 18 ++++++++++++++++++
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index c902512a9e8..4ec59cd27b1 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -9,7 +9,9 @@ git-repack - Pack unpacked objects in a repository
 SYNOPSIS
 --------
 [verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>] [--write-midx]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
+	[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
+	[--write-midx] [--path-walk]
 
 DESCRIPTION
 -----------
@@ -249,6 +251,19 @@ linkgit:git-multi-pack-index[1]).
 	Write a multi-pack index (see linkgit:git-multi-pack-index[1])
 	containing the non-redundant packs.
 
+--path-walk::
+	This option passes the `--path-walk` option to the underlying
+	`git pack-objects` process (see linkgit:git-pack-objects[1]).
+	By default, `git pack-objects` walks objects in an order that
+	presents trees and blobs in an order unrelated to the path they
+	appear relative to a commit's root tree. The `--path-walk` option
+	enables a different walking algorithm that organizes trees and
+	blobs by path.
+	This has the potential to improve delta compression, especially in
+	the presence of filenames that cause collisions in Git's default
+	name-hash algorithm. Because it changes how objects are walked, this
+	option is not compatible with `--delta-islands` or `--filter`.
+
 CONFIGURATION
 -------------
 
diff --git a/builtin/repack.c b/builtin/repack.c
index cb4420f0856..af3f218ced7 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -39,7 +39,9 @@ static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
 
 static const char *const git_repack_usage[] = {
-	N_("git repack [<options>]"),
+	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
+	   "[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]\n"
+	   "[--write-midx] [--path-walk]"),
 	NULL
 };
 
@@ -58,6 +60,7 @@ struct pack_objects_args {
 	int no_reuse_object;
 	int quiet;
 	int local;
+	int path_walk;
 	struct list_objects_filter_options filter_options;
 };
 
@@ -289,6 +292,8 @@ static void prepare_pack_objects(struct child_process *cmd,
 		strvec_pushf(&cmd->args, "--no-reuse-delta");
 	if (args->no_reuse_object)
 		strvec_pushf(&cmd->args, "--no-reuse-object");
+	if (args->path_walk)
+		strvec_pushf(&cmd->args, "--path-walk");
 	if (args->local)
 		strvec_push(&cmd->args, "--local");
 	if (args->quiet)
@@ -1182,6 +1187,8 @@ int cmd_repack(int argc,
 				N_("pass --no-reuse-delta to git-pack-objects")),
 		OPT_BOOL('F', NULL, &po_args.no_reuse_object,
 				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL(0, "path-walk", &po_args.path_walk,
+				N_("pass --path-walk to git-pack-objects")),
 		OPT_NEGBIT('n', NULL, &run_update_server_info,
 				N_("do not run git-update-server-info"), 1),
 		OPT__QUIET(&po_args.quiet, N_("be quiet")),
diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index 840075f5691..b588066ddb0 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -56,4 +56,22 @@ test_size 'big pack size with --path-walk' '
 	test_file_size out
 '
 
+test_perf 'repack' '
+	git repack -adf
+'
+
+test_size 'repack size' '
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
+'
+
+test_perf 'repack with --path-walk' '
+	git repack -adf --path-walk
+'
+
+test_size 'repack size with --path-walk' '
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
+'
+
 test_done
-- 
gitgitgadget