This series is based on part I [2].

This patch series contains 9 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help,
I'm splitting out the portions that create and test the 'maintenance'
builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@xxxxxxxxx/
[2] https://lore.kernel.org/git/pull.695.git.1596728921.gitgitgadget@xxxxxxxxx/

As detailed in [2], the 'git maintenance run' subcommand will run certain
tasks based on config options or the --task=<task> arguments. The --auto
option tells each task to run only if an internal check finds that there
has been "enough" change in that domain to merit the work. In the case of
the 'gc' task, this also reduces the amount of work done.

The new maintenance tasks in this series are:

 * 'loose-objects' : prune packed loose objects, then create a new pack
   from a batch of loose objects.

 * 'incremental-repack' : expire redundant packs from the
   multi-pack-index, then repack using the multi-pack-index's incremental
   repack strategy.

 * 'prefetch' : fetch from each remote, storing the refs in
   'refs/prefetch/<remote>/'.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=<task>".

Since [2] replaced the 'git gc --auto' calls with 'git maintenance run
--auto' at the end of some Git commands, users could replace the 'gc'
task with these lighter-weight tasks for foreground maintenance.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves. I have a WIP series for
this available at [3].
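
To make the explicit invocation concrete, here is a rough sketch of running
the new tasks one at a time. The run_tasks helper and the GIT override
variable are hypothetical and exist only for illustration; the task names
and the --auto and --task options are the ones described in this series, and
a Git build with the series applied is assumed.

```shell
#!/bin/sh
# Illustrative sketch only: run_tasks and the GIT variable are
# hypothetical helpers, not part of this series.
GIT=${GIT:-git}

# Run each named maintenance task explicitly. Leading options
# (for example --auto) are passed through to every invocation.
run_tasks () {
	opts=
	while test $# -gt 0
	do
		case "$1" in
		--*) opts="$opts $1"; shift ;;
		*) break ;;
		esac
	done
	for task in "$@"
	do
		# $opts is intentionally unquoted so it splits into words.
		"$GIT" maintenance run $opts --task="$task" || return 1
	done
}
```

For example, "run_tasks prefetch loose-objects incremental-repack" runs all
three tasks unconditionally, while "run_tasks --auto loose-objects" lets the
task's own internal check decide whether there is enough work to do.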
[3] https://github.com/gitgitgadget/git/pull/680

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all
   --[no-](quiet|progress) options. I also added a bunch of "2>/dev/null"
   checks because of the isatty(2) calls. Without that, the behavior will
   change depending on whether the test is run with -x/-v or without.

 * The 0x7FFF/0x7FFFFFFF constant problem is fixed with an EXPENSIVE test
   that verifies it.

 * The option parsing has changed to use a local struct and pass that
   struct to the helper methods. This is instead of having a global
   singleton.

Thanks,
-Stolee

Here is the range-diff from the v3 of [1].

  1:  12fe73bb72 <  -:  ---------- maintenance: create basic maintenance runner
  2:  6e533e43d7 <  -:  ---------- maintenance: add --quiet option
  3:  c4674fc211 <  -:  ---------- maintenance: replace run_auto_gc()
  4:  b9332c1318 <  -:  ---------- maintenance: initialize task array
  5:  a4d9836bed <  -:  ---------- maintenance: add commit-graph task
  6:  dafb0d9bbc <  -:  ---------- maintenance: add --task option
  7:  1b00524da3 <  -:  ---------- maintenance: take a lock on the objects directory
  8:  0e94e04dcd =  1:  83401c5200 fetch: optionally allow disabling FETCH_HEAD update
  9:  9e38ade15c !  2:  85118ed5f1 maintenance: add prefetch task
    @@ Documentation/git-maintenance.txt: since it will not expire `.graph` files that

      ## builtin/gc.c ##
     @@
    - #include "blob.h"
      #include "tree.h"
      #include "promisor-remote.h"
    + #include "refs.h"
     +#include "remote.h"

      #define FAILED_RUN "failed to run %s"

    -@@ builtin/gc.c: static int maintenance_task_commit_graph(void)
    +@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *opts)
      	return 1;
      }

    -+static int fetch_remote(const char *remote)
    ++static int fetch_remote(const char *remote, struct maintenance_opts *opts)
     +{
     +	struct child_process child = CHILD_PROCESS_INIT;
     +
    @@ builtin/gc.c: static int maintenance_task_commit_graph(void)
     +	strvec_pushl(&child.args, "fetch", remote, "--prune", "--no-tags",
     +		     "--no-write-fetch-head", "--refmap=", NULL);
     +
    -+	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote);
    -+
    -+	if (opts.quiet)
    ++	if (opts->quiet)
     +		strvec_push(&child.args, "--quiet");
     +
    ++	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote);
    ++
     +	return !!run_command(&child);
     +}
     +
    @@ builtin/gc.c: static int maintenance_task_commit_graph(void)
     +	return 0;
     +}
     +
    -+static int maintenance_task_prefetch(void)
    ++static int maintenance_task_prefetch(struct maintenance_opts *opts)
     +{
     +	int result = 0;
     +	struct string_list_item *item;
    @@ builtin/gc.c: static int maintenance_task_commit_graph(void)
     +	for (item = remotes.items;
     +	     item && item < remotes.items + remotes.nr;
     +	     item++)
    -+		result |= fetch_remote(item->string);
    ++		result |= fetch_remote(item->string, opts);
     +
     +cleanup:
     +	string_list_clear(&remotes, 0);
     +	return result;
     +}
     +
    - static int maintenance_task_gc(void)
    + static int maintenance_task_gc(struct maintenance_opts *opts)
      {
      	struct child_process child = CHILD_PROCESS_INIT;

    @@ builtin/gc.c: struct maintenance_task {
    @@ t/t7900-maintenance.sh: test_expect_success 'run --task duplicate' '
     +	git -C clone2 switch -c two &&
     +	test_commit -C clone1 one &&
     +	test_commit -C clone2 two &&
    -+	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch &&
    -+	grep ",\"fetch\",\"remote1\"" run-prefetch.txt &&
    -+	grep ",\"fetch\",\"remote2\"" run-prefetch.txt &&
    ++	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
    ++	fetchargs="--prune --no-tags --no-write-fetch-head --refmap= --quiet" &&
    ++	test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
    ++	test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
     +	test_path_is_missing .git/refs/remotes &&
     +	test_cmp clone1/.git/refs/heads/one .git/refs/prefetch/remote1/one &&
     +	test_cmp clone2/.git/refs/heads/two .git/refs/prefetch/remote2/two &&
 10:  0128fdfd1a !  3:  621375a3c9 maintenance: add loose-objects task
    @@ Documentation/git-maintenance.txt: gc::
      --auto::

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int maintenance_task_gc(void)
    +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      	return run_command(&child);
      }

    -+static int prune_packed(void)
    ++static int prune_packed(struct maintenance_opts *opts)
     +{
     +	struct child_process child = CHILD_PROCESS_INIT;
     +
     +	child.git_cmd = 1;
     +	strvec_push(&child.args, "prune-packed");
     +
    -+	if (opts.quiet)
    ++	if (opts->quiet)
     +		strvec_push(&child.args, "--quiet");
     +
     +	return !!run_command(&child);
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +	return ++(d->count) > d->batch_size;
     +}
     +
    -+static int pack_loose(void)
    ++static int pack_loose(struct maintenance_opts *opts)
     +{
     +	struct repository *r = the_repository;
     +	int result = 0;
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +	pack_proc.git_cmd = 1;
     +
     +	strvec_push(&pack_proc.args, "pack-objects");
    -+	if (opts.quiet)
    ++	if (opts->quiet)
     +		strvec_push(&pack_proc.args, "--quiet");
     +	strvec_pushf(&pack_proc.args, "%s/pack/loose", r->objects->odb->path);
     +
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +	return result;
     +}
     +
    -+static int maintenance_task_loose_objects(void)
    ++static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +{
    -+	return prune_packed() || pack_loose();
    ++	return prune_packed(opts) || pack_loose(opts);
     +}
     +
    - typedef int maintenance_task_fn(void);
    + typedef int maintenance_task_fn(struct maintenance_opts *opts);

    - struct maintenance_task {
    + /*
    @@ builtin/gc.c: struct maintenance_task {

      enum maintenance_task_label {
 17:  6ac3a58f2f !  4:  e787403ea7 maintenance: create auto condition for loose-objects
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {
      		maintenance_task_loose_objects,
     +		loose_object_auto_condition,
      	},
    - 	[TASK_INCREMENTAL_REPACK] = {
    - 		"incremental-repack",
    + 	[TASK_GC] = {
    + 		"gc",

      ## t/t7900-maintenance.sh ##
    @@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' '
    @@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' '
     +	git repack -adk &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace-lo1.txt" \
     +		git -c maintenance.loose-objects.auto=1 maintenance \
    -+		run --auto --task=loose-objects &&
    -+	! grep "\"prune-packed\"" trace-lo1.txt &&
    ++		run --auto --task=loose-objects 2>/dev/null &&
    ++	test_subcommand ! git prune-packed --quiet <trace-lo1.txt &&
     +	for i in 1 2
     +	do
     +		printf data-A-$i | git hash-object -t blob --stdin -w &&
     +		GIT_TRACE2_EVENT="$(pwd)/trace-loA-$i" \
     +			git -c maintenance.loose-objects.auto=2 \
    -+			maintenance run --auto --task=loose-objects &&
    -+		! grep "\"prune-packed\"" trace-loA-$i &&
    ++			maintenance run --auto --task=loose-objects 2>/dev/null &&
    ++		test_subcommand ! git prune-packed --quiet <trace-loA-$i &&
     +		printf data-B-$i | git hash-object -t blob --stdin -w &&
     +		GIT_TRACE2_EVENT="$(pwd)/trace-loB-$i" \
     +			git -c maintenance.loose-objects.auto=2 \
    -+			maintenance run --auto --task=loose-objects &&
    -+		grep "\"prune-packed\"" trace-loB-$i &&
    ++			maintenance run --auto --task=loose-objects 2>/dev/null &&
    ++		test_subcommand git prune-packed --quiet <trace-loB-$i &&
     +		GIT_TRACE2_EVENT="$(pwd)/trace-loC-$i" \
     +			git -c maintenance.loose-objects.auto=2 \
    -+			maintenance run --auto --task=loose-objects &&
    -+		grep "\"prune-packed\"" trace-loC-$i || return 1
    ++			maintenance run --auto --task=loose-objects 2>/dev/null &&
    ++		test_subcommand git prune-packed --quiet <trace-loC-$i || return 1
     +	done
     +'
     +
    - test_expect_success 'incremental-repack task' '
    - 	packDir=.git/objects/pack &&
    - 	for i in $(test_seq 1 5)
    + test_done
 11:  c2baf6e119 =  5:  37e59b1a8d midx: enable core.multiPackIndex by default
 19:  9b4cef7635 =  6:  aba087f663 midx: use start_delayed_progress()
 12:  00f47c4848 !  7:  68727c555b maintenance: add incremental-repack task
    @@ Documentation/git-maintenance.txt: loose-objects::

      ## builtin/gc.c ##
     @@
    - #include "tree.h"
      #include "promisor-remote.h"
    + #include "refs.h"
      #include "remote.h"
     +#include "midx.h"

      #define FAILED_RUN "failed to run %s"

    -@@ builtin/gc.c: static int maintenance_task_loose_objects(void)
    - 	return prune_packed() || pack_loose();
    +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
    + 	return prune_packed(opts) || pack_loose(opts);
      }

    -+static int multi_pack_index_write(void)
    ++static int multi_pack_index_write(struct maintenance_opts *opts)
     +{
     +	struct child_process child = CHILD_PROCESS_INIT;
     +
     +	child.git_cmd = 1;
     +	strvec_pushl(&child.args, "multi-pack-index", "write", NULL);
     +
    -+	if (opts.quiet)
    ++	if (opts->quiet)
     +		strvec_push(&child.args, "--no-progress");
     +
     +	if (run_command(&child))
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +	return 0;
     +}
     +
    -+static int rewrite_multi_pack_index(void)
    ++static int rewrite_multi_pack_index(struct maintenance_opts *opts)
     +{
     +	struct repository *r = the_repository;
     +	char *midx_name = get_midx_filename(r->objects->odb->path);
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +	unlink(midx_name);
     +	free(midx_name);
     +
    -+	return multi_pack_index_write();
    ++	return multi_pack_index_write(opts);
     +}
     +
    -+static int multi_pack_index_verify(const char *message)
    ++static int multi_pack_index_verify(struct maintenance_opts *opts,
    ++				   const char *message)
     +{
     +	struct child_process child = CHILD_PROCESS_INIT;
     +
     +	child.git_cmd = 1;
     +	strvec_pushl(&child.args, "multi-pack-index", "verify", NULL);
     +
    -+	if (opts.quiet)
    ++	if (opts->quiet)
     +		strvec_push(&child.args, "--no-progress");
     +
     +	if (run_command(&child)) {
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +	return 0;
     +}
     +
    -+static int multi_pack_index_expire(void)
    ++static int multi_pack_index_expire(struct maintenance_opts *opts)
     +{
     +	struct child_process child = CHILD_PROCESS_INIT;
     +
     +	child.git_cmd = 1;
     +	strvec_pushl(&child.args, "multi-pack-index", "expire", NULL);
     +
    -+	if (opts.quiet)
    ++	if (opts->quiet)
     +		strvec_push(&child.args, "--no-progress");
     +
     +	close_object_store(the_repository->objects);
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +	return 0;
     +}
     +
    -+static int multi_pack_index_repack(void)
    ++static int multi_pack_index_repack(struct maintenance_opts *opts)
     +{
     +	struct child_process child = CHILD_PROCESS_INIT;
     +
     +	child.git_cmd = 1;
     +	strvec_pushl(&child.args, "multi-pack-index", "repack", NULL);
     +
    -+	if (opts.quiet)
    ++	if (opts->quiet)
     +		strvec_push(&child.args, "--no-progress");
     +
     +	strvec_push(&child.args, "--batch-size=0");
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +	return 0;
     +}
     +
    -+static int maintenance_task_incremental_repack(void)
    ++static int maintenance_task_incremental_repack(struct maintenance_opts *opts)
     +{
     +	prepare_repo_settings(the_repository);
     +	if (!the_repository->settings.core_multi_pack_index) {
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +		return 0;
     +	}
     +
    -+	if (multi_pack_index_write())
    ++	if (multi_pack_index_write(opts))
     +		return 1;
    -+	if (multi_pack_index_verify("after initial write"))
    -+		return rewrite_multi_pack_index();
    -+	if (multi_pack_index_expire())
    ++	if (multi_pack_index_verify(opts, "after initial write"))
    ++		return rewrite_multi_pack_index(opts);
    ++	if (multi_pack_index_expire(opts))
     +		return 1;
    -+	if (multi_pack_index_verify("after expire step"))
    -+		return !!rewrite_multi_pack_index();
    -+	if (multi_pack_index_repack())
    ++	if (multi_pack_index_verify(opts, "after expire step"))
    ++		return !!rewrite_multi_pack_index(opts);
    ++	if (multi_pack_index_repack(opts))
     +		return 1;
    -+	if (multi_pack_index_verify("after repack step"))
    -+		return !!rewrite_multi_pack_index();
    ++	if (multi_pack_index_verify(opts, "after repack step"))
    ++		return !!rewrite_multi_pack_index(opts);
     +	return 0;
     +}
     +
    - typedef int maintenance_task_fn(void);
    + typedef int maintenance_task_fn(struct maintenance_opts *opts);

    - struct maintenance_task {
    + /*
    @@ builtin/gc.c: struct maintenance_task {
      enum maintenance_task_label {
      	TASK_PREFETCH,
    @@ builtin/gc.c: struct maintenance_task {
      	TASK_COMMIT_GRAPH,
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {
    - 		"loose-objects",
      		maintenance_task_loose_objects,
    + 		loose_object_auto_condition,
      	},
     +	[TASK_INCREMENTAL_REPACK] = {
     +		"incremental-repack",
    @@ t/t7900-maintenance.sh: test_description='git maintenance builtin'

      test_expect_success 'help text' '
      	test_expect_code 129 git maintenance -h 2>err &&
    -@@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' '
    - 	test_cmp packs-between packs-after
    +@@ t/t7900-maintenance.sh: test_expect_success 'maintenance.loose-objects.auto' '
    + 	done
      '

     +test_expect_success 'incremental-repack task' '
 13:  ef2a231956 !  8:  c3487fb8e3 maintenance: auto-size incremental-repack batch
    @@ Commit message
         truly want to optimize for space and performance (and are willing to pay
         the upfront cost of a full repack) can use the 'gc' task to do so.

    +    Create a test for this two gigabyte limit by creating an EXPENSIVE test
    +    that generates two pack-files of roughly 2.5 gigabytes in size, then
    +    performs an incremental repack. Check that the --batch-size argument in
    +    the subcommand uses the hard-coded maximum.
    +
    +    Helped-by: Chris Torek <chris.torek@xxxxxxxxx>
         Reported-by: Son Luong Ngoc <sluongng@xxxxxxxxx>
         Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int multi_pack_index_expire(void)
    +@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
      	return 0;
      }

    -+#define TWO_GIGABYTES (0x7FFF)
    ++#define TWO_GIGABYTES (INT32_MAX)
     +
     +static off_t get_auto_pack_size(void)
     +{
    @@ builtin/gc.c: static int multi_pack_index_expire(void)
     +	return result_size;
     +}
     +
    - static int multi_pack_index_repack(void)
    + static int multi_pack_index_repack(struct maintenance_opts *opts)
      {
      	struct child_process child = CHILD_PROCESS_INIT;

    -@@ builtin/gc.c: static int multi_pack_index_repack(void)
    - 	if (opts.quiet)
    +@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_opts *opts)
    + 	if (opts->quiet)
      		strvec_push(&child.args, "--no-progress");

     -	strvec_push(&child.args, "--batch-size=0");
    @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
      	ls .git/objects/pack/*.pack >packs-after &&
     -	test_line_count = 1 packs-after
     +	test_line_count = 2 packs-after
    ++'
    ++
    ++test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
    ++	for i in $(test_seq 1 5)
    ++	do
    ++		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
    ++		return 1
    ++	done &&
    ++	git add big &&
    ++	git commit -m "Add big file (1)" &&
    ++
    ++	# ensure any possible loose objects are in a pack-file
    ++	git maintenance run --task=loose-objects &&
    ++
    ++	rm big &&
    ++	for i in $(test_seq 6 10)
    ++	do
    ++		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
    ++		return 1
    ++	done &&
    ++	git add big &&
    ++	git commit -m "Add big file (2)" &&
    ++
    ++	# ensure any possible loose objects are in a pack-file
    ++	git maintenance run --task=loose-objects &&
    ++
    ++	# Now run the incremental-repack task and check the batch-size
    ++	GIT_TRACE2_EVENT="$(pwd)/run-2g.txt" git maintenance run \
    ++		--task=incremental-repack 2>/dev/null &&
    ++	test_subcommand git multi-pack-index repack \
    ++		 --no-progress --batch-size=2147483647 <run-2g.txt
      '

      test_done
 14:  99840c4b8f <  -:  ---------- maintenance: create maintenance.<task>.enabled config
 15:  a087c63572 <  -:  ---------- maintenance: use pointers to check --auto
 16:  ef3a854508 <  -:  ---------- maintenance: add auto condition for commit-graph task
 18:  801b262d1c !  9:  407c123c51 maintenance: add incremental-repack auto condition
    @@ Documentation/config/maintenance.txt: maintenance.loose-objects.auto::

      ## builtin/gc.c ##
     @@
    + #include "refs.h"
      #include "remote.h"
      #include "midx.h"
    - #include "refs.h"
     +#include "object-store.h"

      #define FAILED_RUN "failed to run %s"

    -@@ builtin/gc.c: static int maintenance_task_loose_objects(void)
    - 	return prune_packed() || pack_loose();
    +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
    + 	return prune_packed(opts) || pack_loose(opts);
      }

     +static int incremental_repack_auto_condition(void)
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +	return count >= incremental_repack_auto_limit;
     +}
     +
    - static int multi_pack_index_write(void)
    + static int multi_pack_index_write(struct maintenance_opts *opts)
      {
      	struct child_process child = CHILD_PROCESS_INIT;

    @@ builtin/gc.c: static struct maintenance_task tasks[] = {
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {

      ## t/t7900-maintenance.sh ##
    @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
    - 	test_line_count = 2 packs-after
    + '
    +
    + test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
    ++
    + 	for i in $(test_seq 1 5)
    + 	do
    + 		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
    +@@ t/t7900-maintenance.sh: test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
    + 		 --no-progress --batch-size=2147483647 <run-2g.txt
      '

     +test_expect_success 'maintenance.incremental-repack.auto' '
     +	git repack -adk &&
     +	git config core.multiPackIndex true &&
     +	git multi-pack-index write &&
    -+	GIT_TRACE2_EVENT=1 git -c maintenance.incremental-repack.auto=1 \
    -+		maintenance run --auto --task=incremental-repack >out &&
    -+	! grep "\"multi-pack-index\"" out &&
    ++	GIT_TRACE2_EVENT="$(pwd)/midx-init.txt" git \
    ++		-c maintenance.incremental-repack.auto=1 \
    ++		maintenance run --auto --task=incremental-repack 2>/dev/null &&
    ++	test_subcommand ! git multi-pack-index write --no-progress <midx-init.txt &&
     +	for i in 1 2
     +	do
     +		test_commit A-$i &&
    @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
     +		EOF
     +		GIT_TRACE2_EVENT=$(pwd)/trace-A-$i git \
     +			-c maintenance.incremental-repack.auto=2 \
    -+			maintenance run --auto --task=incremental-repack &&
    -+		! grep "\"multi-pack-index\"" trace-A-$i &&
    ++			maintenance run --auto --task=incremental-repack 2>/dev/null &&
    ++		test_subcommand ! git multi-pack-index write --no-progress <trace-A-$i &&
     +		test_commit B-$i &&
     +		git pack-objects --revs .git/objects/pack/pack <<-\EOF &&
     +		HEAD
    @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
     +		EOF
     +		GIT_TRACE2_EVENT=$(pwd)/trace-B-$i git \
     +			-c maintenance.incremental-repack.auto=2 \
    -+			maintenance run --auto --task=incremental-repack >out &&
    -+		grep "\"multi-pack-index\"" trace-B-$i >/dev/null || return 1
    ++			maintenance run --auto --task=incremental-repack 2>/dev/null &&
    ++		test_subcommand git multi-pack-index write --no-progress <trace-B-$i || return 1
     +	done
     +'
     +
 20:  39eb83ad1e <  -:  ---------- maintenance: add trace2 regions for task execution

Derrick Stolee (8):
  maintenance: add prefetch task
  maintenance: add loose-objects task
  maintenance: create auto condition for loose-objects
  midx: enable core.multiPackIndex by default
  midx: use start_delayed_progress()
  maintenance: add incremental-repack task
  maintenance: auto-size incremental-repack batch
  maintenance: add incremental-repack auto condition

Junio C Hamano (1):
  fetch: optionally allow disabling FETCH_HEAD update

 Documentation/config/core.txt        |   4 +-
 Documentation/config/fetch.txt       |   7 +
 Documentation/config/maintenance.txt |  18 ++
 Documentation/fetch-options.txt      |  10 +
 Documentation/git-maintenance.txt    |  41 +++
 builtin/fetch.c                      |  19 +-
 builtin/gc.c                         | 364 +++++++++++++++++++++++++++
 builtin/pull.c                       |   3 +-
 midx.c                               |  23 +-
 midx.h                               |   1 +
 repo-settings.c                      |   6 +
 repository.h                         |   2 +
 t/t5319-multi-pack-index.sh          |  15 +-
 t/t5510-fetch.sh                     |  39 ++-
 t/t5521-pull-options.sh              |  16 ++
 t/t7900-maintenance.sh               | 191 ++++++++++++++
 16 files changed, 730 insertions(+), 29 deletions(-)

base-commit: a5d19148460decaf08e0e6293e996d42ff3f2d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-696%2Fderrickstolee%2Fmaintenance%2Fgc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-696/derrickstolee/maintenance/gc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/696
-- 
gitgitgadget