This series is based on ds/maintenance-part-1 [2]. This patch series contains 9 patches that were going to be part of v4 of ds/maintenance [1], but the discussion has gotten really long. To help, I'm splitting out the portions that create and test the 'maintenance' builtin from the additional tasks (prefetch, loose-objects, incremental-repack) that can be brought in later. [1] https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@xxxxxxxxx/ [2] https://lore.kernel.org/git/pull.695.v3.git.1598380426.gitgitgadget@xxxxxxxxx/ As detailed in [2], the 'git maintenance run' subcommand will run certain tasks based on config options or the --task= arguments. The --auto option indicates to the task to only run based on some internal check that there has been "enough" change in that domain to merit the work. In the case of the 'gc' task, this also reduces the amount of work done. The new maintenance tasks in this series are: * 'loose-objects' : prune packed loose objects, then create a new pack from a batch of loose objects. * 'pack-files' : expire redundant packs from the multi-pack-index, then repack using the multi-pack-index's incremental repack strategy. * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/ /'. These tasks are all disabled by default, but can be enabled with config options or run explicitly using "git maintenance run --task=". Since [2] replaced the 'git gc --auto' calls with 'git maintenance run --auto' at the end of some Git commands, users could replace the 'gc' task with these lighter-weight changes for foreground maintenance. The 'git maintenance' builtin has a 'run' subcommand so it can be extended later with subcommands that manage background maintenance, such as 'start' or 'stop'. These are not the subject of this series, as it is important to focus on the maintenance activities themselves. I have an RFC series for this available at [3]. [3] https://lore.kernel.org/git/pull.680.git.1597857408.gitgitgadget@xxxxxxxxx/ Updates in v3 ============= * Several commit message, documentation, and test updates from Jonathan Tan's helpful review! Updates since v2 ================ * Dropped "fetch: optionally allow disabling FETCH_HEAD update" * A lot of fallout from the change in the option parsing in v3 of Maintenance II. * Dropped the "verify, and delete and rewrite on failure" logic from the incremental-repack task. This might be added again later after it can be tested more thoroughly. Updates since v1 (of this series) ================================= * PATCH 1 ("fetch: optionally allow disabling FETCH_HEAD update") was rewritten on-list. Getting a version out with this patch is the main reason for rolling a v2. (That, and Part I is re-rolled with a v2 and I want to make sure this series applies cleanly.) * The 'prefetch' and 'loose-objects' tasks had some review, but my proposed changes were not acked, so they may need another review. UPDATES since v3 of [1] ======================= * The biggest change here is the use of "test_subcommand", based on Jonathan Nieder's approach. This requires having the exact command-line figured out, which now requires spelling out all --no- [quiet%7Cprogress] options. I also added a bunch of "2>/dev/null" checks because of the isatty(2) calls. Without that, the behavior will change depending on whether the test is run with -x/-v or without. * The 0x7FFF/0x7FFFFFFF constant problem is fixed with an EXPENSIVE test that verifies it. * The option parsing has changed to use a local struct and pass that struct to the helper methods. This is instead of having a global singleton. Thanks, -Stolee Derrick Stolee (8): maintenance: add prefetch task maintenance: add loose-objects task maintenance: create auto condition for loose-objects midx: enable core.multiPackIndex by default midx: use start_delayed_progress() maintenance: add incremental-repack task maintenance: auto-size incremental-repack batch maintenance: add incremental-repack auto condition Documentation/config/core.txt | 4 +- Documentation/config/maintenance.txt | 18 ++ Documentation/git-maintenance.txt | 48 ++++ builtin/gc.c | 326 +++++++++++++++++++++++++++ midx.c | 21 +- repo-settings.c | 6 + repository.h | 2 + t/t5319-multi-pack-index.sh | 15 +- t/t7900-maintenance.sh | 185 +++++++++++++++ 9 files changed, 603 insertions(+), 22 deletions(-) base-commit: 25914c4fdeefd99b06e134496dfb9bbb58a5c417 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-696%2Fderrickstolee%2Fmaintenance%2Fgc-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-696/derrickstolee/maintenance/gc-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/696 Range-diff vs v3: 1: da64c51a81 ! 1: 7a62e224cf maintenance: add prefetch task @@ Commit message of a foreground fetch to make that 'git fetch' command much faster. However, if we simply ran 'git fetch <remote>' in the background, - then the user running a foregroudn 'git fetch <remote>' would lose + then the user running a foreground 'git fetch <remote>' would lose some important feedback when a new branch appears or an existing branch updates. This is especially true if a remote branch is force-updated and this isn't noticed by the user because it occurred 2: 75e846456b ! 2: f3a16fd324 maintenance: add loose-objects task @@ Commit message objects are created only by a user doing normal development. We noticed users with _millions_ of loose objects because VFS for Git downloads blobs on-demand when a file read operation - requires populating a virtual file. This has potential of - happening in partial clones if someone runs 'git grep' or - otherwise evades the batch-download feature for requesting - promisor objects. + requires populating a virtual file. This step is based on a similar step in Scalar [1] and VFS for Git. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs 3: d6e382c43e ! 3: 931fff4883 maintenance: create auto condition for loose-objects @@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' ' + git -c maintenance.loose-objects.auto=1 maintenance \ + run --auto --task=loose-objects 2>/dev/null && + test_subcommand ! git prune-packed --quiet <trace-lo1.txt && -+ for i in 1 2 -+ do -+ printf data-A-$i | git hash-object -t blob --stdin -w && -+ GIT_TRACE2_EVENT="$(pwd)/trace-loA-$i" \ -+ git -c maintenance.loose-objects.auto=2 \ -+ maintenance run --auto --task=loose-objects 2>/dev/null && -+ test_subcommand ! git prune-packed --quiet <trace-loA-$i && -+ printf data-B-$i | git hash-object -t blob --stdin -w && -+ GIT_TRACE2_EVENT="$(pwd)/trace-loB-$i" \ -+ git -c maintenance.loose-objects.auto=2 \ -+ maintenance run --auto --task=loose-objects 2>/dev/null && -+ test_subcommand git prune-packed --quiet <trace-loB-$i && -+ GIT_TRACE2_EVENT="$(pwd)/trace-loC-$i" \ -+ git -c maintenance.loose-objects.auto=2 \ -+ maintenance run --auto --task=loose-objects 2>/dev/null && -+ test_subcommand git prune-packed --quiet <trace-loC-$i || return 1 -+ done ++ printf data-A | git hash-object -t blob --stdin -w && ++ GIT_TRACE2_EVENT="$(pwd)/trace-loA" \ ++ git -c maintenance.loose-objects.auto=2 \ ++ maintenance run --auto --task=loose-objects 2>/dev/null && ++ test_subcommand ! git prune-packed --quiet <trace-loA && ++ printf data-B | git hash-object -t blob --stdin -w && ++ GIT_TRACE2_EVENT="$(pwd)/trace-loB" \ ++ git -c maintenance.loose-objects.auto=2 \ ++ maintenance run --auto --task=loose-objects 2>/dev/null && ++ test_subcommand git prune-packed --quiet <trace-loB && ++ GIT_TRACE2_EVENT="$(pwd)/trace-loC" \ ++ git -c maintenance.loose-objects.auto=2 \ ++ maintenance run --auto --task=loose-objects 2>/dev/null && ++ test_subcommand git prune-packed --quiet <trace-loC +' + test_done 4: d0f2ec70d9 = 4: 0fe2036aa8 midx: enable core.multiPackIndex by default 5: 2cd3c803d9 = 5: ce435bf784 midx: use start_delayed_progress() 6: 0dd26bb584 ! 6: d934899253 maintenance: add incremental-repack task @@ Documentation/git-maintenance.txt: loose-objects:: + The `incremental-repack` job repacks the object directory + using the `multi-pack-index` feature. In order to prevent race + conditions with concurrent Git commands, it follows a two-step -+ process. First, it deletes any pack-files included in the -+ `multi-pack-index` where none of the objects in the -+ `multi-pack-index` reference those pack-files; this only happens -+ if all objects in the pack-file are also stored in a newer -+ pack-file. Second, it selects a group of pack-files whose "expected -+ size" is below the batch size until the group has total expected -+ size at least the batch size; see the `--batch-size` option for -+ the `repack` subcommand in linkgit:git-multi-pack-index[1]. The -+ default batch-size is zero, which is a special case that attempts -+ to repack all pack-files into a single pack-file. ++ process. First, it calls `git multi-pack-index expire` to delete ++ pack-files unreferenced by the `multi-pack-index` file. Second, it ++ calls `git multi-pack-index repack` to select several small ++ pack-files and repack them into a bigger one, and then update the ++ `multi-pack-index` entries that refer to the small pack-files to ++ refer to the new pack-file. This prepares those small pack-files ++ for deletion upon the next run of `git multi-pack-index expire`. ++ The selection of the small pack-files is such that the expected ++ size of the big pack-file is at least the batch size; see the ++ `--batch-size` option for the `repack` subcommand in ++ linkgit:git-multi-pack-index[1]. The default batch-size is zero, ++ which is a special case that attempts to repack all pack-files ++ into a single pack-file. + OPTIONS ------- --auto:: ## builtin/gc.c ## -@@ - #include "promisor-remote.h" - #include "refs.h" - #include "remote.h" -+#include "midx.h" - - #define FAILED_RUN "failed to run %s" - @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts) return prune_packed(opts) || pack_loose(opts); } @@ builtin/gc.c: static struct maintenance_task tasks[] = { "gc", maintenance_task_gc, - ## midx.c ## -@@ - - #define PACK_EXPIRED UINT_MAX - --static char *get_midx_filename(const char *object_dir) -+char *get_midx_filename(const char *object_dir) - { - return xstrfmt("%s/pack/multi-pack-index", object_dir); - } - - ## midx.h ## -@@ midx.h: struct multi_pack_index { - - #define MIDX_PROGRESS (1 << 0) - -+char *get_midx_filename(const char *object_dir); - struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local); - int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id); - int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); - ## t/t5319-multi-pack-index.sh ## @@ test_description='multi-pack-indexes' @@ t/t7900-maintenance.sh: test_description='git maintenance builtin' test_expect_success 'help text' ' test_expect_code 129 git maintenance -h 2>err && @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.loose-objects.auto' ' - done + test_subcommand git prune-packed --quiet <trace-loC ' +test_expect_success 'incremental-repack task' ' 7: f3b25a9927 = 7: bade7706d5 maintenance: auto-size incremental-repack batch 8: e9bb32f53a ! 8: f660dd1890 maintenance: add incremental-repack auto condition @@ Documentation/config/maintenance.txt: maintenance.loose-objects.auto:: ## builtin/gc.c ## @@ + #include "promisor-remote.h" #include "refs.h" #include "remote.h" - #include "midx.h" +#include "object-store.h" #define FAILED_RUN "failed to run %s" @@ t/t7900-maintenance.sh: test_expect_success EXPENSIVE 'incremental-repack 2g lim + -c maintenance.incremental-repack.auto=1 \ + maintenance run --auto --task=incremental-repack 2>/dev/null && + test_subcommand ! git multi-pack-index write --no-progress <midx-init.txt && -+ for i in 1 2 -+ do -+ test_commit A-$i && -+ git pack-objects --revs .git/objects/pack/pack <<-\EOF && -+ HEAD -+ ^HEAD~1 -+ EOF -+ GIT_TRACE2_EVENT=$(pwd)/trace-A-$i git \ -+ -c maintenance.incremental-repack.auto=2 \ -+ maintenance run --auto --task=incremental-repack 2>/dev/null && -+ test_subcommand ! git multi-pack-index write --no-progress <trace-A-$i && -+ test_commit B-$i && -+ git pack-objects --revs .git/objects/pack/pack <<-\EOF && -+ HEAD -+ ^HEAD~1 -+ EOF -+ GIT_TRACE2_EVENT=$(pwd)/trace-B-$i git \ -+ -c maintenance.incremental-repack.auto=2 \ -+ maintenance run --auto --task=incremental-repack 2>/dev/null && -+ test_subcommand git multi-pack-index write --no-progress <trace-B-$i || return 1 -+ done ++ test_commit A && ++ git pack-objects --revs .git/objects/pack/pack <<-\EOF && ++ HEAD ++ ^HEAD~1 ++ EOF ++ GIT_TRACE2_EVENT=$(pwd)/trace-A git \ ++ -c maintenance.incremental-repack.auto=2 \ ++ maintenance run --auto --task=incremental-repack 2>/dev/null && ++ test_subcommand ! git multi-pack-index write --no-progress <trace-A && ++ test_commit B && ++ git pack-objects --revs .git/objects/pack/pack <<-\EOF && ++ HEAD ++ ^HEAD~1 ++ EOF ++ GIT_TRACE2_EVENT=$(pwd)/trace-B git \ ++ -c maintenance.incremental-repack.auto=2 \ ++ maintenance run --auto --task=incremental-repack 2>/dev/null && ++ test_subcommand git multi-pack-index write --no-progress <trace-B +' + test_done -- gitgitgadget