This is a second attempt at redesigning Git's repository maintenance
patterns. The first attempt [1] included a way to run jobs in the
background using a long-lived process; that idea was rejected and is
not included in this series. A future series will use the OS to handle
scheduling tasks.

[1] https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@xxxxxxxxx/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long
history, including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.

While expiring reflogs, clearing rerere logs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing
refs and updating the commit-graph file are not as obviously fitting.
Further, these operations are "all or nothing" in that they rewrite
almost all repository data, which does not perform well at extremely
large scales. These operations can also be disruptive to foreground Git
commands when git gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead
create new choices for automatic maintenance activities, of which git gc
remains the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack
   from a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'fetch' : fetch from each remote, storing the refs in
   'refs/hidden//'.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=".
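
For illustration only, enabling tasks through config could be sketched
as below. This is not code from the series: the helper name
enable_maintenance_tasks is hypothetical, and the key
maintenance.<task>.enabled is taken from the title of patch 16, so it
only has an effect on a Git build that includes these patches.

```shell
# Hypothetical helper (not part of the series): enable the named
# maintenance tasks in one repository via maintenance.<task>.enabled.
# Usage: enable_maintenance_tasks <repo> <task>...
enable_maintenance_tasks () {
	repo=$1
	shift
	for task in "$@"
	do
		# "git config" accepts any well-formed key, so this succeeds even
		# on a Git that does not know about the maintenance tasks.
		git -C "$repo" config "maintenance.$task.enabled" true || return 1
	done
}
```

With such a build, and assuming --task takes one of the task names
listed above, usage might look like this (the repository path is a
placeholder):

    enable_maintenance_tasks /path/to/repo commit-graph loose-objects
    git -C /path/to/repo maintenance run --task=commit-graph
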
There are additional config options to allow customizing the conditions
under which the tasks run with the '--auto' option. ('fetch' will never
run with the '--auto' option.)

Because 'gc' is implemented as a maintenance task, the most dramatic
change of this series is to convert the 'git gc --auto' calls into
'git maintenance run --auto' calls at the end of some Git commands. By
default, the only change is that 'git gc --auto' will be run below an
additional 'git maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be
extended later with subcommands that manage background maintenance, such
as 'start', 'stop', 'pause', or 'schedule'. These are not the subject of
this series, as it is important to focus on the maintenance activities
themselves.

An expert user could set up scheduled background maintenance themselves
with the current series. I have the following crontab entry set up to
run maintenance on an hourly basis:

0 * * * * git -C /<path-to-repo> maintenance run --no-quiet >>/<path-to-repo>/.git/maintenance.log

My config includes all tasks except the 'gc' task. The hourly run is
over-aggressive, but is sufficient for testing. I'll replace it with a
daily run when I feel satisfied.

Hopefully this direction is seen as a positive one. My goal was to add
more options for expert users, along with the flexibility to create
background maintenance via the OS in a later series.

OUTLINE
=======

Patches 1-4 remove some references to the_repository in builtin/gc.c
before we start depending on code in that builtin.

Patches 5-7 create the 'git maintenance run' builtin and subcommand as a
simple shim over 'git gc' and replace calls to 'git gc --auto' from
other commands.

Patches 8-15 create new maintenance tasks. These are the same tasks sent
in the previous RFC.

Patches 16-21 create more customization through config and perform other
polish items.

FUTURE WORK
===========

 * Add 'start', 'stop', and 'schedule' subcommands to initialize the
   commands run in the background.

 * Split the 'gc' builtin into smaller maintenance tasks that are enabled
   by default, but might have different '--auto' conditions and more
   config options.

 * Replace config like 'gc.writeCommitGraph' and 'fetch.writeCommitGraph'
   with use of the 'commit-graph' task.

Thanks,
-Stolee

Derrick Stolee (21):
  gc: use the_repository less often
  gc: use repository in too_many_loose_objects()
  gc: use repo config
  gc: drop the_repository in log location
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array and hashmap
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: add fetch task
  maintenance: add loose-objects task
  maintenance: add pack-files task
  maintenance: auto-size pack-files batch
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: create auto condition for loose-objects
  maintenance: add pack-files auto condition
  midx: use start_delayed_progress()

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  32 +
 Documentation/fetch-options.txt      |   5 +-
 Documentation/git-clone.txt          |   7 +-
 Documentation/git-maintenance.txt    | 124 ++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 881 +++++++++++++++++++++++++--
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 config.c                             |  24 +-
 config.h                             |   2 +
 git.c                                |   1 +
 midx.c                               |  12 +-
 midx.h                               |   1 +
 object.h                             |   1 +
 run-command.c                        |   7 +-
 run-command.h                        |   2 +-
 t/t5319-multi-pack-index.sh          |  14 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               | 211 +++++++
 27 files changed, 1265 insertions(+), 92 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-671%2Fderrickstolee%2Fmaintenance%2Fgc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-671/derrickstolee/maintenance/gc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/671
-- 
gitgitgadget