On 4/5/2020 3:21 PM, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@xxxxxxxxx> writes:
>
>> Hi Stolee
>>
>> On 03/04/2020 21:48, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>>>
>>> The 'git run-job' command will be used to execute a short-lived set
>>> of maintenance activities by a background job manager. The intention
>>> is to perform small batches of work that reduce the foreground time
>>> taken by repository maintenance such as 'git gc --auto'.
>>>
>>> This change does the absolute minimum to create the builtin and show
>>> the usage output.
>>>
>>> Provide an explicit warning that this command is experimental. The
>>> set of jobs may change, and each job could alter its behavior in
>>> future versions.
>>>
>>> RFC QUESTION: This builtin is based on the background maintenance in
>>> Scalar. Specifically, this builtin is based on the "scalar run <job>"
>>> command [1] [2]. My default thought was to make this a "git run <job>"
>>> command to maximize similarity. However, it seems like "git run" is
>>> too generic. Or, am I being overly verbose for no reason?
>>
>> Having read through this series I wondered if we wanted a single git
>> command such as 'git maintenance' (suggestions of better names
>> welcome) and then 'git run-job' could become 'git maintenance run',
>> 'git job-runner' would become another subcommand (run-jobs or
>> schedule-jobs?) and the 'git please-run-maintenance-on-this-repo' you
>> mentioned in you email to Junio could become 'git maintenance init'
>> (or maybe setup)
>
> I had a very similar impression. In addition to what you already
> said, a few more were:
>
> - Why the existing "git repack" isn't such "maintenance" command?
>   IOW why do we even need [01/15]? After all, "repack" may have
>   started its life as a tool to reorganize the PACKFILES, but it is
>   no longer limited to 'git/objects/pack/*.pack' files with its
>   knowledge about the loose object files and the "--prune" option.
>   Consolidating pieces of information spread across multiple .idx
>   files, reachability bitmaps and commit graph files, into a newer
>   and more performant forms can just be part of "packing the pieces
>   of information in a repository for optimum performance", which is
>   a better way to understand why "repack" has a word 'pack' in its
>   name.

To me, "git repack" is a specific kind of maintenance. The end result
is a pack-file. Now, "git gc" is a bit more general, because it will
create a pack-file but also update the commit-graph file. Even so, its
name is very specific: it "collects garbage".

The goal of this series is to replace "git gc --auto" with something
less invasive. I'll include an alternate CLI proposal at the end of this
message.

> - Many of the "maintenance" operations this series proposes do make
>   sense, just like other "maintenance" operations we already have
>   in "repack", "prune", "prune-packed" etc., which are welcome
>   additions.

Thanks. I'm glad these steps make sense. They are definitely more
"incremental" updates than a full repack or GC.

> - Like the individual steps that appear in e.g. "repack", however,
>   some of the individual steps in this series can be triggered by
>   calling underlying tools directly, allowing scripted maintenance
>   commands that suit individual needs better than the canned
>   invocation of "run-job", but I didn't get the impression that the
>   series strives to make sure that all knobs of these individual
>   steps are available to scripters who want to deviate from what
>   "run-job" prescribes. If it is not doing so, we probably should.
>
> - Again, I do not think we want a reimplementation of cron, at or
>   inetd that is not specific to "git" at all.

I expected the job-runner to get some push-back. The design for it in
the current RFC matched how we do it in Scalar more than anything else.
You're probably right that it would be better to leave the "background"
part to the platform.
Of course, not every platform has "cron", but that just means we need
a cross-platform way to launch Git processes on some schedule. That
could be a command that creates a cron job on platforms that have it,
and on Windows it could create a scheduled task instead.

But what should we launch? It should probably be a Git command that
checks config for a list of repositories, then runs "the maintenance
command" on each of those repos.

I'm inserting a break here to draw the eye to a new proposed design:

---

Create a "git maintenance" builtin. This has a few subcommands:

1. "run" will run the configured maintenance on the current repo. This
   should become the single entry point for users to say "please clean
   up my repo." What _exactly_ it does can be altered with config. I'll
   list some possibilities after listing the subcommands.

2. "run-on-repos" uses command-line arguments or config to launch
   "git -C <dir> maintenance run" for all configured directories. The
   intention is that this is launched on some schedule by a
   platform-specific scheduling mechanism (e.g. cron). (This subcommand
   could use a better name.)

3. "schedule" adds the current repository to the configured list of
   repositories for running with "run-on-repos". It will also initialize
   the platform-specific scheduling mechanism. This may be to start the
   schedule for the first time OR to update how frequently
   "run-on-repos" is run, as appropriate.

4. (OPTIONAL) "mode <mode>" adjusts the config for the current repo to
   change the type of maintenance requested for this repo. For example,
   "simple" could just run "git gc --auto" using a normal range.
   "incremental" could run the maintenance tasks from this series.
   Finally, "server" could run maintenance tasks as if we are serving
   the repo to others, so we repack aggressively with full bitmaps, and
   more frequently.

Here are some possible maintenance tasks.
Not all of them would be appropriate to run on the same repo, or at
least not with the same frequency:

* "fetch" : the background fetch from PATCH 3. Appropriate for all
  modes, but perhaps we would want users to opt in to this in the
  basic mode.

* "commit-graph" : the incremental commit-graph writes from PATCH 2.
  Appropriate whenever the "fetch" command is being run, but also
  valuable for the "server" mode.

* "gc" : Run "git gc --auto". This would be enabled by default, but
  should be disabled for the "incremental" and "server" modes.

* "repack" : Run "git repack <options>" with appropriate options based
  on config. The "server" mode would include custom delta and bitmap
  options. (I will leave the specifics to those who maintain servers
  to recommend the best options for "server" mode.)

* "loose-objects" : see PATCH 4. Appropriate for "incremental" mode.

* "multi-pack-index" or "incremental-repack" : Run the "pack-files"
  job from PATCH 5. Appropriate for "incremental" mode.

* "pack-refs" : create a packed-refs file or repack the reftable as
  appropriate for those features. (I have less familiarity with these.)

Notice that with this new set of options we could do something rather
dramatic: replace all calls to "git gc --auto" with
"git maintenance run --auto". By default, these would be equivalent.
However, "git maintenance run --auto" makes it clearer that the
behavior is less specific than "git gc" and could be configured to do
something different.

I used an "--auto" option in the suggestion above to distinguish the
command being run as a foreground operation from the command being run
as a background operation. Part of setting up a schedule would include
disabling these "foreground" maintenance tasks and relying entirely on
the background tasks instead. The best situation would be to avoid
launching the subprocess at all.

---

What do people think of this alternative? Does this get us closer to
an appropriate level of work for Git to do?

Thanks,
-Stolee
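
For illustration only, the "run-on-repos" idea above could be sketched
in Python roughly as follows. This is not code from the series. The
function names are hypothetical, the repository list is passed in
directly because no config key has been proposed for it, and
"git maintenance run" is the subcommand proposed in this message.
Continuing past a failing repository is also just an assumption about
the desired behavior.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def maintenance_command(repo_dir: str) -> List[str]:
    """Build the argv for the proposed per-repo maintenance call."""
    # "maintenance run" is the subcommand proposed in this message.
    return ["git", "-C", repo_dir, "maintenance", "run"]


def run_on_repos(
    repo_dirs: Iterable[str],
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[str]:
    """Run maintenance in each repo and return the ones that failed."""
    failed = []
    for repo_dir in repo_dirs:
        # Assumption: keep going after a failure so that one broken
        # repository does not block maintenance for the others.
        if runner(maintenance_command(repo_dir)) != 0:
            failed.append(repo_dir)
    return failed
```

A platform scheduler (a cron job, or a scheduled task on Windows) would
then only need to invoke this one entry point. The "runner" parameter
exists so the loop can be exercised without launching Git.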