On 2020-07-23 17:56:30+0000, Derrick Stolee via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote:
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> When working with very large repositories, an incremental 'git fetch'
> command can download a large amount of data. If there are many other
> users pushing to a common repo, then this data can rival the initial
> pack-file size of a 'git clone' of a medium-size repo.
>
> Users may want to keep the data on their local repos as close as
> possible to the data on the remote repos by fetching periodically in
> the background. This can break up a large daily fetch into several
> smaller hourly fetches.
>
> The task is called "prefetch" because it is work done in advance
> of a foreground fetch to make that 'git fetch' command much faster.
>
> However, if we simply ran 'git fetch <remote>' in the background,
> then the user running a foreground 'git fetch <remote>' would lose
> some important feedback when a new branch appears or an existing
> branch updates. This is especially true if a remote branch is
> force-updated and this isn't noticed by the user because it occurred
> in the background. Further, the functionality of 'git push
> --force-with-lease' becomes suspect.
>
> When running 'git fetch <remote> <options>' in the background, use
> the following options for careful updating:

Does this job interfere with FETCH_HEAD?
From my quick test (by applying 01-08 on top of rc1, and messing with
t7900), it looks like yes.

I (and some other people, probably) rely on FETCH_HEAD for our scripts.
Hence, it would be nice if the prefetch job did not touch FETCH_HEAD.

Thanks,
-Danh

>
> 1. --no-tags prevents getting a new tag when a user wants to see
>    the new tags appear in their foreground fetches.
>
> 2. --refmap= removes the configured refspec which usually updates
>    refs/remotes/<remote>/* with the refs advertised by the remote.
>
> 3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
>    we can ensure that we actually load the new values somewhere in
>    our refspace while not updating refs/heads or refs/remotes. By
>    storing these refs here, the commit-graph job will update the
>    commit-graph with the commits from these hidden refs.
>
> 4. --prune will delete the refs/prefetch/<remote> refs that no
>    longer appear on the remote.
>
> We've been using this step as a critical background job in Scalar
> [1] (and VFS for Git). This solved a pain point that was showing up
> in user reports: fetching was a pain! Users do not like waiting to
> download the data that was created while they were away from their
> machines. After implementing background fetch, the foreground fetch
> commands sped up significantly because they mostly just update refs
> and download a small amount of new data. The effect is especially
> dramatic when paired with --no-show-forced-updates (through
> fetch.showForcedUpdates=false).
>
> [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  Documentation/git-maintenance.txt | 12 ++++++
>  builtin/gc.c                      | 64 ++++++++++++++++++++++++++++++-
>  t/t7900-maintenance.sh            | 24 ++++++++++++
>  3 files changed, 99 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 9204762e21..0927643247 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -53,6 +53,18 @@ since it will not expire `.graph` files that were in the previous
>  `commit-graph-chain` file. They will be deleted by a later run based on
>  the expiration delay.
>
> +prefetch::
> +	The `fetch` task updates the object directory with the latest objects
> +	from all registered remotes. For each remote, a `git fetch` command
> +	is run.
> +	The refmap is custom to avoid updating local or remote
> +	branches (those in `refs/heads` or `refs/remotes`). Instead, the
> +	remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
> +	not updated.
> ++
> +This means that foreground fetches are still required to update the
> +remote refs, but the user is notified when the branches and tags are
> +updated on the remote.
> +
>  gc::
>  	Cleanup unnecessary files and optimize the local repository. "GC"
>  	stands for "garbage collection," but this task performs many
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 5d99b4b805..969c127877 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -28,6 +28,7 @@
>  #include "blob.h"
>  #include "tree.h"
>  #include "promisor-remote.h"
> +#include "remote.h"
>
>  #define FAILED_RUN "failed to run %s"
>
> @@ -700,7 +701,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  	return 0;
>  }
>
> -#define MAX_NUM_TASKS 2
> +#define MAX_NUM_TASKS 3
>
>  static const char * const builtin_maintenance_usage[] = {
>  	N_("git maintenance run [<options>]"),
> @@ -781,6 +782,63 @@ static int maintenance_task_commit_graph(void)
>  	return 1;
>  }
>
> +static int fetch_remote(const char *remote)
> +{
> +	int result;
> +	struct argv_array cmd = ARGV_ARRAY_INIT;
> +	struct strbuf refmap = STRBUF_INIT;
> +
> +	argv_array_pushl(&cmd, "fetch", remote, "--prune",
> +			 "--no-tags", "--refmap=", NULL);
> +
> +	strbuf_addf(&refmap, "+refs/heads/*:refs/prefetch/%s/*", remote);
> +	argv_array_push(&cmd, refmap.buf);
> +
> +	if (opts.quiet)
> +		argv_array_push(&cmd, "--quiet");
> +
> +	result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
> +
> +	strbuf_release(&refmap);
> +	return result;
> +}
> +
> +static int fill_each_remote(struct remote *remote, void *cbdata)
> +{
> +	struct string_list *remotes = (struct string_list *)cbdata;
> +
> +	string_list_append(remotes, remote->name);
> +	return 0;
> +}
> +
> +static int maintenance_task_prefetch(void)
> +{
> +	int result = 0;
> +	struct string_list_item *item;
> +	struct string_list remotes = STRING_LIST_INIT_DUP;
> +
> +	if (for_each_remote(fill_each_remote, &remotes)) {
> +		error(_("failed to fill remotes"));
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * Do not modify the result based on the success of the 'fetch'
> +	 * operation, as a loss of network could cause 'fetch' to fail
> +	 * quickly. We do not want that to stop the rest of our
> +	 * background operations.
> +	 */
> +	for (item = remotes.items;
> +	     item && item < remotes.items + remotes.nr;
> +	     item++)
> +		fetch_remote(item->string);
> +
> +cleanup:
> +	string_list_clear(&remotes, 0);
> +	return result;
> +}
> +
>  static int maintenance_task_gc(void)
>  {
>  	int result;
> @@ -871,6 +929,10 @@ static void initialize_tasks(void)
>  	for (i = 0; i < MAX_NUM_TASKS; i++)
>  		tasks[i] = xcalloc(1, sizeof(struct maintenance_task));
>
> +	tasks[num_tasks]->name = "prefetch";
> +	tasks[num_tasks]->fn = maintenance_task_prefetch;
> +	num_tasks++;
> +
>  	tasks[num_tasks]->name = "gc";
>  	tasks[num_tasks]->fn = maintenance_task_gc;
>  	tasks[num_tasks]->enabled = 1;
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index c09a9eb90b..8b04a04c79 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -44,4 +44,28 @@ test_expect_success 'run --task duplicate' '
>  	test_i18ngrep "cannot be selected multiple times" err
>  '
>
> +test_expect_success 'run --task=prefetch with no remotes' '
> +	git maintenance run --task=prefetch 2>err &&
> +	test_must_be_empty err
> +'
> +
> +test_expect_success 'prefetch multiple remotes' '
> +	git clone . clone1 &&
> +	git clone . clone2 &&
> +	git remote add remote1 "file://$(pwd)/clone1" &&
> +	git remote add remote2 "file://$(pwd)/clone2" &&
> +	git -C clone1 switch -c one &&
> +	git -C clone2 switch -c two &&
> +	test_commit -C clone1 one &&
> +	test_commit -C clone2 two &&
> +	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch &&
> +	grep ",\"fetch\",\"remote1\"" run-prefetch.txt &&
> +	grep ",\"fetch\",\"remote2\"" run-prefetch.txt &&
> +	test_path_is_missing .git/refs/remotes &&
> +	test_cmp clone1/.git/refs/heads/one .git/refs/prefetch/remote1/one &&
> +	test_cmp clone2/.git/refs/heads/two .git/refs/prefetch/remote2/two &&
> +	git log prefetch/remote1/one &&
> +	git log prefetch/remote2/two
> +'
> +
>  test_done
> -- 
> gitgitgadget

-- 
Danh
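For reference, here is a minimal standalone sketch (not the patch's code and not git's internal API) of the argument list that fetch_remote() in the patch builds for one remote, with one addition aimed at the FETCH_HEAD concern: --no-write-fetch-head, which `git fetch` accepts in later Git releases. build_prefetch_args() and PREFETCH_MAX_ARGS are hypothetical names; the patch itself uses argv_array and run_command_v_opt() rather than a fixed-size array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Enough slots for every argument below plus the terminating NULL. */
#define PREFETCH_MAX_ARGS 9

/*
 * Hypothetical helper (not part of git): fill argv with the arguments
 * the prefetch task would pass to "git fetch" for one remote. It follows
 * fetch_remote() from the patch and additionally passes
 * --no-write-fetch-head so that FETCH_HEAD is left untouched.
 *
 * On success, returns the number of arguments (excluding the trailing
 * NULL) and stores the heap-allocated refspec in *refspec_out; the
 * caller must free() it. Returns -1 on error.
 */
static int build_prefetch_args(const char *remote, int quiet,
			       const char **argv, size_t max,
			       char **refspec_out)
{
	const char *fmt = "+refs/heads/*:refs/prefetch/%s/*";
	char *refspec;
	size_t n = 0;
	int len;

	assert(remote && argv && refspec_out);
	if (max < PREFETCH_MAX_ARGS)
		return -1;

	/* Measure, then format, the refspec that maps heads into refs/prefetch. */
	len = snprintf(NULL, 0, fmt, remote);
	if (len < 0)
		return -1;
	refspec = malloc((size_t)len + 1);
	if (!refspec)
		return -1;
	snprintf(refspec, (size_t)len + 1, fmt, remote);

	argv[n++] = "fetch";
	argv[n++] = remote;
	argv[n++] = "--prune";
	argv[n++] = "--no-tags";		/* leave tags to foreground fetches */
	argv[n++] = "--no-write-fetch-head";	/* keep FETCH_HEAD intact */
	argv[n++] = "--refmap=";		/* drop the configured refspec */
	argv[n++] = refspec;
	if (quiet)
		argv[n++] = "--quiet";
	argv[n] = NULL;

	*refspec_out = refspec;
	return (int)n;
}
```

For example, build_prefetch_args("origin", 0, args, PREFETCH_MAX_ARGS, &refspec) returns 7 and fills args with: fetch origin --prune --no-tags --no-write-fetch-head --refmap= +refs/heads/*:refs/prefetch/origin/*. Prefixing that list with "git" gives the command the background job would run, leaving refs/remotes, tags, and FETCH_HEAD for the foreground fetch.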