"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt > index 9204762e21..0927643247 100644 > --- a/Documentation/git-maintenance.txt > +++ b/Documentation/git-maintenance.txt > @@ -53,6 +53,18 @@ since it will not expire `.graph` files that were in the previous > `commit-graph-chain` file. They will be deleted by a later run based on > the expiration delay. > > +prefetch:: > + The `fetch` task updates the object directory with the latest objects s/fetch/prefetch/ most likely. > + from all registered remotes. For each remote, a `git fetch` command > + is run. The refmap is custom to avoid updating local or remote s/remote/remote-tracking/ definitely. Do not forget the hyphen between the two words. I think it made the above unnecessarily confusing that you ended a sentence after "is run". It gives a wrong impression that you'd be doing a "real fetch", which you need to dispel with a follow up description of the refmap. For each remote, a `git fetch` command is run with a refspec to fetch their branches (those in their `refs/heads`) into our `refs/prefetch/<remote>/` hierarchy and without auto following tags (the configured refspec in the repository is ignored). > + branches (those in `refs/heads` or `refs/remotes`). Instead, the > + remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are > + not updated. > ++ > +This means that foreground fetches are still required to update the > +remote refs, but the users is notified when the branches and tags are s/is notified/are notified/??? > +updated on the remote. Often, when one needs to say "X. This means Y.", X is a suboptimal way to explain what needs to be conveyed to the readers. But this is not such a case. Unlike the "This means" that is often an attempt to rephrasing a poor explanation given first, this gives an implication. But let's not start with a negative impression (i.e. even with prefetch, I still have to do X? What's the point???), but let them feel why it is a good thing. Perhaps (continuing my previous rewrite): This is done to avoid disrupting the remote-tracking branches---the end users expect them to stay unmoved unless they initiate a fetch. With prefetch task, however, the objects necessary to complete a later real fetch would already be obtained, so the real fetch would go faster. In the ideal case, it will just become an update to bunch of remote-tracking branches without any object transfer. or something like that? > + argv_array_pushl(&cmd, "fetch", remote, "--prune", > + "--no-tags", "--refmap=", NULL); > + strbuf_addf(&refmap, "+refs/heads/*:refs/prefetch/%s/*", remote); > + argv_array_push(&cmd, refmap.buf); The command line looks somewhat fishy, but I think it is correct. At first glance it looks like a mistake to pass "--refmap=" and the refspec "+refs/heads/*:refs/prefetch/origin/*" as separate arguments, but I think that is exactly what you want here, i.e. - defeat any refspec in the configuration with "--refmap=<empty>" - give explicit refspec "+refs/heads/*:...", with "--no-tags" to decline auto-following, to tell what exactly are to be fetched and stored where. The description in the log message about refmap needs to be clarified, though (I've already done so in the above suggested rewrite). > +static int maintenance_task_prefetch(void) > +{ > + int result = 0; > + struct string_list_item *item; > + struct string_list remotes = STRING_LIST_INIT_DUP; > + > + if (for_each_remote(fill_each_remote, &remotes)) { > + error(_("failed to fill remotes")); > + result = 1; > + goto cleanup; > + } > + > + /* > + * Do not modify the result based on the success of the 'fetch' > + * operation, as a loss of network could cause 'fetch' to fail > + * quickly. We do not want that to stop the rest of our > + * background operations. > + */ The loop that runs different tasks abort at the first failure, though. Perhaps that loop needs to be rethought as well? > + for (item = remotes.items; > + item && item < remotes.items + remotes.nr; > + item++) > + fetch_remote(item->string); > + > +cleanup: > + string_list_clear(&remotes, 0); > + return result; > +} > + > static int maintenance_task_gc(void) > { > int result; > @@ -871,6 +929,10 @@ static void initialize_tasks(void) > for (i = 0; i < MAX_NUM_TASKS; i++) > tasks[i] = xcalloc(1, sizeof(struct maintenance_task)); > > + tasks[num_tasks]->name = "prefetch"; > + tasks[num_tasks]->fn = maintenance_task_prefetch; > + num_tasks++; > + > tasks[num_tasks]->name = "gc"; > tasks[num_tasks]->fn = maintenance_task_gc; > tasks[num_tasks]->enabled = 1; Two things. - As I said upfront, I do not see the point of preparing the table with code. - The reason why prefetch is placed in front is probably because you do not want to repack before you add more objects to the object store. But doesn't that imply that there is an inherent ordering that we, as those who are more expert on Git than the end users, prefer? Is it a wise decision to let the users affect the order of the tasks run by giving command line options in different order in the previous step?