On 7/23/2020 4:53 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt >> index 9204762e21..0927643247 100644 >> --- a/Documentation/git-maintenance.txt >> +++ b/Documentation/git-maintenance.txt >> @@ -53,6 +53,18 @@ since it will not expire `.graph` files that were in the previous >> `commit-graph-chain` file. They will be deleted by a later run based on >> the expiration delay. >> >> +prefetch:: >> + The `fetch` task updates the object directory with the latest objects > > s/fetch/prefetch/ most likely. > >> + from all registered remotes. For each remote, a `git fetch` command >> + is run. The refmap is custom to avoid updating local or remote > > s/remote/remote-tracking/ definitely. Do not forget the hyphen > between the two words. > > I think it made the above unnecessarily confusing that you ended a > sentence after "is run". It gives a wrong impression that you'd be > doing a "real fetch", which you need to dispel with a follow up > description of the refmap. > > For each remote, a `git fetch` command is run with a refspec > to fetch their branches (those in their `refs/heads`) into > our `refs/prefetch/<remote>/` hierarchy and without auto > following tags (the configured refspec in the repository is > ignored). > >> + branches (those in `refs/heads` or `refs/remotes`). Instead, the >> + remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are >> + not updated. >> ++ >> +This means that foreground fetches are still required to update the >> +remote refs, but the users is notified when the branches and tags are > > s/is notified/are notified/??? > >> +updated on the remote. > > Often, when one needs to say "X. This means Y.", X is a suboptimal > way to explain what needs to be conveyed to the readers. But this > is not such a case. Unlike the "This means" that is often an > attempt to rephrasing a poor explanation given first, this gives an > implication. > > But let's not start with a negative impression (i.e. even with > prefetch, I still have to do X? What's the point???), but let them > feel why it is a good thing. Perhaps (continuing my previous > rewrite): > > This is done to avoid disrupting the remote-tracking > branches---the end users expect them to stay unmoved unless > they initiate a fetch. With prefetch task, however, the > objects necessary to complete a later real fetch would > already be obtained, so the real fetch would go faster. In > the ideal case, it will just become an update to bunch of > remote-tracking branches without any object transfer. > > or something like that? I like this clarification and have adapted it with minimal edits. >> + argv_array_pushl(&cmd, "fetch", remote, "--prune", >> + "--no-tags", "--refmap=", NULL); >> + strbuf_addf(&refmap, "+refs/heads/*:refs/prefetch/%s/*", remote); >> + argv_array_push(&cmd, refmap.buf); > > The command line looks somewhat fishy, but I think it is correct. > At first glance it looks like a mistake to pass "--refmap=" and the > refspec "+refs/heads/*:refs/prefetch/origin/*" as separate arguments, > but I think that is exactly what you want here, i.e. > > - defeat any refspec in the configuration with "--refmap=<empty>" > > - give explicit refspec "+refs/heads/*:...", with "--no-tags" to > decline auto-following, to tell what exactly are to be fetched > and stored where. > > The description in the log message about refmap needs to be > clarified, though (I've already done so in the above suggested > rewrite). I could have made your life easier by referring to b40a50264ac (fetch: document and test --refmap="", 2020-01-21) in my commit message. It includes this sentence in Documentation/fetch-options.txt: Providing an empty `<refspec>` to the `--refmap` option causes Git to ignore the configured refspecs and rely entirely on the refspecs supplied as command-line arguments. >> +static int maintenance_task_prefetch(void) >> +{ >> + int result = 0; >> + struct string_list_item *item; >> + struct string_list remotes = STRING_LIST_INIT_DUP; >> + >> + if (for_each_remote(fill_each_remote, &remotes)) { >> + error(_("failed to fill remotes")); >> + result = 1; >> + goto cleanup; >> + } >> + >> + /* >> + * Do not modify the result based on the success of the 'fetch' >> + * operation, as a loss of network could cause 'fetch' to fail >> + * quickly. We do not want that to stop the rest of our >> + * background operations. >> + */ > > The loop that runs different tasks abort at the first failure, > though. Perhaps that loop needs to be rethought as well? You're right. These maintenance tasks are intended to be independent of each other, so let's try all of them and report a failure after all have been given an opportunity to run. That makes this failure behavior unnecessary. >> + for (item = remotes.items; >> + item && item < remotes.items + remotes.nr; >> + item++) >> + fetch_remote(item->string); >> + >> +cleanup: >> + string_list_clear(&remotes, 0); >> + return result; >> +} >> + >> static int maintenance_task_gc(void) >> { >> int result; >> @@ -871,6 +929,10 @@ static void initialize_tasks(void) >> for (i = 0; i < MAX_NUM_TASKS; i++) >> tasks[i] = xcalloc(1, sizeof(struct maintenance_task)); >> >> + tasks[num_tasks]->name = "prefetch"; >> + tasks[num_tasks]->fn = maintenance_task_prefetch; >> + num_tasks++; >> + >> tasks[num_tasks]->name = "gc"; >> tasks[num_tasks]->fn = maintenance_task_gc; >> tasks[num_tasks]->enabled = 1; > > Two things. > > - As I said upfront, I do not see the point of preparing the table > with code. > > - The reason why prefetch is placed in front is probably because > you do not want to repack before you add more objects to the > object store. But doesn't that imply that there is an inherent > ordering that we, as those who are more expert on Git than the > end users, prefer? Is it a wise decision to let the users affect > the order of the tasks run by giving command line options in > different order in the previous step? I don't anticipate users specifying --task=<task> very often, as it requires deep knowledge of the tasks. If a user _does_ use the option, then we should trust their order as they might have a good reason to choose that order. Generally, my philosophy is to provide expert users with flexible choices while creating sensible defaults for non-expert users. That said, if this sort is more of a problem than the value it provides, then I can drop the sort and the duplicate --task=<task> check. Thanks, -Stolee