Hi Junio,

On Thu, 18 Jul 2019, Junio C Hamano wrote:

> Palmer Dabbelt <palmer@xxxxxxxxxx> writes:
>
> > * As I was writing the documentation I found "fetch --jobs". It seems
> >   like I should use that instead of adding a new argument, but I wasn't
> >   sure.
>
> Why not? What makes you feel it is a bad idea to follow that
> pattern?
>
> Ah, --jobs that is taken already is right now too tied to fetches
> that happen in submodules, which arguably was a design mistake.
>
>     -j::
>     --jobs=<n>::
>         Number of parallel children to be used for fetching
>         submodules. Each will fetch from different submodules,
>         such that fetching many submodules will be faster. By
>         default submodules will be fetched one at a time.
>
> The simplest endgame would be to replace "submodule" with
> "repository" in the above description, perhaps like
>
>     Number of parallel jobs to be used for fetching from
>     multiple repositories (both fetching with "--multiple" from
>     multiple repositories, and also fetching updated contents
>     for submodules). By default, fetching from multiple
>     repositories and submodules is done one at a time.
>
> and nobody would have complained if the system were like so from the
> beginning. Existing users, however, may want extra flexibility, and
> would complain loudly if we did the above, in which case, we may
> have to
>
>  - introduce --fetch-jobs=<n> for what you are adding;
>
>  - introduce --submodule-fetch-jobs=<n> as a synonym for existing
>    --jobs=<n> and deprecate the current use of --jobs=<n>;
>
>  - eventually repurpose --jobs=<n> as a short-hand to give both
>    --fetch-jobs and --submodule-fetch-jobs at the same time.
Given that the relevant code looks like this:

	if (remote) {
		if (filter_options.choice || repository_format_partial_clone)
			fetch_one_setup_partial(remote);
		result = fetch_one(remote, argc, argv, prune_tags_ok);
	} else {
		if (filter_options.choice)
			die(_("--filter can only be used with the remote "
			      "configured in extensions.partialclone"));
		/* TODO should this also die if we have a previous partial-clone? */
		result = fetch_multiple(&list);
	}

	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
		struct argv_array options = ARGV_ARRAY_INIT;

		add_options_to_argv(&options);
		result = fetch_populated_submodules(the_repository,
						    &options,
						    submodule_prefix,
						    recurse_submodules,
						    recurse_submodules_default,
						    verbosity < 0,
						    max_children);
		argv_array_clear(&options);
	}

i.e. the `fetch_multiple()` call is _strictly_ before the call to
`fetch_populated_submodules()`, I would contend that this level of
separation does not serve anybody but a fan of the complicators' gloves.

You would also find yourself in quite a pickle if you wanted to explain
to a user why those `--fetch-jobs` and `--submodule-fetch-jobs` are
separate options _and_ why they repeat the command name in the option
name. And if it is hard to explain, there is usually a better design
choice in the first place.

In other words, I would be much more in favor of `--jobs` _also_
extending to the `fetch_multiple()` call, not just the
`fetch_populated_submodules()` call.

> > @@ -1456,12 +1459,15 @@ static void add_options_to_argv(struct argv_array *argv)
> >
> >  }
> >
> > -static int fetch_multiple(struct string_list *list)
> > +static int fetch_multiple(struct string_list *list, int i)
> >  {
> > -	int i, result = 0;
>
> 'i' is perfectly a good name for a local variable that is used for
> loop control purposes, but makes a horrible name for a parameter.
>
> Existing 'list' is not any better either---we know it is a list by
> its type already, the name should say what the list is about, what
> it represents.
> But having a horribly named parameter already is not
> a good reason to make the code even worse.
>
> And as you said, recursion makes the code structure harder to follow
> here. Keeping an array of --jobs=<n> cmd structures, looping to
> fill them by starting, doing wait() to reap any of the started ones
> that first exits to refill the slot just opened, etc. would be easier
> to see if done in a loop, I think.

I have to admit that I'd _much_ rather see the strategy of
`fetch_populated_submodules()` emulated, where it uses the
`run_processes_parallel_tr2()` function to perform the actual fetches.

That buys us a lot of advantages:

- the recursion problems mentioned in the original mail will _not_ be
  an issue,

- the progress output won't be garbled, and

- we can automatically restrict the maximal number of parallel fetches.

It should be super easy to get started with this: simply move the
`list` and `i` variables into a `struct` that is then passed to
`run_processes_parallel_tr2()`, then implement the three callbacks.
Those callbacks can take inspiration from `submodule.c`'s
`get_next_submodule()`, `fetch_start_failure()`, and `fetch_finish()`
functions, but rest assured that the `fetch_multiple()` ones will look
_a lot_ simpler.

Ciao,
Dscho

P.S.: It would probably also make sense to extend this contribution by
a second patch that teaches `git remote update` a `--jobs` option.
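P.P.S.: To sketch what I mean (completely untested, written straight
into the mail client, and all of the names below — `parallel_fetch_state`,
`fetch_next_remote`, `fetch_failed_to_start`, `fetch_finished` — are made
up for illustration), the three callbacks for `run_processes_parallel_tr2()`
might look roughly like this:

	/* Everything `run_processes_parallel_tr2()` needs to hand out tasks */
	struct parallel_fetch_state {
		const char **argv;		/* common `git fetch` options */
		struct string_list *remotes;	/* the `list` from fetch_multiple() */
		int next, result;
	};

	static int fetch_next_remote(struct child_process *cp, struct strbuf *out,
				     void *cb, void **task_cb)
	{
		struct parallel_fetch_state *state = cb;
		char *remote;

		if (state->next >= state->remotes->nr)
			return 0; /* no more remotes to fetch from */

		remote = state->remotes->items[state->next++].string;
		*task_cb = remote;

		argv_array_pushv(&cp->args, state->argv);
		argv_array_push(&cp->args, remote);
		cp->git_cmd = 1;

		return 1;
	}

	static int fetch_failed_to_start(struct strbuf *out, void *cb,
					 void *task_cb)
	{
		struct parallel_fetch_state *state = cb;

		state->result = -1;
		return 0; /* let the other fetches continue */
	}

	static int fetch_finished(int result, struct strbuf *out,
				  void *cb, void *task_cb)
	{
		struct parallel_fetch_state *state = cb;
		const char *remote = task_cb;

		if (result) {
			strbuf_addf(out, _("could not fetch '%s' (exit code: %d)\n"),
				    remote, result);
			state->result = -1;
		}
		return 0;
	}

with `fetch_multiple()` then essentially boiling down to

	struct parallel_fetch_state state = { argv.argv, list, 0, 0 };

	run_processes_parallel_tr2(max_children,
				   &fetch_next_remote,
				   &fetch_failed_to_start,
				   &fetch_finished,
				   &state, "fetch", "parallel/fetch");

i.e. no recursion, no manual wait() bookkeeping, and the output of the
concurrent fetches is buffered per child instead of interleaving.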