On Oct 5, 2010, at 1:43 PM, Jens Lehmann wrote:

> Kevin reported the fetch time went up from 1.5s to 20s for him
> because of the recursion. Kevin, could you please test the branch
> "parallel-submodule-fetch" from my github repository:
>
> http://github.com/jlehmann/git-submod-enhancements.git
>
> It has these three patches based on next plus a preliminary
> commit fetching submodules in parallel (but note that a limit
> of 128 submodules is hardcoded and the output might be mixed
> between the fetch threads, I'll fix that when you confirm the
> performance benefit I expect).

The first `git fetch` still took 20 seconds, but that's because there was data to fetch from one of the deeply-nested submodules (data which, incidentally, I have zero reason to want to fetch). Subsequent fetches took 6.3 seconds. This is contrasted with 1.9s to run `git -c fetch.recursive=false fetch`.

On Oct 5, 2010, at 2:06 PM, Junio C Hamano wrote:

>> a) "git fetch --all"
>>
>> The user wanted to fetch new commits from all remotes. I think
>> git should also fetch all submodules then, no matter if new
>> commits from them are fetched in the superproject, as the user
>> explicitly said he wants everything. Objections?
>
> Why? I do not see a "--submodules" option on that command line. The only
> thing I asked is to grab all branches for the project I ran "git fetch"
> in.

I agree with Junio.

>> b) "git fetch [<repository>]"
>>
>> The user wants to fetch from the default [or a single repo]. I
>> think all submodules should be fetched too, Kevin thinks this
>> should happen only when it is necessary (at least for his 'H'
>> repository). While I think fetching all submodules too is
>> consistent with the fact that you get all branches in the
>> superproject too, whether you need them or not, we can't seem
>> to agree on this one (also see my proposal below).
>
> The case with <repository> is a lot more questionable than the case of
> fetching implicitly from whereever you usually fetch from. Imagine that
> you fork git.git, and create a separate project that has some nifty
> additions to support submodules better. The additional part is naturally
> done as a submodule. This jens.git repository becomes very popular and
> people clone from it. Your users usually interact with your repository by
> saying "git fetch" or "git pull" without any explicit <repository>. They,
> however, would want to fetch/pull from me from time to time to get updates
> that you haven't incorporated in jens.git repository. "git fetch junio" is
> run. Why should such a "fetch" go to your repository and slurp the
> objects for the submodules?
>
> Perhaps you would want some knobs like these?
>
> [remote "origin"]
>     fetch-submodules = all
>     fetch-submodules = changed
>
> [remote "junio"]
>     fetch-submodules = none
>
> I dunno. I've never been a fan of automatically recursing into submodules
> (iow, treating the nested structure as if there is no nesting), so...

I agree with this as well. After thinking on it a bit, I think the best solution is to add a switch --submodules to fetch which will also fetch all submodules, but otherwise fetch will fetch no submodules. This will avoid the problem of detecting changed submodules, while still allowing users to explicitly request all submodules in case they're about to get on a plane flight. And of course we can use a config switch to turn --submodules on or off by default.

We should also give some thought to automatically updating submodules when `git pull` is performed. I could imagine `git pull --submodules` effectively doing `git pull && git submodule update --init --recursive`, though this implies submodule updating behavior as part of merge, and it seems harder to justify that.

-Kevin Ballard