On Wed, Oct 17, 2018 at 5:40 PM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: > > > Currently when git-fetch is asked to recurse into submodules, it dispatches > > a plain "git-fetch -C <submodule-dir>" (with some submodule related options > > such as prefix and recusing strategy, but) without any information of the > > remote or the tip that should be fetched. > > > > This works surprisingly well in some workflows (such as using submodules > > as a third party library), while not so well in other scenarios, such > > as in a Gerrit topic-based workflow, that can tie together changes > > (potentially across repositories) on the server side. One of the parts > > of such a Gerrit workflow is to download a change when wanting to examine > > it, and you'd want to have its submodule changes that are in the same > > topic downloaded as well. However these submodule changes reside in their > > own repository in their own ref (refs/changes/<int>). > > It seems that the pertinent situation is when, in the superproject, you > fetch a commit (which may be the target of a ref, or an ancestor of the > target of a ref) that points to a submodule commit that was not fetched > by the default refspec-less fetch that you describe in the first > paragraph. I would just describe it as follows: > > But this default fetch is not sufficient, as a newly fetched commit in > the superproject could point to a commit in the submodule that is not > in the default refspec. This is common in workflows like Gerrit's. > When fetching a Gerrit change under review (from refs/changes/??), the > commits in that change likely point to submodule commits that have not > been merged to a branch yet. Makes sense. > Another thing you need to clarify is what happens if the fetch-by-commit > fails. Right now, it seems that it will make the whole thing fail, which > might be a surprising change in behavior. But a positive surprise, I would assume? > The test stores the result in a normal branch, not a remote tracking > branch. Is storing in a normal branch required? In the test we fetch from another repository, such that in the repository-under-test this will show up in a remote tracking branch? > Also, do you know why this is required? A naive reading of the patch > leads me to believe that this should work even if merely fetching to > FETCH_HEAD. See the next patch, check_for_new_submodule_commits() is missing for FETCH_HEAD. > > > +struct get_next_submodule_task { > > + struct repository *repo; > > + const struct submodule *sub; > > + unsigned free_sub : 1; /* Do we need to free the submodule? */ > > + struct oid_array *commits; > > +}; > > Firstly, I don't think "commits" needs to be a pointer. > > Document at least "commits". As far as I can tell, if NULL (or empty if > you take my suggestion), this means that this task is the first pass for > this particular submodule and the fetch needs to be done using the > default refspec. Otherwise, this task is the second pass for this > particular submodule and the fetch needs to be done passing the > contained OIDs. Makes sense. I think I'll make it a non-pointer, but will introduce another flag or counter for the phase. > > > +static const struct submodule *get_default_submodule(const char *path) > > +{ > > + struct submodule *ret = NULL; > > + const char *name = default_name_or_path(path); > > + > > + if (!name) > > + return NULL; > > + > > + ret = xmalloc(sizeof(*ret)); > > + memset(ret, 0, sizeof(*ret)); > > + ret->path = name; > > + ret->name = name; > > + > > + return (const struct submodule *) ret; > > +} > > What is a "default" submodule and why would you need one? s/default/artificial/. Such a submodule is a submodule that has no config in the .gitmodules file and its name == path. We need to keep those around for historic reasons AFAICT, c.f. c68f837576 (implement fetching of moved submodules, 2017-10-16) > > + task = get_next_submodule_task_create(spf->r, ce->name); > > + > > + if (!task->sub) { > > + free(task); > > + continue; > > } > > Will task->sub ever be NULL? Yes, if we fall back to these "default" submodule and just try if it can be handled as a submodule, but it cannot be handled as such, get_next_submodule_task_create has task->sub = submodule_from_path(r, &null_oid, path); if (!task->sub) { task->sub = get_default_submodule(path); and get_default_submodule can return NULL. > > > + if (spf->retry_nr) { > > + struct get_next_submodule_task *task = spf->retry[spf->retry_nr - 1]; > > + struct strbuf submodule_prefix = STRBUF_INIT; > > + spf->retry_nr--; > > + > > + strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, task->sub->path); > > + > > + child_process_init(cp); > > + prepare_submodule_repo_env_in_gitdir(&cp->env_array); > > + cp->git_cmd = 1; > > + cp->dir = task->repo->gitdir; > > + > > + argv_array_init(&cp->args); > > + argv_array_pushv(&cp->args, spf->args.argv); > > + argv_array_push(&cp->args, "on-demand"); > > This means that we need to trust that the last entry in spf->args can > take an "on-demand" parameter. Could we supply that argument here > explicitly instead? > > > + argv_array_push(&cp->args, "--submodule-prefix"); > > + argv_array_push(&cp->args, submodule_prefix.buf); > > + > > + /* NEEDSWORK: have get_default_remote from s--h */ > > What is s--h? builtin/submodule--helper, will clarify > > > +static int commit_exists_in_sub(const struct object_id *oid, void *data) > > +{ > > + struct repository *subrepo = data; > > + > > + enum object_type type = oid_object_info(subrepo, oid, NULL); > > + > > + return type != OBJ_COMMIT; > > +} > > Shouldn't the function name be commit_missing_in_sub? yes. > > > static int fetch_finish(int retvalue, struct strbuf *err, > > void *cb, void *task_cb) > > { > > struct submodule_parallel_fetch *spf = cb; > > + struct get_next_submodule_task *task = task_cb; > > + const struct submodule *sub; > > + > > + struct string_list_item *it; > > + struct oid_array *commits; > > > > if (retvalue) > > spf->result = 1; > > > > + if (!task) > > + return 0; > > When will task be NULL? > > > + > > + sub = task->sub; > > + if (!sub) > > + goto out; > > Same as above - when will sub be NULL? > > > + it = string_list_lookup(&spf->changed_submodule_names, sub->name); > > + if (!it) > > + goto out; > > And "it" as well. I'll add comments. > > > + commits = it->util; > > + oid_array_filter(commits, > > + commit_exists_in_sub, > > + task->repo); > > + > > + /* Are there commits that do not exist? */ > > + if (commits->nr) { > > + /* We already tried fetching them, do not try again. */ > > + if (task->commits) > > + return 0; > > Clearer and more efficient if the check for task->commits was first > before all the filtering. > > > +test_expect_success "fetch new commits on-demand when they are not reachable" ' > > "not reachable" confused me - they are indeed reachable, just not from > the default refspec. Makes sense > > > + git checkout --detach && > > + C=$(git -C submodule commit-tree -m "new change outside refs/heads" HEAD^{tree}) && > > + git -C submodule update-ref refs/changes/1 $C && > > + git update-index --cacheinfo 160000 $C submodule && > > + git commit -m "updated submodule outside of refs/heads" && > > + D=$(git rev-parse HEAD) && > > + git update-ref refs/changes/2 $D && > > + ( > > + cd downstream && > > + git fetch --recurse-submodules --recurse-submodules-default on-demand origin refs/changes/2:refs/heads/my_branch && > > + git -C submodule cat-file -t $C && > > + git checkout --recurse-submodules FETCH_HEAD > > + ) > > +' > > For maximum test coverage, I think this test should involve 2 > submodules: > > B C E F > \ / \ / > A D > > and the upstream superproject having: > > G -> H -> I > > The downstream superproject already has G, and is fetching I. In H, the > submodule gitlinks point to B and E respectively, and in I, the > submodule gitlinks point to C and F respectively. This ensures that both > multiple submodules work, and that submodule commits are also fetched > for interior superproject commits. ok.