On Mon, May 1, 2017 at 6:02 PM, Brandon Williams <bmwill@xxxxxxxxxx> wrote:
> Teach 'submodule_has_commits()' to ensure that if a commit exists in a
> submodule, that it is also reachable from a ref.
>
> This is a preparatory step prior to merging the logic which checks for
> changed submodules when fetching or pushing.
>
> Change-Id: I4fed2acfa7e69a5fbbca534df165671e77a90f22
> Signed-off-by: Brandon Williams <bmwill@xxxxxxxxxx>
> ---
>  submodule.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index 3bcf44521..057695e64 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -644,10 +644,44 @@ static int submodule_has_commits(const char *path, struct oid_array *commits)
>  {
>  	int has_commit = 1;
>
> +	/*
> +	 * Perform a cheap, but incorrect check for the existance of 'commits'.
> +	 * This is done by adding the submodule's object store to the in-core
> +	 * object store, and then querying for each commit's existance.  If we
> +	 * do not have the commit object anywhere, there is no chance we have
> +	 * it in the object store of the correct submodule and have it
> +	 * reachable from a ref, so we can fail early without spawning rev-list
> +	 * which is expensive.
> +	 */
>  	if (add_submodule_odb(path))
>  		return 0;

Thanks for the comment!

>
>  	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
> +
> +	if (has_commit) {
> +		/*
> +		 * Even if the submodule is checked out and the commit is
> +		 * present, make sure it exists in the submodule's object store
> +		 * and that it is reachable from a ref.
> +		 */
> +		struct child_process cp = CHILD_PROCESS_INIT;
> +		struct strbuf out = STRBUF_INIT;
> +
> +		argv_array_pushl(&cp.args, "rev-list", "-n", "1", NULL);
> +		oid_array_for_each_unique(commits, append_oid_to_argv, &cp.args);
> +		argv_array_pushl(&cp.args, "--not", "--all", NULL);
> +
> +		prepare_submodule_repo_env(&cp.env_array);
> +		cp.git_cmd = 1;
> +		cp.no_stdin = 1;
> +		cp.dir = path;
> +
> +		if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1) || out.len)

Eh, I gave too much and self-contradicting feedback here earlier.
Ideally I'd like this to look something like:

    if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1))
        die("cannot capture git-rev-list in submodule '%s'", path);
    if (out.len)
        has_commit = 0;

instead, as that does not have a silent error. (Though the current
code errs on the safe side, so maybe it is not too bad.)

I could understand if the callers do not want to have
`submodule_has_commits` die()-ing on them, so maybe

    if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1)) {
        warning("cannot capture git-rev-list in submodule '%s'", path);
        has_commit = -1;
        /* this would require auditing all callers and handling -1 though */
    } else if (out.len)
        has_commit = 0;

As the comment alludes, we'd then have

     0 -> has no commits
     1 -> has commits
    -1 -> error

So to group (error || has_no_commits), we could write

    if (submodule_has_commits(..) <= 0)

which is awkward. So maybe we can rename the function to
misses_submodule_commits instead, as then we could flip
the return value as well and have

     0 -> has commits
     1 -> has no commits
    -1 -> error

and the lazy invoker could just go with

    if (!misses_submodule_commits(..))
        proceed();
    else
        die("missing submodule commits or errors; I don't care");

whereas the careful invoker could go with

    switch (misses_submodule_commits(..)) {
    case 0:
        proceed();
        break;
    case 1:
        pull_magic_trick();
        break;
    case -1:
        make_errors_go_away_and_retry();
        break;
    }

---
On the longer term plan:
As you wrote about costs:
Maybe instead of invoking rev-list, we could try to have this in-core
as a first try-out for "classified-repos". Looking at refs.h, there is e.g.

    int for_each_ref_submodule(const char *submodule_path,
                               each_ref_fn fn, void *cb_data);

which we could use to obtain all submodule refs and then use the
revision walking machinery to find out ourselves if we have or
do not have the commits. (As we loaded the odb of the submodule,
this would *just work*, building one kludgy hack upon the next.)

Thanks,
Stefan
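
P.S. To make the flipped return convention from the review above concrete,
here is a minimal standalone sketch of the two caller styles. It is not git
code: `probe_result`, `caller_action`, `lazy_can_proceed()` and
`careful_action()` are hypothetical names made up for illustration, and the
enum argument merely stands in for the outcome of the rev-list call, which
the sketch does not run.

```c
#include <assert.h>

/* Hypothetical stand-in for what the rev-list probe found; not git's API. */
enum probe_result { PROBE_REACHABLE, PROBE_UNREACHABLE, PROBE_FAILED };

/* Hypothetical reactions a careful caller might choose between. */
enum caller_action { ACT_PROCEED, ACT_FETCH, ACT_RETRY };

/*
 * Flipped convention from the review:
 *    0 -> has commits
 *    1 -> has no commits
 *   -1 -> error
 */
int misses_submodule_commits(enum probe_result probe)
{
	switch (probe) {
	case PROBE_REACHABLE:
		return 0;
	case PROBE_UNREACHABLE:
		return 1;
	default:
		return -1;
	}
}

/* Lazy caller: "missing" and "error" both count as "do not proceed". */
int lazy_can_proceed(enum probe_result probe)
{
	return !misses_submodule_commits(probe);
}

/* Careful caller: reacts differently to each of the three outcomes. */
enum caller_action careful_action(enum probe_result probe)
{
	int rc = misses_submodule_commits(probe);

	/* The convention only allows -1, 0 and 1. */
	assert(rc >= -1 && rc <= 1);

	switch (rc) {
	case 0:
		return ACT_PROCEED;
	case 1:
		return ACT_FETCH;
	default:
		return ACT_RETRY;
	}
}
```

The point of the flip is that a plain `!` test is enough for the lazy caller,
because both non-zero values mean "do not proceed", while the careful caller
can still tell a missing commit apart from a failed probe.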