Re: [PATCH v2 5/6] submodule: improve submodule_has_commits

On Mon, May 1, 2017 at 6:02 PM, Brandon Williams <bmwill@xxxxxxxxxx> wrote:
> Teach 'submodule_has_commits()' to ensure that if a commit exists in a
> submodule, that it is also reachable from a ref.
>
> This is a preparatory step prior to merging the logic which checks for
> changed submodules when fetching or pushing.
>
> Change-Id: I4fed2acfa7e69a5fbbca534df165671e77a90f22
> Signed-off-by: Brandon Williams <bmwill@xxxxxxxxxx>
> ---
>  submodule.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index 3bcf44521..057695e64 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -644,10 +644,44 @@ static int submodule_has_commits(const char *path, struct oid_array *commits)
>  {
>         int has_commit = 1;
>
> +       /*
> +        * Perform a cheap, but incorrect check for the existance of 'commits'.
> +        * This is done by adding the submodule's object store to the in-core
> +        * object store, and then querying for each commit's existance.  If we
> +        * do not have the commit object anywhere, there is no chance we have
> +        * it in the object store of the correct submodule and have it
> +        * reachable from a ref, so we can fail early without spawning rev-list
> +        * which is expensive.
> +        */
>         if (add_submodule_odb(path))
>                 return 0;

Thanks for the comment!

>
>         oid_array_for_each_unique(commits, check_has_commit, &has_commit);
> +
> +       if (has_commit) {
> +               /*
> +                * Even if the submodule is checked out and the commit is
> +                * present, make sure it exists in the submodule's object store
> +                * and that it is reachable from a ref.
> +                */
> +               struct child_process cp = CHILD_PROCESS_INIT;
> +               struct strbuf out = STRBUF_INIT;
> +
> +               argv_array_pushl(&cp.args, "rev-list", "-n", "1", NULL);
> +               oid_array_for_each_unique(commits, append_oid_to_argv, &cp.args);
> +               argv_array_pushl(&cp.args, "--not", "--all", NULL);
> +
> +               prepare_submodule_repo_env(&cp.env_array);
> +               cp.git_cmd = 1;
> +               cp.no_stdin = 1;
> +               cp.dir = path;
> +
> +               if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1) || out.len)

eh, I gave too much (and self-contradicting) feedback here earlier;
ideally I'd like this to look something like:

    if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1))
        die("cannot capture git-rev-list in submodule '%s'", path);

    if (out.len)
        has_commit = 0;

instead, as that does not fail silently. (Though the current code errs
on the safe side, so maybe it is not too bad.)

I could understand if the callers do not want to have
`submodule_has_commits` die()-ing on them, so maybe

    if (capture_command(&cp, &out, GIT_MAX_HEXSZ + 1)) {
        warning("cannot capture git-rev-list in submodule '%s'", path);
        has_commit = -1;
        /* this would require auditing all callers and handling -1 though */
    }

    if (out.len)
        has_commit = 0;

As the comment alludes to, we'd then have
 0 -> has no commits
 1 -> has commits
-1 -> error

So to group (error || has_no_commits), we could write

    if (submodule_has_commits(..) <= 0)

which is awkward. So maybe we can rename the function
to misses_submodule_commits instead, as then we could
flip the return value as well and have

 0 -> has commits
 1 -> has no commits
-1 -> error

and the lazy invoker could just go with

    if (!misses_submodule_commits(..))
        proceed();
    else
        die("missing submodule commits or errors; I don't care");

whereas the careful invoker could go with

    switch (misses_submodule_commits(..)) {
    case 0:
        proceed(); break;
    case 1:
        pull_magic_trick(); break;
    case -1:
        make_errors_go_away_and_retry(); break;
    }
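
To make the flipped convention concrete, here is a minimal standalone
sketch. It is only an illustration: classify_rev_list(),
lazy_may_proceed() and careful_action() are made-up names that do not
exist in git, and the rev-list exit status and output are passed in as
plain arguments so that it compiles without any of git's headers.

```c
/*
 * Hypothetical helper, not part of git: maps the result of running
 * "rev-list -n 1 <commits> --not --all" to the flipped tri-state.
 *   cmd_failed != 0 -> -1 (could not run or capture rev-list)
 *   non-empty out   ->  1 (some commit is not reachable from a ref)
 *   empty out       ->  0 (all commits are present and reachable)
 */
static int classify_rev_list(int cmd_failed, const char *out)
{
	if (cmd_failed)
		return -1;
	if (out && *out)
		return 1;
	return 0;
}

/* Lazy invoker: anything other than 0 means "do not proceed". */
static int lazy_may_proceed(int cmd_failed, const char *out)
{
	return !classify_rev_list(cmd_failed, out);
}

/* Careful invoker: tells missing commits apart from errors. */
static const char *careful_action(int cmd_failed, const char *out)
{
	switch (classify_rev_list(cmd_failed, out)) {
	case 0:
		return "proceed";
	case 1:
		return "fetch-missing";
	default:
		return "report-error";
	}
}
```

The point of the flip is that 0 is the only "good" value, so a plain
truthiness test groups "missing" and "error" together without the
awkward <= 0 comparison.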



---
On the longer-term plan:
You wrote about costs. Maybe instead of invoking rev-list,
we could try to have this in-core as a first try-out for
"classified-repos". Looking at refs.h, there is e.g.

    int for_each_ref_submodule(const char *submodule_path,
          each_ref_fn fn, void *cb_data);

which we could use to obtain all submodule refs and then
use the revision walking machinery to find out ourselves if
we have or do not have the commits. (As we loaded the
odb of the submodule, this would *just work*, building one
kludgy hack upon the next.)
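
The idea itself (collect the ref tips, walk from them, then check
whether the wanted commits were visited) can be sketched on a toy
graph. This is not git's revision walking machinery and uses none of
its APIs; struct toy_commit, mark_reachable() and all_reachable() are
hypothetical names, and commits are plain array indices rather than
object ids.

```c
#include <string.h>

#define MAX_COMMITS 16

/* Toy commit: up to two parents given as array indices, -1 for none. */
struct toy_commit {
	int parent[2];
};

/*
 * Mark every commit reachable from the given ref tips. Each commit is
 * pushed at most once, so the stack never exceeds nr entries.
 * The caller guarantees nr <= MAX_COMMITS.
 */
static void mark_reachable(const struct toy_commit *commits, int nr,
			   const int *tips, int nr_tips,
			   unsigned char *seen)
{
	int stack[MAX_COMMITS];
	int top = 0, i;

	memset(seen, 0, nr);
	for (i = 0; i < nr_tips; i++) {
		if (tips[i] >= 0 && tips[i] < nr && !seen[tips[i]]) {
			seen[tips[i]] = 1;
			stack[top++] = tips[i];
		}
	}
	while (top) {
		const struct toy_commit *c = &commits[stack[--top]];

		for (i = 0; i < 2; i++) {
			int p = c->parent[i];

			if (p >= 0 && p < nr && !seen[p]) {
				seen[p] = 1;
				stack[top++] = p;
			}
		}
	}
}

/* Returns 1 if every wanted commit is reachable from some tip, else 0. */
static int all_reachable(const struct toy_commit *commits, int nr,
			 const int *tips, int nr_tips,
			 const int *wanted, int nr_wanted)
{
	unsigned char seen[MAX_COMMITS];
	int i;

	if (nr < 0 || nr > MAX_COMMITS)
		return 0;
	mark_reachable(commits, nr, tips, nr_tips, seen);
	for (i = 0; i < nr_wanted; i++)
		if (wanted[i] < 0 || wanted[i] >= nr || !seen[wanted[i]])
			return 0;
	return 1;
}
```

In the real thing the tips would presumably come from the
for_each_ref_submodule() callback and the walk from the existing
revision machinery; the sketch only shows the shape of the check.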

Thanks,
Stefan