Re: [GSoC] [PATCH v2] submodule--helper: run update procedures from C

Shourya Shukla <periperidip@xxxxxxxxx> · Tue, 3 Aug 2021 00:20:39 +0530

Le lun. 2 août 2021 à 18:36, Atharva Raykar <raykar.ath@xxxxxxxxx> a écrit :
>
> Add a new submodule--helper subcommand `run-update-procedure` that runs
> the update procedure if the SHA1 of the submodule does not match what
> the superproject expects.
>
> This is an intermediate change that works towards total conversion of
> `submodule update` from shell to C.
>
> Specific error codes are returned so that the shell script calling the
> subcommand can take a decision on the control flow, and preserve the
> error messages across subsequent recursive calls of `cmd_update`.
>
> This patch could have been approached differently, by first changing the
> `is_tip_reachable` and `fetch_in_submodule` shell functions to be
> `submodule--helper` subcommands, and then following up with a patch that
> introduces the `run-update-procedure` subcommand. We have not done it
> like that because those functions are trivial enough to convert directly
> along with these other changes. This lets us avoid the boilerplate and
> the cleanup patches that will need to be introduced in following that
> approach.

I feel that this part is more suitable for a cover letter rather than the commit
message itself. It is a useful piece of info though.

> This change is more focused on doing a faithful conversion, so for now we
> are not too concerned with trying to reduce subprocess spawns.
>
> Signed-off-by: Atharva Raykar <raykar.ath@xxxxxxxxx>
> Mentored-by: Christian Couder <christian.couder@xxxxxxxxx>
> Mentored-by: Shourya Shukla <periperidip@xxxxxxxxx>
> ---
>
> Notable changes since v1:
>
> * Modified the code structure in
>   submodule--helper.c:run_update_command(), while fixing problems with
>   the translation marks.
>
> * Renamed '--sha1' and '--subsha1' options to '--oid' and '--suboid' to
>   since the argument is parsed into an object_id struct, not plain sha1
>   data.
>
> * Used option callbacks to parse the SHA1 arguments directly.
>
> * Moved the conditional out of 'do_run_update_procedure()'.
>
> Feedback required:
>
> Ævar felt that it would be clearer to populate the 'fatal' messages
> after the run_command() operation in 'run_update_command()', to make it
> more readable [1]. I have attempted something like that here, and it has led
> to a lot more duplicated 'switch' statements, which feels suboptimal.
> I'd appreciate suggestions to make it more legible.
>
> [1] https://lore.kernel.org/git/87r1fps63r.fsf@xxxxxxxxxxxxxxxxxxx/
>
> Fetch-it-Via:
> git fetch https://github.com/tfidfwastaken/git submodule-run-update-proc-list-2
>
>  builtin/submodule--helper.c | 253 ++++++++++++++++++++++++++++++++++++
>  git-submodule.sh            | 106 +++++----------
>  2 files changed, 286 insertions(+), 73 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index d55f6262e9..b9c40324d0 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2029,6 +2029,20 @@ struct submodule_update_clone {
>         .max_jobs = 1, \
>  }
>
> +struct update_data {
> +       const char *recursive_prefix;
> +       const char *sm_path;
> +       const char *displaypath;
> +       struct object_id oid;
> +       struct object_id suboid;
> +       struct submodule_update_strategy update_strategy;
> +       int depth;
> +       unsigned int force: 1;
> +       unsigned int quiet: 1;
> +       unsigned int nofetch: 1;
> +       unsigned int just_cloned: 1;
> +};
> +#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
>
>  static void next_submodule_warn_missing(struct submodule_update_clone *suc,
>                 struct strbuf *out, const char *displaypath)
> @@ -2282,6 +2296,175 @@ static int git_update_clone_config(const char *var, const char *value,
>         return 0;
>  }
> +
> +static int run_update_command(struct update_data *ud, int subforce)
> +{
> +       struct child_process cp = CHILD_PROCESS_INIT;
> +       char *oid = oid_to_hex(&ud->oid);
> +       int must_die_on_failure = 0;
> +
> +       cp.dir = xstrdup(ud->sm_path);
> +       switch (ud->update_strategy.type) {
> +       case SM_UPDATE_CHECKOUT:
> +               cp.git_cmd = 1;
> +               strvec_pushl(&cp.args, "checkout", "-q", NULL);

Would it be possible to add the 'if' statement above just before the
'switch' (or after,
whichever seems okay) since this is common amongst (almost) all the cases?

> +               if (subforce)
> +                       strvec_push(&cp.args, "-f");
> +               break;
> +       case SM_UPDATE_REBASE:
> +               cp.git_cmd = 1;
> +               strvec_push(&cp.args, "rebase");
> +               if (ud->quiet)
> +                       strvec_push(&cp.args, "--quiet");
> +               must_die_on_failure = 1;
> +               break;
> +       case SM_UPDATE_MERGE:
> +               cp.git_cmd = 1;
> +               strvec_push(&cp.args, "merge");
> +               if (ud->quiet)
> +                       strvec_push(&cp.args, "--quiet");
> +               must_die_on_failure = 1;
> +               break;
> +       case SM_UPDATE_COMMAND:
> +               /* NOTE: this does not handle quoted arguments */
> +               strvec_split(&cp.args, ud->update_strategy.command);
> +               must_die_on_failure = 1;
> +               break;
> +       case SM_UPDATE_UNSPECIFIED:
> +       case SM_UPDATE_NONE:
> +               BUG("update strategy should have been specified");
> +       }

If the original did not bug out, do we need to? The documentation does
not mention
this as well:
https://git-scm.com/docs/git-submodule#Documentation/git-submodule.txt-none

> +
> +       strvec_push(&cp.args, oid);
> +
> +       prepare_submodule_repo_env(&cp.env_array);
> +
> +       if (run_command(&cp)) {
> +               if (must_die_on_failure) {
> +                       switch (ud->update_strategy.type) {
> +                       case SM_UPDATE_CHECKOUT:
> +                               die(_("Unable to checkout '%s' in submodule path '%s'"),
> +                                     oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_REBASE:
> +                               die(_("Unable to rebase '%s' in submodule path '%s'"),
> +                                     oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_MERGE:
> +                               die(_("Unable to merge '%s' in submodule path '%s'"),
> +                                     oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_COMMAND:
> +                               die(_("Execution of '%s %s' failed in submodule path '%s'"),
> +                                     ud->update_strategy.command, oid, ud->displaypath);
> +                               break;
> +                       case SM_UPDATE_UNSPECIFIED:
> +                       case SM_UPDATE_NONE:
> +                               BUG("update strategy should have been specified");
> +                       }
> +               }
> +               /*
> +                * This signifies to the caller in shell that
> +                * the command failed without dying
> +                */
> +               return 1;
> +       }
> +
> +       switch (ud->update_strategy.type) {
> +       case SM_UPDATE_CHECKOUT:
> +               printf(_("Submodule path '%s': checked out '%s'\n"),
> +                      ud->displaypath, oid);
> +               break;
> +       case SM_UPDATE_REBASE:
> +               printf(_("Submodule path '%s': rebased into '%s'\n"),
> +                      ud->displaypath, oid);
> +               break;
> +       case SM_UPDATE_MERGE:
> +               printf(_("Submodule path '%s': merged in '%s'\n"),
> +                      ud->displaypath, oid);
> +               break;
> +       case SM_UPDATE_COMMAND:
> +               printf(_("Submodule path '%s': '%s %s'\n"),
> +                      ud->displaypath, ud->update_strategy.command, oid);
> +               break;
> +       case SM_UPDATE_UNSPECIFIED:
> +       case SM_UPDATE_NONE:
> +               BUG("update strategy should have been specified");
> +       }
> +
> +       return 0;
> +}
> +
> +static int do_run_update_procedure(struct update_data *ud)
> +{
> +       int subforce = is_null_oid(&ud->suboid) || ud->force;
> +
> +       if (!ud->nofetch) {
> +               /*
> +                * Run fetch only if `oid` isn't present or it
> +                * is not reachable from a ref.
> +                */
> +               if (!is_tip_reachable(ud->sm_path, &ud->oid))
> +                       if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
> +                           !ud->quiet)
> +                               fprintf_ln(stderr,
> +                                          _("Unable to fetch in submodule path '%s'; "
> +                                            "trying to directly fetch %s:"),
> +                                          ud->displaypath, oid_to_hex(&ud->oid));

I was wondering if an OID is invalid, will it be counted as
unreachable and vice-versa?
If that is the case then that would simplify the work.