Re: [GSoC] [PATCH v2] submodule--helper: run update procedures from C

Atharva Raykar <raykar.ath@xxxxxxxxx> · Tue, 03 Aug 2021 15:37:46 +0530

(I am resending this email, because my client mangled the whitespaces
due to a misconfiguration. Please ignore the my previous message.)

Shourya Shukla <periperidip@xxxxxxxxx> writes:

> Le lun. 2 août 2021 à 18:36, Atharva Raykar <raykar.ath@xxxxxxxxx> a écrit :
>>
>> Add a new submodule--helper subcommand `run-update-procedure` that runs
>> the update procedure if the SHA1 of the submodule does not match what
>> the superproject expects.
>>
>> This is an intermediate change that works towards total conversion of
>> `submodule update` from shell to C.
>>
>> Specific error codes are returned so that the shell script calling the
>> subcommand can take a decision on the control flow, and preserve the
>> error messages across subsequent recursive calls of `cmd_update`.
>>
>> This patch could have been approached differently, by first changing the
>> `is_tip_reachable` and `fetch_in_submodule` shell functions to be
>> `submodule--helper` subcommands, and then following up with a patch that
>> introduces the `run-update-procedure` subcommand. We have not done it
>> like that because those functions are trivial enough to convert directly
>> along with these other changes. This lets us avoid the boilerplate and
>> the cleanup patches that will need to be introduced in following that
>> approach.
>
> I feel that this part is more suitable for a cover letter rather than the commit
> message itself. It is a useful piece of info though.

Okay, that seems right, the message does seem a bit too context-sensitive.

>> This change is more focused on doing a faithful conversion, so for now we
>> are not too concerned with trying to reduce subprocess spawns.
>>
>> Signed-off-by: Atharva Raykar <raykar.ath@xxxxxxxxx>
>> Mentored-by: Christian Couder <christian.couder@xxxxxxxxx>
>> Mentored-by: Shourya Shukla <periperidip@xxxxxxxxx>
>> ---
>>
>> Notable changes since v1:
>>
>> * Modified the code structure in
>>   submodule--helper.c:run_update_command(), while fixing problems with
>>   the translation marks.
>>
>> * Renamed '--sha1' and '--subsha1' options to '--oid' and '--suboid' to
>>   since the argument is parsed into an object_id struct, not plain sha1
>>   data.
>>
>> * Used option callbacks to parse the SHA1 arguments directly.
>>
>> * Moved the conditional out of 'do_run_update_procedure()'.
>>
>> Feedback required:
>>
>> Ævar felt that it would be clearer to populate the 'fatal' messages
>> after the run_command() operation in 'run_update_command()', to make it
>> more readable [1]. I have attempted something like that here, and it has led
>> to a lot more duplicated 'switch' statements, which feels suboptimal.
>> I'd appreciate suggestions to make it more legible.
>>
>> [1] https://lore.kernel.org/git/87r1fps63r.fsf@xxxxxxxxxxxxxxxxxxx/
>>
>> Fetch-it-Via:
>> git fetch https://github.com/tfidfwastaken/git submodule-run-update-proc-list-2
>>
>>  builtin/submodule--helper.c | 253 ++++++++++++++++++++++++++++++++++++
>>  git-submodule.sh            | 106 +++++----------
>>  2 files changed, 286 insertions(+), 73 deletions(-)
>>
>> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
>> index d55f6262e9..b9c40324d0 100644
>> --- a/builtin/submodule--helper.c
>> +++ b/builtin/submodule--helper.c
>> @@ -2029,6 +2029,20 @@ struct submodule_update_clone {
>>         .max_jobs = 1, \
>>  }
>>
>> +struct update_data {
>> +       const char *recursive_prefix;
>> +       const char *sm_path;
>> +       const char *displaypath;
>> +       struct object_id oid;
>> +       struct object_id suboid;
>> +       struct submodule_update_strategy update_strategy;
>> +       int depth;
>> +       unsigned int force: 1;
>> +       unsigned int quiet: 1;
>> +       unsigned int nofetch: 1;
>> +       unsigned int just_cloned: 1;
>> +};
>> +#define UPDATE_DATA_INIT { .update_strategy = SUBMODULE_UPDATE_STRATEGY_INIT }
>>
>>  static void next_submodule_warn_missing(struct submodule_update_clone *suc,
>>                 struct strbuf *out, const char *displaypath)
>> @@ -2282,6 +2296,175 @@ static int git_update_clone_config(const char *var, const char *value,
>>         return 0;
>>  }
>> +
>> +static int run_update_command(struct update_data *ud, int subforce)
>> +{
>> +       struct child_process cp = CHILD_PROCESS_INIT;
>> +       char *oid = oid_to_hex(&ud->oid);
>> +       int must_die_on_failure = 0;
>> +
>> +       cp.dir = xstrdup(ud->sm_path);
>> +       switch (ud->update_strategy.type) {
>> +       case SM_UPDATE_CHECKOUT:
>> +               cp.git_cmd = 1;
>> +               strvec_pushl(&cp.args, "checkout", "-q", NULL);
>
> Would it be possible to add the 'if' statement above just before the
> 'switch' (or after,
> whichever seems okay) since this is common amongst (almost) all the cases?

I'll try it on once, if it makes the code more readable, I'll include it in the
reroll.

>> +               if (subforce)
>> +                       strvec_push(&cp.args, "-f");
>> +               break;
>> +       case SM_UPDATE_REBASE:
>> +               cp.git_cmd = 1;
>> +               strvec_push(&cp.args, "rebase");
>> +               if (ud->quiet)
>> +                       strvec_push(&cp.args, "--quiet");
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_MERGE:
>> +               cp.git_cmd = 1;
>> +               strvec_push(&cp.args, "merge");
>> +               if (ud->quiet)
>> +                       strvec_push(&cp.args, "--quiet");
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_COMMAND:
>> +               /* NOTE: this does not handle quoted arguments */
>> +               strvec_split(&cp.args, ud->update_strategy.command);
>> +               must_die_on_failure = 1;
>> +               break;
>> +       case SM_UPDATE_UNSPECIFIED:
>> +       case SM_UPDATE_NONE:
>> +               BUG("update strategy should have been specified");
>> +       }
>
> If the original did not bug out, do we need to? The documentation does
> not mention
> this as well:
> https://git-scm.com/docs/git-submodule#Documentation/git-submodule.txt-none

This was how the original shell porcelain did it:

case "$update_module" in
checkout)
	command="git checkout $subforce -q"
	die_msg="$(eval_gettext "Unable to checkout '\$sha1' in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': checked out '\$sha1'")"
	;;
rebase)
	command="git rebase ${GIT_QUIET:+--quiet}"
	die_msg="$(eval_gettext "Unable to rebase '\$sha1' in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': rebased into '\$sha1'")"
	must_die_on_failure=yes
	;;
merge)
	command="git merge ${GIT_QUIET:+--quiet}"
	die_msg="$(eval_gettext "Unable to merge '\$sha1' in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': merged in '\$sha1'")"
	must_die_on_failure=yes
	;;
!*)
	command="${update_module#!}"
	die_msg="$(eval_gettext "Execution of '\$command \$sha1' failed in submodule path '\$displaypath'")"
	say_msg="$(eval_gettext "Submodule path '\$displaypath': '\$command \$sha1'")"
	must_die_on_failure=yes
	;;
*)
	die "$(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
esac

The fallthrough case used to die, but I noticed that this branch will never get
activated. This is because the 'update-clone' helper will not output any entry
that has the update mode set to 'none', and thus the `while` loop that contains
this code would never run.

Which is why I decided to BUG out on that case, because that state should never
be reached. But I see the source of confusion, and maybe I should have different
BUG messages for SM_UPDATE_UNSPECIFIED and SM_UPDATE_NONE. The latter should
probably say "should have been handled by update-clone".

>> +
>> +       strvec_push(&cp.args, oid);
>> +
>> +       prepare_submodule_repo_env(&cp.env_array);
>> +
>> +       if (run_command(&cp)) {
>> +               if (must_die_on_failure) {
>> +                       switch (ud->update_strategy.type) {
>> +                       case SM_UPDATE_CHECKOUT:
>> +                               die(_("Unable to checkout '%s' in submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_REBASE:
>> +                               die(_("Unable to rebase '%s' in submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_MERGE:
>> +                               die(_("Unable to merge '%s' in submodule path '%s'"),
>> +                                     oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_COMMAND:
>> +                               die(_("Execution of '%s %s' failed in submodule path '%s'"),
>> +                                     ud->update_strategy.command, oid, ud->displaypath);
>> +                               break;
>> +                       case SM_UPDATE_UNSPECIFIED:
>> +                       case SM_UPDATE_NONE:
>> +                               BUG("update strategy should have been specified");
>> +                       }
>> +               }
>> +               /*
>> +                * This signifies to the caller in shell that
>> +                * the command failed without dying
>> +                */
>> +               return 1;
>> +       }
>> +
>> +       switch (ud->update_strategy.type) {
>> +       case SM_UPDATE_CHECKOUT:
>> +               printf(_("Submodule path '%s': checked out '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_REBASE:
>> +               printf(_("Submodule path '%s': rebased into '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_MERGE:
>> +               printf(_("Submodule path '%s': merged in '%s'\n"),
>> +                      ud->displaypath, oid);
>> +               break;
>> +       case SM_UPDATE_COMMAND:
>> +               printf(_("Submodule path '%s': '%s %s'\n"),
>> +                      ud->displaypath, ud->update_strategy.command, oid);
>> +               break;
>> +       case SM_UPDATE_UNSPECIFIED:
>> +       case SM_UPDATE_NONE:
>> +               BUG("update strategy should have been specified");
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int do_run_update_procedure(struct update_data *ud)
>> +{
>> +       int subforce = is_null_oid(&ud->suboid) || ud->force;
>> +
>> +       if (!ud->nofetch) {
>> +               /*
>> +                * Run fetch only if `oid` isn't present or it
>> +                * is not reachable from a ref.
>> +                */
>> +               if (!is_tip_reachable(ud->sm_path, &ud->oid))
>> +                       if (fetch_in_submodule(ud->sm_path, ud->depth, ud->quiet, NULL) &&
>> +                           !ud->quiet)
>> +                               fprintf_ln(stderr,
>> +                                          _("Unable to fetch in submodule path '%s'; "
>> +                                            "trying to directly fetch %s:"),
>> +                                          ud->displaypath, oid_to_hex(&ud->oid));
>
> I was wondering if an OID is invalid, will it be counted as
> unreachable and vice-versa?
> If that is the case then that would simplify the work.

Could you elaborate? I'm not sure what you mean by 'invalid' in this context. I
don't think this code will receive any kind of malformed oid--they come from
'update-clone' which handles it correctly.

As far as I can tell, the only way to check if a particular OID is unreachable
is when we check if all the refs cannot find it.