Re: [PATCH v3] [GSOC]trailer: pass arg as positional parameter

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 25 Mar 2021 15:28:12 -0700

"ZheNing Hu via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: ZheNing Hu <adlternative@xxxxxxxxx>
>
> Original implementation of `trailer.<token>.command` use

uses

> `strbuf_replace` to replace $ARG in command with the <value>
> of the trailer, but it have a problem: `strbuf_replace`

has

> replace the $ARG only once, If the user's trailer command

replaces the $ARG only once.

> have used more than one $ARG, the remaining replacement will
> fail.

"will fail" is quite vague.  It is just left unreplaced, and if the
user expects all of them to be replaced, then the expectation and
reality would not match, but all of that you have already said by
"replaces the $ARG only once.", so I think this sentence should be
removed.

> If directly modify the implementation of the original
> `trailer.<token>.command`, The user’s previous `'$ARG'` in
> trailer command will not be replaced.

That statement does not make much sense.  Depending on the way how
the implementation is "directly" modified, you can fix the "replaces
only once" problem without introducing such a problem.  Just look
for '$ARG' in the string and replace all of them, not just the first
one.  It's not too difficult.

This confusion primarily comes from the fact that you forgot to
explain the other problem you are fixing, I think.  Even though the
trailer.<token>.command documentation implies that the user is
expected to give a shell script or some sort as the command and the
use of $ARG makes it look like a shell variable, the original
implementation does not treat it as a shell variable at all.  And
the textual replacement is done without making sure the value being
replaced has characters with special meaning in the shell language.

So existing .command may incorrectly use $ARG inside a single-quote
pair and expect it to be replaced to a string inside a single-quote
pair.  A malformed, or worse, malicious, value would escape out of
the single-quote pair (remember, the '; rm -fr .' example?) and
execute arbitrary code in an unexpected way.  The (ungrammatical)
"if directly modify the implementation" refers to a potential way to
fix these two problems at the same time by doing the $ARG thing
without using textual replacement, namely, exporting the value as an
environment variable ARG.  If that approach was taken, then, $ARG
enclosed in a single-quote pair will no longer be replaced, which
makes it a backward incompatible change.

But without describing what solution you are talking about, your
three-line description does not make much sense.

> So now add new config
> "trailer.<token>.cmd", pass trailer's value as positional
> parameter 1 to the user's command, the user can use $1 as
> trailer's value, to implement original variable replacement.
>
> If the user has these two configuration: "trailer.<token>.cmd"
> and "trailer.<token>.command", "cmd" will execute and "command"
> will not executed.
>
> Original `trailer.<token>.command` can still be used until git
> completely abandoned it.
>
> Signed-off-by: ZheNing Hu <adlternative@xxxxxxxxx>

Let's rewrite it completely.

	The `trailer.<token>.command` configuration variable
	specifies a command (run via the shall, so it does not have
	to be a single name of or path to the command, but can be a
	shell script), and the first occurrence of substring $ARG is
	replaced with the value given to the `interpret-trailer`
	command for the token.  This has two downsides:

	* The use of $ARG in the mechanism misleads the users that
          the value is passed in the shell variable, and tempt them
          to use $ARG more than once, but that would not work, as
          the second and subsequent $ARG are not replaced.

	* Because $ARG is textually replaced without regard to the
          shell language syntax, even '$ARG' (inside a single-quote
          pair), which a user would expect to stay intact, would be
          replaced, and worse, if the value had an unmatching single
          quote (imagine a name like "O'Connor", substituted into
          NAME='$ARG' to make it NAME='O'Connor), it would result in
          a broken command that is not syntactically correct (or
          worse).

	Introduce a new `trailer.<token>.cmd` configuration that
	takes higher precedence to deprecate and eventually remove
	`trailer.<token>.command`, which passes the value as a
	parameter to the command.  Instead of "$ARG", the users will
	refer to the value as positional argument, $1, in their
	scripts.

I tried to cover everything we need to tell the reviewers about this
change with the above.  Did I miss anything?

> +trailer.<token>.cmd::
> +	Similar to 'trailer.<token>.command'. But the difference is that
> +	`$1` is used in the command to replace the value of the trailer
> +	instead of the original `$ARG`, which means that we can pass the
> +	trailer value multiple times in the command.

We eventually want to deprecate the .command, so we'd prefer not to
rely on its description too much (e.g. try to find a way to say what
you want to say without "instead of the original `$ARG`").

	The command specified by this configuration variable is run
	with a single parameter, which is the <value> part of an
	existing trailer with the same <token>.  The output from the
	command is then used as the value for the <token> in the
	resulting trailer.

would be the replacement for the part that talks about $ARG in the
description of trailer.<token>.command.

The original description for `trailer.<token>.command` is so jumbled
(not your failure at all) that I had a hard time to understand what
it is trying to say (e.g. what does "as if a special <token>=<value>
argument were added at the beginning of the command line" mean?  Is
it making a one-shot export of environment variable to run the
command???), so the above may need further adjustment.  Christian?
Care to help out?

> +	E.g. `git config trailer.sign.cmd "test -n \"$1\" && echo \"$1\" || true "`.

An example is good.  There is a whole EXAMPLES section in this
manual page, and the one that uses $ARG may be a good candidate to
look at and change to use .cmd (instead of .command).

> +	If the user has these two configuration: "trailer.<token>.cmd"
> +	and "trailer.<token>.command", "cmd" will be executed and "command"
> +	will not be executed.

	When both .cmd and .command are given for the same <token>,
	.cmd is used and .command is ignored.

> diff --git a/trailer.c b/trailer.c
> index be4e9726421c..634d3f1ff04a 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -14,6 +14,7 @@ struct conf_info {
>  	char *name;
>  	char *key;
>  	char *command;
> +	char *cmd;
>  	enum trailer_where where;
>  	enum trailer_if_exists if_exists;
>  	enum trailer_if_missing if_missing;
> @@ -127,6 +128,7 @@ static void free_arg_item(struct arg_item *item)
>  	free(item->conf.name);
>  	free(item->conf.key);
>  	free(item->conf.command);
> +	free(item->conf.cmd);
>  	free(item->token);
>  	free(item->value);
>  	free(item);
> @@ -216,18 +218,24 @@ static int check_if_different(struct trailer_item *in_tok,
>  	return 1;
>  }
>  
> -static char *apply_command(const char *command, const char *arg)
> +static char *apply_command(const char *command, const char *cmd_, const char *arg)
>  {
>  	struct strbuf cmd = STRBUF_INIT;
>  	struct strbuf buf = STRBUF_INIT;
>  	struct child_process cp = CHILD_PROCESS_INIT;
>  	char *result;
>  
> -	strbuf_addstr(&cmd, command);
> -	if (arg)
> -		strbuf_replace(&cmd, TRAILER_ARG_STRING, arg);
> -
> -	strvec_push(&cp.args, cmd.buf);
> +	if (cmd_) {
> +		strbuf_addstr(&cmd, cmd_);
> +		strvec_push(&cp.args, cmd.buf);
> +		if (arg)
> +			strvec_push(&cp.args, arg);
> +	} else if (command) {
> +		strbuf_addstr(&cmd, command);
> +		strvec_push(&cp.args, cmd.buf);
> +		if (arg)
> +			strbuf_replace(&cmd, TRAILER_ARG_STRING, arg);
> +	}

OK.  it is clear cmd_ takes precedence this way.

Later (not as part of this patch, but a few releases down the road),
we may want to add a warning() about using a deprecated feature when
"else if (command)" block is taken.

> @@ -247,7 +255,7 @@ static char *apply_command(const char *command, const char *arg)
>  
>  static void apply_item_command(struct trailer_item *in_tok, struct arg_item *arg_tok)
>  {
> -	if (arg_tok->conf.command) {
> +	if (arg_tok->conf.command || arg_tok->conf.cmd) {
>  		const char *arg;
>  		if (arg_tok->value && arg_tok->value[0]) {
>  			arg = arg_tok->value;
> @@ -257,7 +265,7 @@ static void apply_item_command(struct trailer_item *in_tok, struct arg_item *arg
>  			else
>  				arg = xstrdup("");
>  		}
> -		arg_tok->value = apply_command(arg_tok->conf.command, arg);
> +		arg_tok->value = apply_command(arg_tok->conf.command, arg_tok->conf.cmd, arg);

It might be cleaner to just pass arg_tok->conf to apply_command()
and hide "cmd takes precedence over command" as an implementation
detail of that helper function.

The implementation looks as good as the original "command" with that
change at this point.  Documentation may need a bit more polishing.

Thanks.