Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thank you for this follow-up.

After reviewing my text below I find that the words may sound harsh.
Please don't let you turn down by the lack of friendlyness. In technical
reviews I (and very likely many other reviewers on this list) prefer to
get to the point right away and not waste our time by talking in circles.

I also do not repeat what others have said in their review.

Am 18.02.25 um 16:35 schrieb Moumita:
> From: Moumita Dhar <dhar61595@xxxxxxxxx>
> 
> The existing Bash userdiff pattern misses some shell function forms, such as
> `function foo()`, multi-line definitions, and extra whitespace.
> 
> Extend the pattern to:
> - Support `function foo()` syntax.
> - Allow spaces in `foo ( )` definitions.
> - Recognize multi-line definitions with backslashes.
> - Broaden function body detection.

Please be accurate what you mention here. The first two are already
present and not new, the third one I cannot find, and the last one is
not immediately visible if not wrong (you are removing the '((' introducer).

> 
> Signed-off-by: Moumita Dhar <dhar61595@xxxxxxxxx>
> ---
>  userdiff.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/userdiff.c b/userdiff.c
> index 340c4eb4f7..194e28883d 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -53,26 +53,38 @@ IPATTERN("ada",
>  	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
>  	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
>  PATTERNS("bash",
> -	 /* Optional leading indentation */
> +     /* Optional leading indentation */
>  	 "^[ \t]*"
> -	 /* Start of captured text */
> +	 /* Start of captured function name */
>  	 "("
>  	 "("
> -	     /* POSIX identifier with mandatory parentheses */
> -	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
> +		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
> +		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"
>  	 "|"
> -	     /* Bashism identifier with optional parentheses */
> -	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> +		 /* Bash-style function definitions, allowing optional `function` keyword */
> +		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"
>  	 ")"

You inserted (?=[a-zA-Z_]) after the function. What does this do? Then
you made 'function identifier' optional. Why?

You also added a ? at the end. Is that to make the parenthesis optional?
But look closely: they are already optional, because we already have

  ( space_opt "(" space_opt ")" | space_opt )

It would be a minor improvement to turn this into

  space_opt ( "(" space_opt ")" )?

Make a mental bookmark here.

>  	 /* Optional whitespace */
>  	 "[ \t]*"
> -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
> -	 "(\\{|\\(\\(?|\\[\\[)"
> -	 /* End of captured text */
> +	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
> +	 "(\\{|\\(|\\[\\[)"
> +	 /* End of captured function name */

What is the justification that you removed "((" as introduction of the
function body?

>  	 ")",

So, we capture only what we expect to be a function header in typical cases.

Let me make a suggestion. What if we replaced everything from the mental
bookmark above up to the closing parenthesis by .*$ like we do in all
other builtin drivers? Then everything on the line that contains the
function header would be used as hunk header, and we do not care what
the function body looks like. It would also cover the case mentioned by
Eric elsewhere where the body is just a simple command.

If that leads to new cases that are detected, it would be really nice to
add corresponding test cases in t/t4018/.

>  	 /* -- */
> -	 /* Characters not in the default $IFS value */
> -	 "[^ \t]+"),
> +	 /* Identifiers: variable and function names */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 /* Numeric constants: integers and decimals */
> +	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
> +	 /* Shell variables: `$VAR`, `${VAR}` */
> +	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
> +	 /* Logical and comparison operators */
> +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
> +	 /* Assignment and arithmetic operators */
> +	 "|[-+*/%&|^!=<>]=?"
> +	 /* Command-line options (to avoid splitting `-option`) */
> +	 "|--?[a-zA-Z0-9_-]+"
> +	 /* Brackets and grouping symbols */
> +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),

As far as I can see, you introduced a pattern for options, but otherwise
left the patterns the same as in the earlier round. Of course, you are
not obliged to integrate every suggestion that is made by a reviewer,
but it is good tone that you leave a comment explaining why you
dismissed a suggestion.

To expand on my suggestion how to treat ${var} substitution:

- Replace \\$\\{[^}]+\\} by just \\$\\{ to match only the introducer.
(The closing brace is already matched by a later pattern.)
- Add operators := :- :+ :? # ## %% that are used frequently in these
complex expansions.
- Have a look at section "Shell Parameter Expansion" in the bash
manual[1] about the multitude of operators that can occur after ${.
Maybe you want to add more of them.

[1]
https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion

-- Hannes





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux