Re: [PATCH v2] userdiff: funcname and word patterns for sh

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 13 Mar 2015 22:13:09 -0700

Adrien Schildknecht <adrien+dev@xxxxxxxxxxx> writes:

> Add regexp based on the "Shell Command Language" specifications.
> Because of the lax syntax of sh, some corner cases may not be
> handled properly.
>
> Signed-off-by: Adrien Schildknecht <adrien+dev@xxxxxxxxxxx>
> ---

Those of you who helped in the first round of review, any comments,
"This round looks good"'s, ...?

> +PATTERNS("sh",
> +	"^([ \t]*(function[ \t]+)?[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\).*)$",
> +	/* -- */
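
For anybody who wants to poke at the funcname pattern quoted above
outside of Git, here is a minimal sketch that feeds single lines to it
through POSIX regcomp()/regexec().  The helper name is_funcname_line()
is made up for illustration, and compiling with REG_EXTENDED only is
an assumption about what matters for a one-line test; it is not meant
to mirror how userdiff.c wires the pattern up.

```c
#include <regex.h>
#include <stddef.h>

/* The funcname pattern from the patch, copied verbatim. */
#define SH_FUNCNAME \
	"^([ \t]*(function[ \t]+)?[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\).*)$"

/*
 * Hypothetical helper: return 1 if "line" (no trailing newline) would
 * be picked up by the pattern, 0 if not, -1 if the pattern fails to
 * compile.
 */
static int is_funcname_line(const char *line)
{
	regex_t re;
	int hit;

	if (regcomp(&re, SH_FUNCNAME, REG_EXTENDED | REG_NOSUB))
		return -1;
	hit = !regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

With this, "foo() {" and "function foo() {" are accepted, while
"function foo {" (the form without parentheses that some shells
accept) and "echo $(date)" are not.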

I do not think it is wrong per se to try to be as precise as
possible, but I wonder if it is sufficient to cheat and make these
"what is a word?" expressions a bit looser, by declaring that it is
OK if a simpler pattern allows something that is syntactically
illegal in shell, as long as it splits valid shell constructs
correctly.  For example:

> +	 "[a-zA-Z0-9_]+"
> +	 "|[-+0-9]+"

The first one matches an identifier (e.g. if you have frotz="a b c"
and $frotz, both appearances of 'frotz' are matched), and the second
one, I think, is trying to catch possibly signed integers, but the
latter also matches 0+1+++2, so it is already loose (but I do not
think that is a problem).  Perhaps it is sufficient to collapse the
above into a single "[-+a-zA-Z0-9_$]+"?
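
To make the comparison concrete, here is a rough sketch that splits a
string into words by applying a word regex repeatedly with POSIX
regexec().  split_words() and the two macros are hypothetical names.
The "|[^[:space:]]" tail is a simplified stand-in for the catch-all
that, as far as I recall, the PATTERNS() macro appends, so treat that
part as an assumption.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Assumed catch-all: any single non-space byte is its own word. */
#define CATCH_ALL "|[^[:space:]]"
/* The two alternatives from the patch, and the collapsed form. */
#define ORIG_WORDS  "[a-zA-Z0-9_]+|[-+0-9]+" CATCH_ALL
#define LOOSE_WORDS "[-+a-zA-Z0-9_$]+" CATCH_ALL

/*
 * Write the words found in "text" into "out", separated by single
 * spaces.  Returns the number of words, or -1 if the pattern does
 * not compile.  Output that does not fit in "out" is dropped.
 */
static int split_words(const char *pattern, const char *text,
		       char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	size_t used = 0;
	int n = 0;

	if (!outsz || regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	out[0] = '\0';
	while (*text && !regexec(&re, text, 1, &m, 0)) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		if (!len) {		/* guard against empty matches */
			text++;
			continue;
		}
		if (used + len + 2 > outsz)
			break;
		if (n)
			out[used++] = ' ';
		memcpy(out + used, text + m.rm_so, len);
		used += len;
		out[used] = '\0';
		text += m.rm_eo;
		n++;
	}
	regfree(&re);
	return n;
}
```

Given the input frotz="a b c"; echo $frotz, ORIG_WORDS yields 11 words
with "$" and "frotz" separate at the end, while LOOSE_WORDS yields 10
because "$frotz" stays together.  LOOSE_WORDS also treats 0+1+++2 as a
single word.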

> +	 "|[-+*/<>%&^|=!]=|>>=?|<<=?|\\+\\+|--|\\*\\*|&&|\\|\\||\\[\\[|\\]\\]"
> +	 "|>\\||[<>]+&|<>|<<-|;;"),

Likewise.  I wonder if something like "[-~!@#%^&*+=|;/]+" gives too
many false matches.
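
One way to get a feel for the false matches is to ask, for a given
run of punctuation, whether each pattern would take the whole run as a
single word.  The sketch below does that with POSIX regexec().
whole_match() and the macro names are invented for this example, and
it looks at the operator patterns in isolation, not at the full word
regex.

```c
#include <regex.h>
#include <string.h>

/* Operator alternatives from the patch, copied verbatim. */
#define STRICT_OPS \
	"[-+*/<>%&^|=!]=|>>=?|<<=?|\\+\\+|--|\\*\\*|&&|\\|\\||\\[\\[|\\]\\]" \
	"|>\\||[<>]+&|<>|<<-|;;"
/* The looser single character class suggested above. */
#define LOOSE_OPS "[-~!@#%^&*+=|;/]+"

/*
 * Hypothetical helper: return 1 if the first match of "pattern" in
 * "s" covers all of "s", 0 if it does not, -1 if the pattern fails
 * to compile.
 */
static int whole_match(const char *pattern, const char *s)
{
	regex_t re;
	regmatch_t m;
	int ok;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	ok = !regexec(&re, s, 1, &m, 0) &&
	     m.rm_so == 0 && (size_t)m.rm_eo == strlen(s);
	regfree(&re);
	return ok;
}
```

Both patterns take "&&" as one word.  The loose class also takes "=-"
as one word (think of x=-1), which the strict list does not.  In the
other direction, the loose class as written has no '<' or '>', so
">>=" is one word only under the strict list.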

>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS