Adrien Schildknecht <adrien+dev@xxxxxxxxxxx> writes:

> Add regexp based on the "Shell Command Language" specifications.
> Because of the lax syntax of sh, some corner cases may not be
> handled properly.
>
> Signed-off-by: Adrien Schildknecht <adrien+dev@xxxxxxxxxxx>
> ---

Those of you who helped in the first round of review, any comments,
"This round looks good"'s, ...?

> +PATTERNS("sh",
> + "^([ \t]*(function[ \t]+)?[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\).*)$",
> + /* -- */

I do not think it is wrong per-se to try to be as precise as
possible, but I wonder if it is sufficient to cheat and make these
"what is a word?" expressions a bit looser, by declaring that it is
OK if a simpler pattern allows something that is syntactically
illegal in shell, as long as it splits valid shell constructs
correctly.

For example:

> + "[a-zA-Z0-9_]+"
> + "|[-+0-9]+"

The first one matches an identifier (e.g. if you have frotz="a b c"
and $frotz, both appearances of 'frotz' are matched), and the second
one, I think, is trying to catch possibly signed integers. The
latter also matches 0+1+++2, so it is already loose (but I do not
think that is a problem). Perhaps it is sufficient to collapse the
above into a single "[-+a-zA-Z0-9_$]+"?

> + "|[-+*/<>%&^|=!]=|>>=?|<<=?|\\+\\+|--|\\*\\*|&&|\\|\\||\\[\\[|\\]\\]"
> + "|>\\||[<>]+&|<>|<<-|;;"),

Likewise. I wonder if something like "[-~!@#%^&*+=|;/]+" gives too
many false matches.

> { "default", NULL, -1, { NULL, 0 } },
> };
> #undef PATTERNS
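
To make the "is it too loose?" question a bit more concrete, here is a
rough sketch that could be used to compare the patterns quoted above on
sample shell text. It assumes POSIX <regex.h> and compiles each pattern
with REG_EXTENDED, since the patterns use ERE syntax. The helper name
tokenize() is made up for illustration; it is not git code and does not
reproduce how git's word-diff machinery drives these expressions.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): scan `text` left to right and
 * copy each successive match of the POSIX extended regex `pattern` into
 * `out`, separated by single spaces.  Returns the number of tokens
 * found, or -1 if the pattern does not compile or `out` has no room.
 */
int tokenize(const char *pattern, const char *text, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	size_t used = 0;
	int n = 0;

	if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	out[0] = '\0';

	while (*text != '\0' && regexec(&re, text, 1, &m, 0) == 0) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		if (len == 0) {		/* skip empty matches so we always advance */
			text++;
			continue;
		}
		if (used + len + 2 > outsz)	/* stop rather than overflow `out` */
			break;
		if (n > 0)
			out[used++] = ' ';
		memcpy(out + used, text + m.rm_so, len);
		used += len;
		out[used] = '\0';
		text += m.rm_eo;	/* resume right after this match */
		n++;
	}
	regfree(&re);
	return n;
}
```

With the text frotz="a b c"; echo $frotz, the original two alternatives
"[a-zA-Z0-9_]+|[-+0-9]+" yield the tokens "frotz a b c echo frotz",
while the collapsed "[-+a-zA-Z0-9_$]+" yields "frotz a b c echo $frotz",
i.e. the dollar sign becomes part of the word. Both treat 0+1+++2 as a
single token. Feeding "a && b || c;;" to "[-~!@#%^&*+=|;/]+" picks out
"&& || ;;".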