Sorry for the late reply. I've been busy the last two weeks. On Mon, 14 Mar 2022 at 05:54, Johannes Sixt <j6t@xxxxxxxx> wrote: > > Am 12.03.22 um 17:48 schrieb xing zhi jiang: > > diff --git a/userdiff.c b/userdiff.c > > index 8578cb0d12..51bfe4021d 100644 > > --- a/userdiff.c > > +++ b/userdiff.c > > @@ -168,6 +168,38 @@ PATTERNS("java", > > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?" > > "|[-+*/<>%&^|=!]=" > > "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"), > > + > > +PATTERNS("javascript", > > + /* don't match the expression may contain parenthesis, because it is not a function declaration */ > > + "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n" > > + /* don't match statement */ > > + "!;\n" > > + /* match normal function */ > > + "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n" > > + /* match JavaScript variable declaration with a lambda expression */ > > + "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*" > > + "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n" > > It would help readability if this second line of this regex were > indented because it is a continuation of the first line. > This will be fixed in the v3 patch. > > + /* match exports for anonymous fucntion */ > > + "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n" > > + /* match assign function to LHS */ > > + "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n" > > This should be written as > > "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*)?\\(.*)\n" > > Notice that the whitespace after the identifier can only appear when > there is actually an identifier. The point is to reduce the different > matches permitted by the sub-expression "[\t ]*[\t ]*" when there is no > identifier in the text. > > Can the keyword function ever be followed by a number? I guess not. Then > [$_[:alpha:]][$_[:alnum:]]* could be reduced to [$_[:alnum:]]+ This will be fixed in the v3 patch. > > + /* match normal function in object literal */ > > + "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n" > > + /* don't match the function in class, which has more than one ident level */ > > + "!^(\t{2,}|[ ]{5,})\n" > > + /* match function in class */ > > + "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",> + /* word regex */ > > + /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */ > > + "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?" > > + /* DecimalLiteral may be float */ > > + "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?" > > Having alternatives that begin with an optional part make the regex > evaluation comparatively inefficient. In particular, both alternatives > above match a decimal integer. I suggest to have the first alternative > only for hex, octal, and binary integers, and the second for all decimal > numbers including floatingpoint: > > /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and > their big versions */ > "0[xXoObB][_0-9a-fA-F]+n?" > /* DecimalLiteral may be float */ > "|[0-9][_0-9]*(\\.[_0-9]*|n)?([eE][+-]?[_0-9]+)?" > > and if floating point literals can begin with a decimal point, then we > also need > > "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?" > I agree on separate decimal regex from others. And in JavaScript floating numbers can start with a dot. So need add new regex - "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?". This will be fixed in the v3 patch.