Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.

xing zhi jiang <a97410985new@xxxxxxxxx> · Sun, 3 Apr 2022 21:17:49 +0800

Sorry for the late reply. I've been busy the last two weeks.

On Mon, 14 Mar 2022 at 05:54, Johannes Sixt <j6t@xxxxxxxx> wrote:
>
> Am 12.03.22 um 17:48 schrieb xing zhi jiang:
> > diff --git a/userdiff.c b/userdiff.c
> > index 8578cb0d12..51bfe4021d 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -168,6 +168,38 @@ PATTERNS("java",
> >        "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> >        "|[-+*/<>%&^|=!]="
> >        "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +
> > +PATTERNS("javascript",
> > +      /* don't match the expression may contain parenthesis, because it is not a function declaration */
> > +      "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> > +      /* don't match statement */
> > +      "!;\n"
> > +      /* match normal function */
> > +      "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
> > +      /* match JavaScript variable declaration with a lambda expression */
> > +      "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> > +      "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
>
> It would help readability if this second line of this regex were
> indented because it is a continuation of the first line.
>
This will be fixed in the v3 patch.

> > +      /* match exports for anonymous fucntion */
> > +      "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
> > +      /* match assign function to LHS */
> > +      "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"
>
> This should be written as
>
>          "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*)?\\(.*)\n"
>
> Notice that the whitespace after the identifier can only appear when
> there is actually an identifier. The point is to reduce the different
> matches permitted by the sub-expression "[\t ]*[\t ]*" when there is no
> identifier in the text.
>
> Can the keyword function ever be followed by a number? I guess not. Then
> [$_[:alpha:]][$_[:alnum:]]* could be reduced to [$_[:alnum:]]+
This will be fixed in the v3 patch.

> > +      /* match normal function in object literal */
> > +      "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
> > +      /* don't match the function in class, which has more than one ident level */
> > +      "!^(\t{2,}|[ ]{5,})\n"
> > +      /* match function in class */
> > +      "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",> +    /* word regex */
> > +      /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */
> > +      "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?"
> > +      /* DecimalLiteral may be float */
> > +      "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
>
> Having alternatives that begin with an optional part make the regex
> evaluation comparatively inefficient. In particular, both alternatives
> above match a decimal integer. I suggest to have the first alternative
> only for hex, octal, and binary integers, and the second for all decimal
> numbers including floatingpoint:
>
>          /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and
> their big versions */
>          "0[xXoObB][_0-9a-fA-F]+n?"
>          /* DecimalLiteral may be float */
>          "|[0-9][_0-9]*(\\.[_0-9]*|n)?([eE][+-]?[_0-9]+)?"
>
> and if floating point literals can begin with a decimal point, then we
> also need
>
>          "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
>
I agree on separate decimal regex from others. And in JavaScript
floating numbers can start with a dot. So need add new regex -
"|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?".
This will be fixed in the v3 patch.