Re: [GSoC][PATCH v2] Add a diff driver for JavaScript languages.

Johannes Sixt <j6t@xxxxxxxx> · Sun, 13 Mar 2022 22:54:57 +0100

When you send a new iteration of a patch or patch set, it is customary
on this list to include everyone who took part in the earlier rounds in
the Cc: list.

Am 12.03.22 um 17:48 schrieb xing zhi jiang:
> In the xfunction part that matches normal functions,
> a variable declaration with an assignment of function, the function declaration
> in the class, and also the function is object literal's property[1].
> 
> And in the word regex part, that matches numbers, punctuations, and also the
> JavaScript identifier.
> This part reference the formal ECMA specification[2].
> 
> [1]https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
> [2]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
> 
> Signed-off-by: xing zhi jiang <a97410985new@xxxxxxxxx>
> ---

> diff --git a/userdiff.c b/userdiff.c
> index 8578cb0d12..51bfe4021d 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -168,6 +168,38 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +
> +PATTERNS("javascript",
> +	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
> +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> +	 /* don't match statement */
> +	 "!;\n"
> +	 /* match normal function */
> +	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
> +	 /* match JavaScript variable declaration with a lambda expression */
> +	 "^[\t ]*((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> +	 "(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"

It would help readability if this second line of this regex were
indented because it is a continuation of the first line.

> +	 /* match exports for anonymous fucntion */
> +	 "^(exports\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
> +	 /* match assign function to LHS */
> +	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*)?[\t ]*\\(.*)\n"

This should be written as

	 "^(.*=[\t ]*function[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*)?\\(.*)\n"

Notice that the whitespace after the identifier can only appear when
there is actually an identifier. The point is to reduce the different
matches permitted by the sub-expression "[\t ]*[\t ]*" when there is no
identifier in the text.

Can the keyword function ever be followed by a number? I guess not. Then
[$_[:alpha:]][$_[:alnum:]]* could be reduced to [$_[:alnum:]]+

> +	 /* match normal function in object literal */
> +	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function[\t ].*)\n"
> +	 /* don't match the function in class, which has more than one ident level */
> +	 "!^(\t{2,}|[ ]{5,})\n"
> +	 /* match function in class */
> +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",> +	 /* word regex */
> +	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, DecimalLiteral and its big version */
> +	 "(0[xXoObB])?[0-9a-fA-F][_0-9a-fA-F]*n?"
> +	 /* DecimalLiteral may be float */
> +	 "|(0|[1-9][_0-9]*)?\\.?[0-9][_0-9]*([eE][+-]?[_0-9]+)?"

Having alternatives that begin with an optional part make the regex
evaluation comparatively inefficient. In particular, both alternatives
above match a decimal integer. I suggest to have the first alternative
only for hex, octal, and binary integers, and the second for all decimal
numbers including floatingpoint:

	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and
their big versions */
	 "0[xXoObB][_0-9a-fA-F]+n?"
	 /* DecimalLiteral may be float */
	 "|[0-9][_0-9]*(\\.[_0-9]*|n)?([eE][+-]?[_0-9]+)?"

and if floating point literals can begin with a decimal point, then we
also need

	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"

> +	 /* punctuations */
> +	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
> +	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
> +	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
> +	 /* identifiers */
> +	 "|[$_[:alpha:]][$_[:alnum:]]*"),
>  PATTERNS("markdown",
>  	 "^ {0,3}#{1,6}[ \t].*",
>  	 /* -- */

-- Hannes