Re: [GSoC][PATCH v3] Add a diff driver for JavaScript languages.

Johannes Sixt <j6t@xxxxxxxx> · Sun, 3 Apr 2022 16:40:21 +0200

Am 03.04.22 um 15:25 schrieb xing zhi jiang:
> In the xfunction part that matches normal functions,
> a variable declaration with an assignment of function, the function declaration
> in the class, named export for function in ECMA2015, CommonJS exports for
> named function, and also the function is object literal's property[1].
> The special match is this would match the popular framework test case
> pattern(most is same as regex for matching function in class).
> 
> And in the word regex part, that matches numbers, punctuations, and also the
> JavaScript identifier.
> This part reference the formal ECMA specification[2].
> 
> [1]https://github.com/jquery/jquery/blob/de5398a6ad088dc006b46c6a870a2a053f4cd663/src/core.js#L201
> [2]https://262.ecma-international.org/12.0/#sec-ecmascript-language-lexical-grammar
> 
> Signed-off-by: xing zhi jiang <a97410985new@xxxxxxxxx>
> ---
> I write a simple python script to help me observe the diff's results easier. 
> And the result is in https://github.com/a97410985/user_diff_observe_tool.
> I observe the six popular JavaScript projects - Jquery, NodeJS, React, TensorflowJS, 
> Vue, and Webpack. The result is not perfect but acceptable.
> Most of the mismatch is related to the function in class. Ex:
> https://github.com/a97410985/user_diff_observe_tool/blob/092bd14c849b9a3f61bb872d730e38b58cc9c303/diff_result/jquery_result.diff#L1094.
> But this is inevitable.
> And besides, apply the code reviews suggestions. I add some new regex.
> 1. "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
>    this is for the case below
>    `
>     function RIGHT(a, b) {
> 
>         WRONG(spy.calls.count()).toBe(1)
>         return a + b; // ChangeMe
>     }
>    `
>    if we don't skip chained function call. This would happen mismatch.

OK, but see below.

> 2. "^[\t ]*(QUnit.test\\(.*)\n"
>    I oberve there are many test case pattern in JavaScript. And most of popular JavaScript
>    unit test framework's test case pattern is function call can be matched by the regex, which
>    match "function in class". So it is no problem. But `QUnit framework` don't matched by that.
>    Then add this regex into JavaScript userdiff.
>    And the reason includes test case pattern has two
>    1. This is inevitable. Because we can't distinguish that with function in class.
>    2. This can help the user understand what change is happening in which test case. And if 
>       don't match the test case, it has the large possibility to match no related function.

Good. I consider this like matching a language keyword. If additional
patterns of this kind occur, they can be added later, but we must make
sure that this does not proliferate.

>       
> And I move the regex, which match normal function in object literal, under the regex 
> ""!^(\t{2,}|[ ]{5,})\n".
> Because I found it would many mismatch if function in object literal has too many indent level(
> the function body usually just few lines of code, and the below has large of code. If has change
> on below, it would happen mismatch).
> And there are two example for hunk header that match test case pattern.

OK.

> diff --git a/t/t4018/javascript-skip-chained-function b/t/t4018/javascript-skip-chained-function
> new file mode 100644
> index 0000000000..4ce23901cf
> --- /dev/null
> +++ b/t/t4018/javascript-skip-chained-function
> @@ -0,0 +1,5 @@
> +function RIGHT(a, b) {
> +
> +  expect(spy.calls.count()).toBe(1)
> +  return a + b; // ChangeMe
> +}

This is the test case that attempts to make sure that the chained
function calls are not matched. But since there is no line between
ChangeMe and the not-to-be-matched line, this test case would not detect
such an incorrect match. You have to insert a line between expect... and
return...ChangeMe

> diff --git a/userdiff.c b/userdiff.c
> index 8578cb0d12..306286df35 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -168,6 +168,43 @@ PATTERNS("java",
>  	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
>  	 "|[-+*/<>%&^|=!]="
>  	 "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> +
> +PATTERNS("javascript",
> +	 /* don't match the expression may contain parenthesis, because it is not a function declaration */
> +	 "!^[ \t]*(if|do|while|for|with|switch|catch|import|return)\n"
> +	 /* don't match statement */
> +	 "!;\n"
> +	 /* match normal function or named export for function in ECMA2015 */
> +	 "^((export[\t ]+)?(async[\t ]+)?function[\t ]*[\t *]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)\n"
> +	 /* match JavaScript variable declaration with a lambda expression at top level */
> +	 "^((const|let|var)[\t ]*[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*"
> +		"(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>[\t ]*\\{?)\n"
> +	 /* match object's property assignment by anonymous function and CommonJS exports for named function */
> +	 "^((module\\.)?[$_[:alpha:]][$_[:alnum:]]*\\.[$_[:alpha:]][$_[:alnum:]]*[\t ]*=[\t ]*(async[\t ]+)?(\\(.*\\)|[$_[:alpha:]][$_[:alnum:]]*)[\t ]*=>.*)\n"
> +	 /* match assign function to LHS with explicit function keyword */
> +	 "^(.*=[\t ]*function[\t ]*([$_[:alnum:]]+[\t ]*)?\\(.*)\n"
> +	 /* popular unit testing framework test case pattern. Most of framework pattern is match by regex for "function in class" */
> +	 "^[\t ]*(QUnit.test\\(.*)\n"
> +	 /* don't match the function in class or in object literal, which has more than one ident level */
> +	 "!^(\t{2,}|[ ]{5,})\n"
> +	 /* match normal function in object literal */
> +	 "^[\t ]*([$_[:alpha:]][$_[:alnum:]]*[\t ]*:[\t ]*function.*)\n"
> +	 /* don't match chained method call */
> +	 "!^[\t ]*[$_[:alpha:]][$_[:alnum:]][\t ]*\\(.*\\)\\.\n"
> +	 /* match function in class and ES5 method shorthand */
> +	 "^[\t ]*((static[\t ]+)?((async|get|set)[\t ]+)?[$_[:alpha:]][$_[:alnum:]]*[\t ]*\\(.*)",

No objections to these expressions.

> +	 /* word regex */
> +	 /* hexIntegerLiteral, octalIntegerLiteral, binaryIntegerLiteral, and its big version */
> +	 "0[xXoObB][_0-9a-fA-F]+n?"
> +	 /* DecimalLiteral and its big version*/
> +	 "|[0-9][_0-9]*(\\.[0-9][_0-9]*|n)?([eE][+-]?[_0-9]+)?"
> +	 "|\\.[0-9][_0-9]*([eE][+-]?[_0-9]+)?"
> +	 /* punctuations */
> +	 "|\\.{3}|<=|>=|==|!=|={3}|!==|\\*{2}|\\+{2}|--|<<|>>"
> +	 "|>>>|&&|\\|{2}|\\?{2}|\\+=|-=|\\*=|%=|\\*{2}="
> +	 "|<<=|>>=|>>>=|&=|\\|=|\\^=|&&=|\\|{2}=|\\?{2}=|=>"
> +	 /* identifiers */
> +	 "|[$_[:alpha:]][$_[:alnum:]]*"),

These word-regexs look good.

>  PATTERNS("markdown",
>  	 "^ {0,3}#{1,6}[ \t].*",
>  	 /* -- */

This round looks quite good. With the one issue I noted above fixed,
this should be good to go, IMO.

-- Hannes