Re: [PATCH v2] userdiff: add built-in pattern for rust

Johannes Sixt <j6t@xxxxxxxx> · Fri, 17 May 2019 08:26:35 +0200

On 17.05.19 at 01:58, marcandre.lureau@xxxxxxxxxx wrote:
> From: Marc-André Lureau <mlureau@xxxxxxxxxx>
> 
> This adds xfuncname and word_regex patterns for Rust, a quite
> popular programming language. It also includes test cases for the
> xfuncname regex (t4018) and updated documentation.
> 
> The word_regex pattern finds identifiers, integers, floats and
> operators, according to the Rust Reference Book.
> 
> Cc: Johannes Sixt <j6t@xxxxxxxx>

This project discourages Cc: footers in commit messages.

> Signed-off-by: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> ---

> diff --git a/t/t4018/rust-trait b/t/t4018/rust-trait
> new file mode 100644
> index 0000000000..ea397f09ed
> --- /dev/null
> +++ b/t/t4018/rust-trait
> @@ -0,0 +1,5 @@
> +unsafe trait RIGHT<T> {
> +    fn len(&self) -> u32;
> +    fn ChangeMe(&self, n: u32) -> T;
> +    fn iter<F>(&self, f: F) where F: Fn(T);
> +}

You mentioned that 'unsafe' is commonly used for blocks, and these cases
should not be picked up. Can we have a test case that demonstrates that
this is indeed the case?
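
For illustration, the behaviour can also be checked outside the test
suite. The following is only a minimal sketch, assuming the funcname
pattern is compiled with plain POSIX regcomp() and REG_EXTENDED and is
applied to one line at a time. The helper name line_is_hunk_header()
and the sample lines are made up for this example and are not part of
the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* xfuncname pattern copied from the patch under review. */
static const char rust_funcname[] =
	"^[\t ]*((pub(\\([^\\)]+\\))?[\t ]+)?"
	"((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t ]+)?"
	"(struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+[^;]*)$";

/*
 * Hypothetical helper (not a git function): returns 1 if the pattern
 * matches the line, i.e. the line would be a hunk header candidate,
 * and 0 otherwise.
 */
static int line_is_hunk_header(const char *line)
{
	regex_t re;
	int rc;

	rc = regcomp(&re, rust_funcname, REG_EXTENDED | REG_NOSUB);
	assert(rc == 0);
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Under these assumptions, line_is_hunk_header("    unsafe {") returns 0,
while line_is_hunk_header("unsafe trait RIGHT<T> {") returns 1. A file
under t/t4018 would still be the proper place to pin this down.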

> diff --git a/userdiff.c b/userdiff.c
> index 3a78fbf504..8d7e62e2a5 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -130,6 +130,13 @@ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
>  	 "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
>  	 "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
>  	 "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
> +PATTERNS("rust",
> +	 "^[\t ]*((pub(\\([^\\)]+\\))?[\t ]+)?((async|const|unsafe|extern([\t ]+\"[^\"]+\"))[\t ]+)?(struct|enum|union|mod|trait|fn|impl(<.+>)?)[ \t]+[^;]*)$",
> +	 /* -- */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 "|[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"

This pattern did not change. Doesn't it still mark "+e_1.e_8-e_2.eu128"
as a single word?
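
To make the concern concrete, here is a minimal sketch, assuming POSIX
regcomp() with REG_EXTENDED. Only the pattern string is taken from the
patch; the helper anchored_match_len() is hypothetical. It reports how
many bytes the number alternative consumes when it has to match at the
start of a string.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Number alternative quoted above, copied from the patch. */
static const char old_number[] =
	"[-+_0-9.eE]+(f32|f64|u8|u16|u32|u64|u128|usize"
	"|i8|i16|i32|i64|i128|isize)?";

/*
 * Hypothetical helper: length of the match of `pattern` anchored at
 * the start of `text`, or -1 if `text` does not start with a match.
 */
static long anchored_match_len(const char *pattern, const char *text)
{
	char buf[512];
	regex_t re;
	regmatch_t m[1];
	int rc;

	/* Wrap as ^(pattern) so the match must begin at offset 0. */
	rc = snprintf(buf, sizeof(buf), "^(%s)", pattern);
	assert(rc > 0 && (size_t)rc < sizeof(buf));
	rc = regcomp(&re, buf, REG_EXTENDED);
	assert(rc == 0);
	rc = regexec(&re, text, 1, m, 0);
	regfree(&re);
	return rc == 0 ? (long)(m[0].rm_eo - m[0].rm_so) : -1;
}
```

With this sketch, anchored_match_len(old_number, "+e_1.e_8-e_2.eu128")
returns the full length of the string, so the whole expression is
treated as one word.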

> +	 "|0[box]?[0-9a-fA-F_]+(u8|u16|u32|u64|u128|usize|i8|i16|i32|i64|i128|isize)?"

I still think that you should reduce the complexity of these patterns.
They do not need to be strict enough to reject invalid syntax, only
liberal enough to match valid syntax. Let me try again:

	"|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
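
As a rough sanity check of this suggestion, a minimal sketch follows,
assuming POSIX regcomp() with REG_EXTENDED. The helper
anchored_match_len() and the sample literals are made up for this
example. The leading "|" is dropped because the alternative is tested
on its own here.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Suggested number alternative, without the leading "|". */
static const char new_number[] =
	"[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?";

/*
 * Hypothetical helper: length of the match of `pattern` anchored at
 * the start of `text`, or -1 if `text` does not start with a match.
 */
static long anchored_match_len(const char *pattern, const char *text)
{
	char buf[512];
	regex_t re;
	regmatch_t m[1];
	int rc;

	/* Wrap as ^(pattern) so the match must begin at offset 0. */
	rc = snprintf(buf, sizeof(buf), "^(%s)", pattern);
	assert(rc > 0 && (size_t)rc < sizeof(buf));
	rc = regcomp(&re, buf, REG_EXTENDED);
	assert(rc == 0);
	rc = regexec(&re, text, 1, m, 0);
	regfree(&re);
	return rc == 0 ? (long)(m[0].rm_eo - m[0].rm_so) : -1;
}
```

Under these assumptions, literals such as "1_000u32", "0xff_u8",
"1.5e-3" and "2.0f64" are matched in full, while a string that starts
with an operator or an identifier, such as "+e_1.e_8-e_2.eu128", is not
matched at its start, because the pattern requires a leading digit.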

> +	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::")

-- Hannes