Re: [PATCH v2 2/2] crypto: arm/xor - make vectorized C code Clang-friendly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 29, 2022 at 2:45 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> The ARM version of the accelerated XOR routines are simply the 8-way C
> routines passed through the auto-vectorizer with SIMD codegen enabled.
> This used to require GCC version 4.6 at least, but given that 5.1 is now
> the baseline, this check is no longer necessary, and actually
> misidentifies Clang as GCC < 4.6 as Clang defines the GCC major/minor as
> well, but makes no attempt at doing this in a way that conveys feature
> parity with a certain version of GCC (which would not be a great idea in
> the first place).
>
> So let's drop the version check, and make the auto-vectorize pragma
> (which is based on a GCC-specific command line option) GCC-only. Since
> Clang performs SIMD auto-vectorization by default at -O2, no pragma is
> necessary here.
>
> Tested-by: Nathan Chancellor <nathan@xxxxxxxxxx>
> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>

Thanks for the patch!
Reviewed-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
Link: https://github.com/ClangBuiltLinux/linux/issues/496
Link: https://github.com/ClangBuiltLinux/linux/issues/503

> ---
>  arch/arm/lib/xor-neon.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..522510baed49 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -17,17 +17,11 @@ MODULE_LICENSE("GPL");
>  /*
>   * Pull in the reference implementations while instructing GCC (through
>   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> - * NEON instructions.
> + * NEON instructions. Clang does this by default at O2 so no pragma is
> + * needed.
>   */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> +#ifdef CONFIG_CC_IS_GCC
>  #pragma GCC optimize "tree-vectorize"
> -#else
> -/*
> - * While older versions of GCC do not generate incorrect code, they fail to
> - * recognize the parallel nature of these functions, and emit plain ARM code,
> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> - */
> -#warning This code requires at least version 4.6 of GCC
>  #endif
>
>  #pragma GCC diagnostic ignored "-Wunused-variable"
> --
> 2.30.2
>


-- 
Thanks,
~Nick Desaulniers



[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]

  Powered by Linux