Re: [PATCH v2 02/20] crypto: x86/chacha - expose SIMD ChaCha routine as library function

On Wed, Oct 02, 2019 at 04:16:55PM +0200, Ard Biesheuvel wrote:
> Wire the existing x86 SIMD ChaCha code into the new ChaCha library
> interface.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> ---
>  arch/x86/crypto/chacha_glue.c | 36 ++++++++++++++++++++
>  crypto/Kconfig                |  1 +
>  include/crypto/chacha.h       |  6 ++++
>  3 files changed, 43 insertions(+)
> 
> diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c
> index bc62daa8dafd..fd9ef42842cf 100644
> --- a/arch/x86/crypto/chacha_glue.c
> +++ b/arch/x86/crypto/chacha_glue.c
> @@ -123,6 +123,42 @@ static void chacha_dosimd(u32 *state, u8 *dst, const u8 *src,
>  	}
>  }
>  
> +void hchacha_block(const u32 *state, u32 *stream, int nrounds)
> +{
> +	state = PTR_ALIGN(state, CHACHA_STATE_ALIGN);
> +
> +	if (!crypto_simd_usable()) {
> +		hchacha_block_generic(state, stream, nrounds);
> +	} else {
> +		kernel_fpu_begin();
> +		hchacha_block_ssse3(state, stream, nrounds);
> +		kernel_fpu_end();
> +	}
> +}
> +EXPORT_SYMBOL(hchacha_block);

Please correct me if I'm wrong:

The approach here is slightly different from Zinc. In Zinc, I had one
entry point that conditionally called into the architecture-specific
implementation, and I did it inline using #includes so that in some
cases the generic path could be optimized out.
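To make that layout concrete, here is a minimal user-space sketch of the Zinc-style single entry point. All names (hchacha, hchacha_arch, hchacha_generic, simd_usable, ARCH_ALWAYS_AVAILABLE) are hypothetical, simd_usable merely stands in for a runtime check such as crypto_simd_usable(), and the XOR loops are placeholders, not HChaCha.

```c
#include <stdbool.h>
#include <stdint.h>

/* Records which path handled the last call, so the dispatch is observable. */
enum hchacha_path { PATH_NONE, PATH_GENERIC, PATH_ARCH };
static enum hchacha_path last_path = PATH_NONE;

#ifndef ARCH_ALWAYS_AVAILABLE
/* Hypothetical runtime check (stand-in for crypto_simd_usable()). */
static bool simd_usable = true;
#endif

/* Portable fallback; placeholder body, not real HChaCha. */
static void hchacha_generic(const uint32_t in[8], uint32_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = in[i] ^ 0x5a5a5a5au;
	last_path = PATH_GENERIC;
}

/*
 * Zinc #included the arch file into the generic one, so this body is
 * visible at the call site. Returns true if it handled the call.
 */
static inline bool hchacha_arch(const uint32_t in[8], uint32_t out[8])
{
#ifndef ARCH_ALWAYS_AVAILABLE
	if (!simd_usable)
		return false;
#endif
	/* Stand-in for the SIMD routine; same placeholder result. */
	for (int i = 0; i < 8; i++)
		out[i] = in[i] ^ 0x5a5a5a5au;
	last_path = PATH_ARCH;
	return true;
}

/* The single public entry point. */
void hchacha(const uint32_t in[8], uint32_t out[8])
{
	if (!hchacha_arch(in, out))
		hchacha_generic(in, out);
}
```

Building with -DARCH_ALWAYS_AVAILABLE models the MIPS32r2 case: hchacha_arch() always returns true, the fallback branch becomes dead after inlining, and because hchacha_generic() is static with no remaining callers, an optimizing compiler can drop it from the object entirely.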

Here, you override the original symbol defined by the generic module
from the architecture-specific implementation, and in there you decide
which way to branch.

Your approach has the advantage that you don't need to #include a .c
file like I did, which is an ugly yet very effective approach.

But it has two disadvantages:

1. For architecture-specific code that _always_ runs, such as the
  MIPS32r2 implementation of ChaCha, the compiler no longer has an
  opportunity to remove the generic code entirely from the binary,
  which under Zinc resulted in a smaller module.

2. The compiler can't inline that call, so it loses any optimizations
  that inlining would have enabled.

Disadvantage (2) might not make much of a difference. Disadvantage (1)
seems like a bigger deal. However, perhaps the linker is smart enough to
remove the unused code and symbol? Or if not, is there a way to make the
linker smarter? Or would all of this require crazy LTO, which isn't going
to happen any time soon?
