On Mon, Oct 17, 2022 at 03:26:20PM -0700, Nathan Huckleberry wrote:
> The key_powers array is not guaranteed to be 16-byte aligned, so using
> movaps to operate on key_powers is not allowed.
>
> Switch movaps to movups.
>
> Fixes: 34f7f6c30112 ("crypto: x86/polyval - Add PCLMULQDQ accelerated implementation of POLYVAL")
> Reported-by: Bruno Goncalves <bgoncalv@xxxxxxxxxx>
> Signed-off-by: Nathan Huckleberry <nhuck@xxxxxxxxxx>
> ---
>  arch/x86/crypto/polyval-clmulni_asm.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/crypto/polyval-clmulni_asm.S b/arch/x86/crypto/polyval-clmulni_asm.S
> index a6ebe4e7dd2b..32b98cb53ddf 100644
> --- a/arch/x86/crypto/polyval-clmulni_asm.S
> +++ b/arch/x86/crypto/polyval-clmulni_asm.S
> @@ -234,7 +234,7 @@
>
>  	movups (MSG), %xmm0
>  	pxor SUM, %xmm0
> -	movaps (KEY_POWERS), %xmm1
> +	movups (KEY_POWERS), %xmm1
>  	schoolbook1_noload
>  	dec BLOCKS_LEFT
>  	addq $16, MSG

I thought that crypto_tfm::__crt_ctx is guaranteed to be 16-byte aligned, and
that the x86 AES code relies on that property.

But now I see that actually the x86 AES code manually aligns the context. See
aes_ctx() in arch/x86/crypto/aesni-intel_glue.c.

Did you consider doing the same for polyval?

If you do prefer this way, it would be helpful to leave a comment for
schoolbook1_iteration that mentions that the unaligned access support of
vpclmulqdq is being relied on, i.e. pclmulqdq wouldn't work.

- Eric
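
For illustration, manually aligning a context could look roughly like the
user-space sketch below. It is not the kernel code: the names
polyval_ctx_sketch, polyval_raw_ctx, polyval_ctx_aligned and POLYVAL_ALIGN are
hypothetical, and the key_powers dimensions are assumed. The idea is to
over-allocate the raw context by (alignment - 1) bytes and round the pointer
up to the next 16-byte boundary before use.

```c
#include <stddef.h>
#include <stdint.h>

/* All names and sizes below are assumptions made for this sketch. */
#define POLYVAL_ALIGN       16
#define NUM_KEY_POWERS      8
#define POLYVAL_BLOCK_SIZE  16

/* The data that the assembly would want 16-byte aligned. */
struct polyval_ctx_sketch {
	uint8_t key_powers[NUM_KEY_POWERS][POLYVAL_BLOCK_SIZE];
};

/*
 * Raw storage with no alignment guarantee. It is over-allocated by
 * POLYVAL_ALIGN - 1 bytes so that an aligned context always fits inside.
 */
struct polyval_raw_ctx {
	uint8_t buf[sizeof(struct polyval_ctx_sketch) + POLYVAL_ALIGN - 1];
};

/* Round the raw pointer up to the next POLYVAL_ALIGN-byte boundary. */
static inline struct polyval_ctx_sketch *polyval_ctx_aligned(void *raw)
{
	uintptr_t addr = (uintptr_t)raw;

	addr = (addr + POLYVAL_ALIGN - 1) & ~(uintptr_t)(POLYVAL_ALIGN - 1);
	return (struct polyval_ctx_sketch *)addr;
}
```

With something along these lines, every access to key_powers would go through
the aligned pointer, so aligned loads such as movaps would stay valid, at the
cost of up to 15 bytes of padding per context.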