Re: [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper

Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> · Tue, 09 Apr 2024 15:38:16 -0700



On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@xxxxxxxxxx>
> 
> Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it.  This is necessary to avoid reducing the
> performance of SSE code.
> 
> Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
> Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
> ---
>  arch/x86/crypto/sha256-avx2-asm.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
> index 9918212faf91..0ffb072be956 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
>  	popq	%r15
>  	popq	%r14
>  	popq	%r13
>  	popq	%r12
>  	popq	%rbx
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(sha256_transform_rorx)
>  
>  .section	.rodata.cst512.K256, "aM", @progbits, 512
>  .align 64

Acked-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>