On Feb 8, 2024, at 14:08, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> From: Eric Biggers <ebiggers@xxxxxxxxxx>
>
> Since CBC decryption is parallelizable, make the RISC-V implementation
> of AES-CBC decryption process multiple blocks at a time, instead of
> processing the blocks one by one. This should improve performance.
>
> Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
> ---
>  arch/riscv/crypto/aes-riscv64-zvkned.S | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.S b/arch/riscv/crypto/aes-riscv64-zvkned.S
> index 78d4e1186c074..43541aad6386c 100644
> --- a/arch/riscv/crypto/aes-riscv64-zvkned.S
> +++ b/arch/riscv/crypto/aes-riscv64-zvkned.S
> @@ -132,33 +132,39 @@ SYM_FUNC_END(aes_ecb_decrypt_zvkned)
>  	addi		INP, INP, 16
>  	addi		OUTP, OUTP, 16
>  	addi		LEN, LEN, -16
>  	bnez		LEN, 1b
>
>  	vse32.v		v16, (IVP)	// Store next IV
>  	ret
>  .endm
>
>  .macro	aes_cbc_decrypt	keylen
> +	srli		LEN, LEN, 2	// Convert LEN from bytes to words
>  	vle32.v		v16, (IVP)	// Load IV
>  1:
> -	vle32.v		v17, (INP)	// Load ciphertext block
> -	vmv.v.v		v18, v17	// Save ciphertext block
> -	aes_decrypt	v17, \keylen	// Decrypt
> -	vxor.vv		v17, v17, v16	// XOR with IV or prev ciphertext block
> -	vse32.v		v17, (OUTP)	// Store plaintext block
> -	vmv.v.v		v16, v18	// Next "IV" is prev ciphertext block
> -	addi		INP, INP, 16
> -	addi		OUTP, OUTP, 16
> -	addi		LEN, LEN, -16
> +	vsetvli		t0, LEN, e32, m4, ta, ma
> +	vle32.v		v20, (INP)	// Load ciphertext blocks
> +	vslideup.vi	v16, v20, 4	// Setup prev ciphertext blocks
> +	addi		t1, t0, -4
> +	vslidedown.vx	v24, v20, t1	// Save last ciphertext block

Do we need to set up `e32` with `len=t0` for the next IV here? I think we
only need a 128-bit IV (i.e., VL=4).

> +	aes_decrypt	v20, \keylen	// Decrypt the blocks
> +	vxor.vv		v20, v20, v16	// XOR with prev ciphertext blocks
> +	vse32.v		v20, (OUTP)	// Store plaintext blocks
> +	vmv.v.v		v16, v24	// Next "IV" is last ciphertext block

Same VL concern here.

> +	slli		t1, t0, 2	// Words to bytes
> +	add		INP, INP, t1
> +	add		OUTP, OUTP, t1
> +	sub		LEN, LEN, t0
>  	bnez		LEN, 1b
>
> +	vsetivli	zero, 4, e32, m1, ta, ma
>  	vse32.v		v16, (IVP)	// Store next IV
>  	ret
>  .endm
>
>  // void aes_cbc_encrypt_zvkned(const struct crypto_aes_ctx *key,
>  //			       const u8 *in, u8 *out, size_t len, u8 iv[16]);
>  //
>  // |len| must be nonzero and a multiple of 16 (AES_BLOCK_SIZE).
>  SYM_FUNC_START(aes_cbc_encrypt_zvkned)
>  	aes_begin	KEYP, 128f, 192f
>
> base-commit: cb4ede926134a65bc3bf90ed58dace8451d7e759
> --
> 2.43.0
>
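For anyone following along who is less familiar with why CBC decryption (unlike CBC encryption) can be batched, here is a minimal Python sketch of the dependency structure the patch exploits: P[i] = D(C[i]) XOR C[i-1], with C[-1] = IV, so all the D(C[i]) computations are independent and can run in one vectorized batch. A toy XOR "block cipher" stands in for AES purely to keep the sketch self-contained; all names here are illustrative and are not the kernel's.

```python
# Sketch of block-parallel CBC decryption (illustrative only, not the
# kernel code). A trivial XOR cipher stands in for AES.

BLOCK = 16  # AES_BLOCK_SIZE in bytes

def toy_cipher_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for one AES block operation. XOR is an involution, so the
    # same function serves as both "encrypt" and "decrypt" here.
    return bytes(b ^ k for b, k in zip(block, key))

def cbc_encrypt(key: bytes, iv: bytes, pt: bytes) -> bytes:
    # CBC encryption is inherently serial: each block's cipher input
    # depends on the previous ciphertext block.
    ct, prev = b"", iv
    for i in range(0, len(pt), BLOCK):
        block = bytes(a ^ b for a, b in zip(pt[i:i + BLOCK], prev))
        prev = toy_cipher_block(key, block)
        ct += prev
    return ct

def cbc_decrypt_parallel(key: bytes, iv: bytes, ct: bytes) -> bytes:
    # CBC decryption: P[i] = D(C[i]) XOR C[i-1], with C[-1] = IV.
    # Every D(C[i]) is independent of the others, so they can all be
    # computed as one batch -- the property that lets the RISC-V code
    # decrypt a whole vsetvli-sized group of blocks per loop iteration.
    blocks = [ct[i:i + BLOCK] for i in range(0, len(ct), BLOCK)]
    decrypted = [toy_cipher_block(key, c) for c in blocks]  # batchable
    prevs = [iv] + blocks[:-1]  # the "previous ciphertext" chain
    return b"".join(bytes(a ^ b for a, b in zip(d, p))
                    for d, p in zip(decrypted, prevs))
```

Only the final XOR chain touches C[i-1], and even that is a pure data rearrangement (what the patch does with vslideup/vslidedown), not a serial dependency through the cipher itself.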