Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

Eric Biggers <ebiggers@xxxxxxxxxx> · Thu, 9 Nov 2023 00:05:49 -0800

On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> +# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
> +sub init_first_round {
> +    my $code=<<___;
> +    # load input
> +    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vle32_v $V24, $INPUT]}
> +
> +    li $T0, 5
> +    # We could simplify the initialization steps if we have `block<=1`.
> +    blt $LEN32, $T0, 1f
> +
> +    # Note: We use `vgmul` for GF(2^128) multiplication. The `vgmul` uses
> +    # different order of coefficients. We should use `vbrev8` to reverse the
> +    # data when we use `vgmul`.
> +    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
> +    @{[vbrev8_v $V0, $V28]}
> +    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vmv_v_i $V16, 0]}
> +    # v16: [r-IV0, r-IV0, ...]
> +    @{[vaesz_vs $V16, $V0]}
> +
> +    # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
> +    slli $T0, $LEN32, 2
> +    @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
> +    # v2: [`1`, `1`, `1`, `1`, ...]
> +    @{[vmv_v_i $V2, 1]}
> +    # v3: [`0`, `1`, `2`, `3`, ...]
> +    @{[vid_v $V3]}
> +    @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
> +    # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
> +    @{[vzext_vf2 $V4, $V2]}
> +    # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
> +    @{[vzext_vf2 $V6, $V3]}
> +    slli $T0, $LEN32, 1
> +    @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
> +    # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
> +    @{[vwsll_vv $V8, $V4, $V6]}
> +
> +    # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16
> +    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vbrev8_v $V8, $V8]}
> +    @{[vgmul_vv $V16, $V8]}
> +
> +    # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28.
> +    # Reverse the bits order back.
> +    @{[vbrev8_v $V28, $V16]}

This code assumes that '1 << i' fits in 64 bits, for 0 <= i < vl.
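
To illustrate, here is a rough Python model of a single multiplier block as
built by the vwsll.vv step above.  It assumes my reading of the Zvbb spec is
right: at SEW=32 the result is 64 bits wide and only the low 6 bits of the
shift amount are used.  The function names are made up for this sketch.

```python
MASK64 = (1 << 64) - 1


def vwsll_e32(value, shamt):
    """Model one vwsll.vv lane at SEW=32 (assumed semantics):
    64-bit result, shift amount taken modulo 64."""
    return (value << (shamt & 63)) & MASK64


def multiplier_block(i):
    """128-bit multiplier for block i: the low half comes from the
    lane (1, i), the high half from the lane (0, 0)."""
    low = vwsll_e32(1, i)
    high = vwsll_e32(0, 0)
    return (high << 64) | low


# Blocks 0..63 get the intended x^i, but block 64 does not get x^64.
for i in (0, 1, 63, 64):
    print(i, multiplier_block(i) == 1 << i)
```

Under that assumption, block 64 would end up with the multiplier 1 instead of
x^64, so its tweak would be wrong.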

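I think that works out to an implicit assumption that VLEN <= 2048: with
LMUL=4, a register group holds up to VLEN/32 128-bit blocks, so i can reach
VLEN/32 - 1, which has to stay below 64.  I.e., AES-XTS encryption/decryption
would produce the wrong result on RISC-V implementations with VLEN > 2048.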
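
The arithmetic behind that bound, as a small Python sketch.  It assumes the
block count is limited only by the e32/m4 register group (4 * VLEN bits of
128-bit blocks); the helper names are hypothetical.

```python
AES_BLOCK_BITS = 128
LMUL = 4  # the patch processes data with e32, m4


def blocks_per_group(vlen):
    """Number of 128-bit blocks one LMUL=4 register group can hold."""
    return vlen * LMUL // AES_BLOCK_BITS


def multiplier_fits(vlen):
    """True if the largest block index i still satisfies i < 64,
    i.e. '1 << i' fits in 64 bits."""
    return blocks_per_group(vlen) - 1 < 64


for vlen in (128, 256, 1024, 2048, 4096):
    print(vlen, blocks_per_group(vlen), multiplier_fits(vlen))
```

VLEN=2048 gives 64 blocks (i up to 63), which still fits; VLEN=4096 gives 128
blocks, which does not.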

Perhaps it should be explicitly checked that VLEN <= 2048?

- Eric