On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> +# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20)
> +sub init_first_round {
> +    my $code=<<___;
> +    # load input
> +    @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vle32_v $V24, $INPUT]}
> +
> +    li $T0, 5
> +    # We could simplify the initialization steps if we have `block<=1`.
> +    blt $LEN32, $T0, 1f
> +
> +    # Note: We use `vgmul` for GF(2^128) multiplication. The `vgmul` uses
> +    # different order of coefficients. We should use`vbrev8` to reverse the
> +    # data when we use `vgmul`.
> +    @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]}
> +    @{[vbrev8_v $V0, $V28]}
> +    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vmv_v_i $V16, 0]}
> +    # v16: [r-IV0, r-IV0, ...]
> +    @{[vaesz_vs $V16, $V0]}
> +
> +    # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8.
> +    slli $T0, $LEN32, 2
> +    @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]}
> +    # v2: [`1`, `1`, `1`, `1`, ...]
> +    @{[vmv_v_i $V2, 1]}
> +    # v3: [`0`, `1`, `2`, `3`, ...]
> +    @{[vid_v $V3]}
> +    @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]}
> +    # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...]
> +    @{[vzext_vf2 $V4, $V2]}
> +    # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...]
> +    @{[vzext_vf2 $V6, $V3]}
> +    slli $T0, $LEN32, 1
> +    @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]}
> +    # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...]
> +    @{[vwsll_vv $V8, $V4, $V6]}
> +
> +    # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16
> +    @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]}
> +    @{[vbrev8_v $V8, $V8]}
> +    @{[vgmul_vv $V16, $V8]}
> +
> +    # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28.
> +    # Reverse the bits order back.
> +    @{[vbrev8_v $V28, $V16]}

This code assumes that '1 << i' fits in 64 bits, for 0 <= i < vl.  I think
that works out to an implicit assumption that VLEN <= 2048.  I.e., AES-XTS
encryption/decryption would produce the wrong result on RISC-V
implementations with VLEN > 2048.

Perhaps it should be explicitly checked that VLEN <= 2048?

- Eric
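
For readers following the arithmetic, here is a minimal sketch of where the 2048 figure plausibly comes from. It is not part of the patch. It assumes the "e32, m4" configuration shown in the quoted code, 128-bit AES blocks, and that each `1 << i` value is held in a 64-bit element; the function names are made up for illustration.

```python
# Assumptions (illustrative, not taken from the patch itself):
#   - LMUL = 4, as in the quoted `vsetvli ... "e32", "m4"` lines
#   - one AES block is 128 bits
#   - each `1 << i` multiplier seed is produced in a 64-bit element
LMUL = 4
BLOCK_BITS = 128
SHIFT_ELEM_BITS = 64


def max_blocks_per_pass(vlen_bits: int) -> int:
    """Number of 128-bit blocks that fit in one LMUL=4 register group."""
    return vlen_bits * LMUL // BLOCK_BITS


def multiplier_fits(vlen_bits: int) -> bool:
    """True if `1 << i` fits in 64 bits for every block index i in one pass."""
    largest_shift = max_blocks_per_pass(vlen_bits) - 1
    return largest_shift < SHIFT_ELEM_BITS


for vlen in (128, 2048, 4096):
    print(vlen, max_blocks_per_pass(vlen), multiplier_fits(vlen))
```

Under these assumptions, VLEN = 2048 gives 64 blocks per pass, so the largest shift is 63 and still fits; VLEN = 4096 gives 128 blocks, and the shift amounts from 64 upward no longer fit in a 64-bit element.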