On Fri, 6 Oct 2023 at 23:01, He-Jie Shih <bignose1007@xxxxxxxxx> wrote:
>
> On Oct 7, 2023, at 03:47, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > On Fri, Sep 15, 2023 at 11:21:28AM +0800, Jerry Shih wrote:
> >> On Sep 15, 2023, at 09:48, He-Jie Shih <bignose1007@xxxxxxxxx> wrote:
> >> The OpenSSL PR is at [1].
> >> And we are from SiFive.
> >>
> >> -Jerry
> >>
> >> [1]
> >> https://github.com/openssl/openssl/pull/21923
> >
> > Hi Jerry, I'm wondering if you have an update on this? Do you need any help?
>
> We have specialized aes-cbc/ecb/ctr patch locally and pass the `testmgr` test
> cases. But the test patterns in `testmgr` are quite simple, I think it doesn't test the
> corner cases (e.g. aes-xts with tail element).
>

There should be test cases for that.

> For aes-xts, I'm trying to update the implementation from OpenSSL. The design
> philosophy is different between OpenSSL and Linux. In Linux crypto, the data will
> be split into `scatterlist`. I need to preserve the aes-xts's iv for each scatterlist
> entry call.

Yes, this applies to all block ciphers that take an IV.

> And I'm thinking about how to handle the tail data in a simple way.

The RISC-V vector ISA is quite advanced, so there may be a better trick
using predicates etc., but otherwise, I suppose you could reuse the same
trick that other asm implementations use, which is to use unaligned
loads and stores for the final blocks, and to use a vector permute with
a permute table to shift the bytes in the registers. But this is not
performance critical, given that existing in-kernel users use sector or
page size inputs only.

> By the way, the `xts(aes)` implementation for arm and x86 are using
> `cra_blocksize= AES_BLOCK_SIZE`. I don't know why we need to handle the tail
> element. I think we will hit `EINVAL` error in `skcipher_walk_next()` if the data size
> is not a multiple of block size.
>

No, both XTS and CBC-CTS permit inputs that are not a multiple of the
block size, and will use some form of ciphertext stealing for the final
tail. There is a generic CTS template that wraps CBC, but combining them
in the same way (e.g., using vector permute) will speed things up
substantially. *However*, I'm not sure how relevant CBC-CTS is in the
kernel, given that only fscrypt uses it IIRC, and it actually prefers
something else, so for new systems perhaps you shouldn't bother.

> Overall, we will have
> 1) aes cipher
> 2) aes with cbc/ecb/ctr/xts mode
> 3) sha256/512 for `vlen>=128` platform
> 4) sm3 for `vlen>=128` platform
> 5) sm4
> 6) ghash
> 7) `chacha20` stream cipher
>
> The vector crypto pr in OpenSSL is under review, we are still updating the
> perl file into Linux.
>
> The most complicated `gcm(aes)` mode will be in our next plan.
>
> > I'm also wondering about riscv.pm and the choice of generating the crypto
> > instructions from .words instead of using the assembler. It makes it
> > significantly harder to review the code, IMO. Can we depend on assembler
> > support for these instructions, or is that just not ready yet?
>
> I have asked the same question before[1]. The reason is that OpenSSL could use
> very old compiler for compiling. Thus, the assembler might not know the standard
> rvv 1.0[2] and other vector crypto[3] instructions. That's why we use opcode for all
> vector instructions. IMO, I would prefer to use opcode for `vector crypto` only. The
> gcc-12 and clang-14 are already supporting rvv 1.0. Actually, I just read the `perl`
> file instead of the actually generated opcode for OpenSSL pr reviewing. And it's
> not hard to read the perl code.
>

I understand the desire to reuse code, and OpenSSL already relies on
so-called perlasm for this, but I think this is not a great choice, and
I actually think this was a mistake for RISC-V.
OpenSSL relies on perlasm for things like emitting different function
pro-/epilogues depending on the calling convention (SysV versus MS on
x86_64, for instance), but RISC-V does not have that much variety, and
already supports the insn_r / insn_i pseudo instructions to emit
arbitrary opcodes while still supporting named registers as usual.

[Maybe my experience does not quite extrapolate to the vector ISA, but I
managed to implement scalar AES [0] using the insn_r and insn_i pseudo
instructions (which are generally provided by the assembler, but Linux
has fallback CPP macros for them as well), and this results in much more
maintainable code IMO.]

We are using some of the OpenSSL perlasm in the kernel already (and some
of it was introduced by me), but I don't think we should blindly reuse
all of the RISC-V code if some of it can straightforwardly be written
as normal .S files.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=riscv-scalar-aes
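
To illustrate what an R-type `.insn` saves the author from doing by hand, here is a minimal sketch in C of the field packing, assuming the standard RISC-V R-type layout. The helper name `encode_r_type` is hypothetical and does not come from the kernel or from OpenSSL, and the checked value is the scalar `add`, not any vector crypto instruction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel or OpenSSL code): pack the fields of a
 * RISC-V R-type instruction into its 32-bit encoding.
 *
 *   31..25   24..20  19..15  14..12  11..7  6..0
 *   funct7   rs2     rs1     funct3  rd     opcode
 */
static uint32_t encode_r_type(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                              uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    /* Reject values that do not fit in their bit fields. */
    assert(opcode < 128 && funct7 < 128);
    assert(funct3 < 8);
    assert(rd < 32 && rs1 < 32 && rs2 < 32);

    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}
```

For example, `encode_r_type(0x33, 0, 0, 1, 2, 3)` gives 0x003100b3, the word for `add x1, x2, x3`. With `.insn r 0x33, 0, 0, x1, x2, x3` the assembler does this packing itself, so the source keeps register names rather than a precomputed `.word`.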