On Fri, 10 Nov 2023 at 15:44, Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > > On Fri, Nov 10, 2023 at 12:58:12PM +0800, Andy Chiu wrote: > > Hi Eric, > > > > On Thu, Nov 9, 2023 at 3:16 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > > > > > > On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote: > > > > On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > > > > > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote: > > > > >> +static int ecb_encrypt(struct skcipher_request *req) > > > > >> +{ > > > > >> + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); > > > > >> + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); > > > > >> + struct skcipher_walk walk; > > > > >> + unsigned int nbytes; > > > > >> + int err; > > > > >> + > > > > >> + /* If we have error here, the `nbytes` will be zero. */ > > > > >> + err = skcipher_walk_virt(&walk, req, false); > > > > >> + while ((nbytes = walk.nbytes)) { > > > > >> + kernel_vector_begin(); > > > > >> + rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr, > > > > >> + nbytes & AES_BLOCK_VALID_SIZE_MASK, > > > > >> + &ctx->key); > > > > >> + kernel_vector_end(); > > > > >> + err = skcipher_walk_done( > > > > >> + &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK); > > > > >> + } > > > > >> + > > > > >> + return err; > > > > >> +} > > > > > > > > > > There's no fallback for !crypto_simd_usable() here. I really like it this way. > > > > > However, for it to work (for skciphers and aeads), RISC-V needs to allow the > > > > > vector registers to be used in softirq context. Is that already the case? > > > > > > > > The kernel-mode-vector could be enabled in softirq, but we don't have nesting > > > > vector contexts. Will we have the case that kernel needs to jump to softirq for > > > > encryptions during the regular crypto function? If yes, we need to have fallbacks > > > > for all algorithms. > > > > > > Are you asking what happens if a softirq is taken while the CPU is between > > > kernel_vector_begin() and kernel_vector_end()? I think that needs to be > > > prevented by making kernel_vector_begin() and kernel_vector_end() disable and > > > re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on > > > arm64. Refer to commit 13150149aa6ded which implemented that behavior on arm64. > > > > Yes, if making Vector available to softirq context is a must, then it > > is reasonable to call local_bh_disable() in kernel_vector_begin(). > > However, softirq would not be the only user for Vector and disabling > > it may cause extra latencies. Meanwhile, simply disabling bh in > > kernel_vector_begin() will conflict with the patch[1] that takes an > > approach to run Preemptible Vector. Though it is not clear yet on > > whether we should run Vector without turning off preemption, I have > > tested running preemptible Vector and observed some latency > > improvements without sacrificing throughput. We will have a discussion > > on LPC2023[2] and it'd be great if you could join or continue to > > discuss it here. > > > > Approaches can be done such as nesting, if running Vector in softirq > > is required. Since it requires extra save/restore on nesting, I think > > we should run some tests to get more performance (latency/throughput) > > figure let the result decide the final direction. For example, we > > could run Vector in either nesting with preempt-V and non-nesting > > without preempt-V and compare the following performance catachristics: > > - System-wide latency impact > > - Latency and throughput of softirq-Vector itself > > The skcipher and aead APIs do indeed need to work in softirq context. > > It's possible to use a fallback, either by falling back to scalar instructions > or by punting the encryption/decryption operation to a workqueue using > crypto/simd.c. However, both approaches have some significant disadvantages. > It was nice that the need for them on arm64 was eliminated by commit > 13150149aa6ded. Note that it's possible to yield the vector unit occasionally, > to keep preemption and softirqs from being disabled for too long. > It is also quite feasible to start out with an implementation of kernel_vector_begin() that preserves all vector registers eagerly in a special per-CPU allocation if the call is made in softirq context (and BUG when called in hardirq/NMI context). This was my initial approach on arm64 too. Assuming that RiSC-V systems with vector units are not flooding the market just yet, this gives you some time to study the issue without the need to implement non-vector fallback crypto algorithms everywhere.