Re: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations

Ard Biesheuvel <ardb@xxxxxxxxxx> · Sat, 11 Nov 2023 21:08:31 +1000

On Fri, 10 Nov 2023 at 15:44, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Fri, Nov 10, 2023 at 12:58:12PM +0800, Andy Chiu wrote:
> > Hi Eric,
> >
> > On Thu, Nov 9, 2023 at 3:16 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Nov 07, 2023 at 04:53:13PM +0800, Jerry Shih wrote:
> > > > On Nov 2, 2023, at 13:16, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > > > > On Thu, Oct 26, 2023 at 02:36:38AM +0800, Jerry Shih wrote:
> > > > >> +static int ecb_encrypt(struct skcipher_request *req)
> > > > >> +{
> > > > >> +  struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> > > > >> +  const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
> > > > >> +  struct skcipher_walk walk;
> > > > >> +  unsigned int nbytes;
> > > > >> +  int err;
> > > > >> +
> > > > >> +  /* If we have error here, the `nbytes` will be zero. */
> > > > >> +  err = skcipher_walk_virt(&walk, req, false);
> > > > >> +  while ((nbytes = walk.nbytes)) {
> > > > >> +          kernel_vector_begin();
> > > > >> +          rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
> > > > >> +                                   nbytes & AES_BLOCK_VALID_SIZE_MASK,
> > > > >> +                                   &ctx->key);
> > > > >> +          kernel_vector_end();
> > > > >> +          err = skcipher_walk_done(
> > > > >> +                  &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK);
> > > > >> +  }
> > > > >> +
> > > > >> +  return err;
> > > > >> +}
> > > > >
> > > > > There's no fallback for !crypto_simd_usable() here.  I really like it this way.
> > > > > However, for it to work (for skciphers and aeads), RISC-V needs to allow the
> > > > > vector registers to be used in softirq context.  Is that already the case?
> > > >
> > > > The kernel-mode-vector could be enabled in softirq, but we don't have nesting
> > > > vector contexts. Will we have the case that kernel needs to jump to softirq for
> > > > encryptions during the regular crypto function? If yes, we need to have fallbacks
> > > > for all algorithms.
> > >
> > > Are you asking what happens if a softirq is taken while the CPU is between
> > > kernel_vector_begin() and kernel_vector_end()?  I think that needs to be
> > > prevented by making kernel_vector_begin() and kernel_vector_end() disable and
> > > re-enable softirqs, like what kernel_neon_begin() and kernel_neon_end() do on
> > > arm64.  Refer to commit 13150149aa6ded which implemented that behavior on arm64.
> >
> > Yes, if making Vector available to softirq context is a must, then it
> > is reasonable to call local_bh_disable() in kernel_vector_begin().
> > However, softirq would not be the only user for Vector and disabling
> > it may cause extra latencies. Meanwhile, simply disabling bh in
> > kernel_vector_begin() will conflict with the patch[1] that takes an
> > approach to run Preemptible Vector. Though it is not clear yet on
> > whether we should run Vector without turning off preemption, I have
> > tested running preemptible Vector and observed some latency
> > improvements without sacrificing throughput. We will have a discussion
> > on LPC2023[2] and it'd be great if you could join or continue to
> > discuss it here.
> >
> > Approaches such as nesting are possible if running Vector in softirq
> > is required. Since nesting requires an extra save/restore, I think
> > we should run some tests to get more performance (latency/throughput)
> > figures and let the results decide the final direction. For example, we
> > could run Vector either nesting with preempt-V or non-nesting
> > without preempt-V and compare the following performance characteristics:
> >  - System-wide latency impact
> >  - Latency and throughput of softirq-Vector itself
>
> The skcipher and aead APIs do indeed need to work in softirq context.
>
> It's possible to use a fallback, either by falling back to scalar instructions
> or by punting the encryption/decryption operation to a workqueue using
> crypto/simd.c.  However, both approaches have some significant disadvantages.
> It was nice that the need for them on arm64 was eliminated by commit
> 13150149aa6ded.  Note that it's possible to yield the vector unit occasionally,
> to keep preemption and softirqs from being disabled for too long.
>

It is also quite feasible to start out with an implementation of
kernel_vector_begin() that preserves all vector registers eagerly in a
special per-CPU allocation if the call is made in softirq context (and
BUG when called in hardirq/NMI context). This was my initial approach
on arm64 too.
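
To make the idea concrete, here is a rough userspace model of that
control flow. This is only a sketch and not kernel code: every name in it
(cpu_model, model_vector_begin() and so on) is made up for illustration,
the register file size assumes VLEN = 128, and the task-context path is
left out entirely because it would keep using whatever the existing
kernel_vector_begin() does. Where the real thing would BUG(), the model
returns -1 so that it can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
enum ctx { CTX_TASK, CTX_SOFTIRQ, CTX_HARDIRQ };

/* 32 vector registers of 16 bytes each, assuming VLEN = 128. */
#define VREG_BYTES (32 * 16)

struct cpu_model {
	enum ctx ctx;				/* current execution context */
	unsigned char vregs[VREG_BYTES];	/* live vector register file */
	unsigned char softirq_save[VREG_BYTES];	/* the special per-CPU allocation */
	bool softirq_saved;			/* softirq_save holds valid state */
};

/* Returns 0 on success, -1 where real kernel code would BUG(). */
static int model_vector_begin(struct cpu_model *cpu)
{
	if (cpu->ctx == CTX_HARDIRQ)
		return -1;

	if (cpu->ctx == CTX_SOFTIRQ) {
		/*
		 * Eagerly preserve whatever was interrupted, which may be
		 * a task that is halfway through its own vector section.
		 */
		memcpy(cpu->softirq_save, cpu->vregs, VREG_BYTES);
		cpu->softirq_saved = true;
	}

	/* Task context: existing (lazy) handling, not modelled here. */
	return 0;
}

static void model_vector_end(struct cpu_model *cpu)
{
	if (cpu->ctx == CTX_SOFTIRQ && cpu->softirq_saved) {
		/* Put the interrupted state back before returning to it. */
		memcpy(cpu->vregs, cpu->softirq_save, VREG_BYTES);
		cpu->softirq_saved = false;
	}
}
```

A single save area per CPU should be enough since, as far as I know,
softirq processing does not nest on a given CPU. The price is an
unconditional save/restore of the whole register file on every softirq
use, which is why this is only a starting point and not the end state.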

Assuming that RISC-V systems with vector units are not flooding the
market just yet, this gives you some time to study the issue without
the need to implement non-vector fallback crypto algorithms
everywhere.