On Mon, Aug 05, 2024 at 10:41:21AM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-08-02 09:28:32 [-0700], Eric Biggers wrote:
> > Hi Sebastian,
> Hi Eric,
>
> > > diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
> > > index cd37de5ec4046..be92e4c3f9c7f 100644
> > > --- a/arch/x86/crypto/aesni-intel_glue.c
> > > +++ b/arch/x86/crypto/aesni-intel_glue.c
> > > @@ -1403,7 +1403,9 @@ gcm_crypt(struct aead_request *req, int flags)
> > >  			aes_gcm_update(key, le_ctr, ghash_acc,
> > >  				       walk.src.virt.addr, walk.dst.virt.addr,
> > >  				       nbytes, flags);
> > > +			kernel_fpu_end();
> > >  			err = skcipher_walk_done(&walk, 0);
> > > +			kernel_fpu_begin();
> > >  			/*
> > >  			 * The low word of the counter isn't used by the
> > >  			 * finalize, so there's no need to increment it here.
> >
> > Can you make this conditional on CONFIG_PREEMPT_RT so that it doesn't hurt
> > performance for everyone else?
>
> Every other instance in this file had a kernel_fpu_end()/begin() before
> skcipher_walk_done(), so I thought this one was just missed by chance.

No, it was intentional.  See the comment above the first kernel_fpu_begin() in
gcm_crypt():

	/*
	 * Since the AES-GCM assembly code requires that at least three assembly
	 * functions be called to process any message (this is needed to support
	 * incremental updates cleanly), to reduce overhead we try to do all
	 * three calls in the same kernel FPU section if possible.  We close the
	 * section and start a new one if there are multiple data segments or if
	 * rescheduling is needed while processing the associated data.
	 */
	kernel_fpu_begin();

> > Note that kfree() lacks a might_sleep(), and its kerneldoc does not say
> > that it can sleep.  Have you checked for other instances of this same
> > problem?  It seems it would be quite common kernel-wide.
>
> kfree() can't have a might_sleep() because it does not qualify for it:
> you can use kfree() in softirq context, for instance with an acquired
> spinlock_t on !RT, which would trigger the check.
> On PREEMPT_RT, interrupts are threaded, softirq is preemptible, and
> spinlock_t is a sleeping lock, so all the places where kfree() would
> have been invoked in preempt-disabled context on !PREEMPT_RT are
> actually preemptible on PREEMPT_RT.
> This is of course not true in cases where preemption is explicitly
> disabled, as in this case.

WARN_ON(!preemptible()) then?  If I add that to kfree(), it triggers from lots
of other places.  Are those problems on PREEMPT_RT too?

What I am trying to get at is: what debugging options do I need to detect
issues like this?  Is there really no option other than actually running a
PREEMPT_RT kernel?  I had tested this code with lots of debug options
pre-merge and nothing came up.  If there had been something in
CONFIG_SLUB_DEBUG, for example, I would have seen it, and you would never
have had to deal with this issue at all, as it would never have been
introduced.

- Eric