Re: [PATCH] crypto: x86: Do not acquire fpu context for too long

Taehee Yoo <ap420073@xxxxxxxxx> · Wed, 12 Oct 2022 15:08:24 +0900

Hi Elliott, Robert

On 2022-10-10 at 4:58 AM, Elliott, Robert (Servers) wrote:
>
>
>> -----Original Message-----
>> From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
>> Sent: Sunday, October 9, 2022 1:20 AM
>> To: Elliott, Robert (Servers) <elliott@xxxxxxx>
>> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>; Taehee Yoo <ap420073@xxxxxxxxx>;
>> linux-crypto@xxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx;
>> mingo@xxxxxxxxxx; bp@xxxxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
>> hpa@xxxxxxxxx; ebiggers@xxxxxxxxxx
>> Subject: Re: [PATCH] crypto: x86: Do not acquire fpu context for too long
>>
>> On Sat, Oct 08, 2022 at 07:48:07PM +0000, Elliott, Robert (Servers) wrote:
>>>
>>> Perhaps the cycles mode needs to call cond_resched() too?
>>
>> Yes, just make the cond_resched unconditional.  Having a few too many
>> rescheds shouldn't be an issue.
>
> This looks promising. I was able to trigger a lot of rcu stalls by setting:
>    echo 2 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>    echo 200 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>
> and running these concurrently:
>    watch -n 0 modprobe tcrypt mode=200
>    watch -n 0 modprobe tcrypt mode=0 through 999
>
> After changing tcrypt to call cond_resched in both cases, I don't see any
> more rcu stalls.
>
> I am getting miscompares from the extended self-test for crc32 and
> crct10dif, and will investigate those further.
>
> BTW, the way tcrypt always refuses to load leads to an ever-growing list in
> the Call Traces:
>
> kernel: Unloaded tainted modules: tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 ... tcrypt():1
>

I tested mb_aead as well.

I can easily trigger RCU stalls while testing gcm(aes-generic) with the
commands below:
#shell1
while :
do
    modprobe tcrypt mode=215 num_mb=1024
done
#shell2
while :
do
    modprobe tcrypt mode=0
done

Then I added cond_resched() as you suggested:

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index a82679b576bb..eeb3abb4eece 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -415,6 +415,7 @@ static void test_mb_aead_speed(const char *algo, int enc, int secs,
                        } else {
                                ret = test_mb_aead_cycles(data, enc, bs,
                                                          num_mb);
+                               cond_resched();
                        }

I no longer see any RCU stalls, so I think it works well.
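For anyone less familiar with the pattern, below is a userspace C sketch of the loop shape discussed in this thread: one yield per block size, after both the timed path and the cycles path, so the yield is effectively unconditional. It is an analogy only, assuming sched_yield() as a stand-in for the kernel's cond_resched(); run_timed(), run_cycles(), yield_cpu() and speed_test() are hypothetical names, not functions in crypto/tcrypt.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for tcrypt's per-block-size workers. */
static unsigned long work_done;

static int run_timed(unsigned int bs)
{
	work_done += bs;	/* pretend to process bs bytes for a fixed time */
	return 0;
}

static int run_cycles(unsigned int bs)
{
	work_done += bs;	/* pretend to process bs bytes and count cycles */
	return 0;
}

/* Userspace stand-in for cond_resched(): offer the CPU to other tasks. */
static unsigned int yields;

static void yield_cpu(void)
{
	sched_yield();
	yields++;
}

/*
 * Run one worker per block size. The yield sits after the if/else, so it
 * happens once per block size whichever measurement mode is selected.
 */
static int speed_test(const unsigned int *sizes, size_t n, bool timed)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		if (timed)
			ret = run_timed(sizes[i]);
		else
			ret = run_cycles(sizes[i]);

		yield_cpu();	/* unconditional: both modes yield */

		if (ret)
			break;
	}
	return ret;
}
```

With five block sizes, speed_test(sizes, 5, false) yields five times, exactly as speed_test(sizes, 5, true) does; a few extra yields cost little, which matches Herbert's point that a few too many rescheds shouldn't be an issue.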