> > From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> ... > > On Sat, Oct 08, 2022 at 07:48:07PM +0000, Elliott, Robert (Servers) wrote: > > > > > > Perhaps the cycles mode needs to call cond_resched() too? > > > > Yes, just make the cond_resched unconditional. Having a few too many > > rescheds shouldn't be an issue. > > This looks promising. I was able to trigger a lot of rcu stalls by setting: > echo 2 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout > echo 200 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout > > and running these concurrently: > watch -n 0 modprobe tcrypt=200 > watch -n 0 module tcrypt=0 through 999 > > I am getting miscompares from the extended self-test for crc32 and > crct10dif, and will investigate those further. The assembly functions for those two do not handle small sizes (unlike crc32c, which handles all sizes), so the new do/while loop needs to use larger steps and call the generic function for any leftover bytes. I've corrected that patch. > After changing tcrypt to call cond_resched in both cases, I don't see any > more rcu stalls. This ran cleanly with a few nights of testing with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y which compares the accelerated implementation to the generic implementation under randomly generated test vectors - a very helpful test suite. It only works with data sizes up to 2 * PAGE_SIZE, though. With the cond_resched changes, rcu stalls are gone with one exception. If printing of Call Traces is triggered for any reason, either by a test failing (e.g., because of the CRC loop bugs) or by using a SysRq trigger like: echo l > /proc/sysrq-trigger then the very act of generating and printing the Call Traces can trigger rcu stall messages. Examples: Oct 09 14:56:39 kernel: cryptomgr: alg: shash: crct10dif-pclmul test failed (wrong result) on test vector "random: psize=4303 ksize=0", cfg="random: use_final src_divs=[<flush>95.43%@+4, <flush>4.57%@+6]" Oct 09 14:56:39 kernel: cryptomgr: alg: self-tests for crct10dif using crct10dif failed (rc=-22) Oct 09 14:56:39 kernel: ------------[ cut here ]------------ Oct 09 14:56:39 kernel: alg: self-tests for crct10dif using crct10dif failed (rc=-22) Oct 09 14:56:39 kernel: WARNING: CPU: 55 PID: 10553 at crypto/testmgr.c:5837 alg_test.cold+0x3e/0x124 ... Oct 09 14:56:42 kernel: R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000040000 Oct 09 14:56:42 kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: and: Oct 10 19:45:02 kernel: sysrq: Show backtrace of all active CPUs ... Oct 10 19:45:03 kernel: </TASK> ... Oct 10 19:45:06 kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: If there are no WARNs or call traces, then there are no rcu stalls. So, we should be aware that commit 09a5ef9644bc0e1 ("crypto: testmgr - WARN on test failure") might introduce additional errors. > BTW, the way tcrypt always refuses to load leads to an ever-growing list in > the Call Traces: > > kernel: Unloaded tainted modules: tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 t > crypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt(): > 1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 tcrypt():1 > tcrypt():1 tcrypt():1 tcrypt():1 tcrypt() > :1 tcrypt():1 tcrypt():1 tcrypt():1 That was already noticed and a fix has already been posted for 6.1 as commit 47cc75aa9283 ("module: tracking: Keep a record of tainted unloaded modules only")