> I'll keep experimenting with all the preempt modes, heavier
> workloads, and shorter RCU timeouts to confirm this solution
> is robust. It might even be appropriate for the generic
> drivers, if they suffer from the problems that sm4 shows here.

I have a set of patches that's looking promising. It's no longer
generating RCU stall warnings or soft lockups with either x86
drivers or generic drivers (sm4 is particularly taxing).

Test case:
* added 28 clones of the tcrypt module so modprobe can run it
  many times in parallel (1 thread per CPU core)
* added 1 MiB big buffer functional tests (compare to
  generic results)
* added 1 MiB big buffer speed tests
* 3 windows running
* 28 threads running
* modprobe with each defined test mode in order 1, 2, 3, etc.
* RCU stall timeouts set to shortest supported values
* run in preempt=none, preempt=voluntary, preempt=full modes

Patches include:
* Ard's kmap_local() patch
* Suppress RCU stall warnings during speed tests. Change the
  rcu_sysrq_start()/end() functions to be general purpose and
  call them from tcrypt test functions that measure the time of
  a crypto operation
* add crypto_yield() unconditionally in skcipher_walk_done() so
  it is run even if data is aligned
* add crypto_yield() in aead_encrypt/decrypt so they always
  call it like skcipher
* add crypto_yield() at the end of each hash update(), digest(),
  and finup() function so they always call it like skcipher
* add kernel_fpu_yield() calls every 4 KiB inside x86
  kernel_fpu_begin()/end() blocks, so the x86 functions always
  yield to the scheduler even when they're bypassing those
  helper functions (that now call crypto_yield() more
  consistently)

I'll keep trying to break it over the weekend. If it holds up
I'll post the patches next week.
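
For readers who want to picture the "yield every 4 KiB" item in the patch list, here is a rough user-space model in C. It is only a sketch under assumptions, not the actual patch: model_fpu_begin(), model_fpu_end(), model_fpu_yield(), model_process() and process_with_yields() are hypothetical stand-ins invented for this illustration. The real kernel_fpu_yield() is the proposed helper named above, and its signature is not given in this message. The XOR loop merely stands in for a SIMD cipher routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Yield interval taken from the message: every 4 KiB. */
#define FPU_CHUNK_BYTES 4096u

/* Hypothetical user-space stand-ins for the FPU section helpers. */
static void model_fpu_begin(void) { /* would disable preemption */ }
static void model_fpu_end(void)   { /* would re-enable preemption */ }
static void model_fpu_yield(void) { /* would let the scheduler run */ }

/* Stand-in for a SIMD cipher routine: XOR each byte with a constant. */
static void model_process(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = src[i] ^ 0x5a;
}

/*
 * Process len bytes inside one begin/end block, offering to yield
 * after every full chunk that is followed by more data.
 * Returns the number of yield points that were hit.
 */
static unsigned long process_with_yields(uint8_t *dst, const uint8_t *src,
					 size_t len)
{
	unsigned long yields = 0;

	model_fpu_begin();
	while (len) {
		size_t n = len < FPU_CHUNK_BYTES ? len : FPU_CHUNK_BYTES;

		model_process(dst, src, n);
		dst += n;
		src += n;
		len -= n;

		if (len) {
			model_fpu_yield();
			yields++;
		}
	}
	model_fpu_end();

	return yields;
}
```

In this model a 1 MiB buffer, like the big-buffer tests above, is split into 256 chunks, so the loop passes 255 yield points instead of holding the section for the whole buffer. A request of 4 KiB or less passes none.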