This is an offshoot of the previous patch series at:
  https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@xxxxxxx

Add a kernel_fpu_yield() function for x86 crypto drivers to call
periodically during long loops.

Test results
============
I created 28 tcrypt modules so modprobe can run concurrent tests,
added 1 MiB functional and speed tests to tcrypt, and ran three
processes spawning 28 subprocesses (one per physical CPU core)
each looping forever through all the tcrypt test modes. This keeps
the system quite busy, generating RCU stalls and soft lockups during
both generic and x86 crypto function processing.

In conjunction with these patch series:
* [PATCH 0/8] crypto: kernel-doc for assembly language
  https://lore.kernel.org/linux-crypto/20221219185555.433233-1-elliott@xxxxxxx
* [PATCH 0/3] crypto/rcu: suppress unnecessary CPU stall warnings
  https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@xxxxxxx
* [PATCH 0/3] crypto: yield at end of operations
  https://lore.kernel.org/linux-crypto/20221219203733.3063192-1-elliott@xxxxxxx

while using the default RCU values (60 s stalls, 21 s expedited
stalls), several nights of testing did not result in any RCU stall
warnings or soft lockups in any of these preemption modes:
   preempt=none
   preempt=voluntary
   preempt=full

Setting the shortest possible RCU timeouts (3 s, 20 ms) did still
result in RCU stalls, but only about one every 2 hours, and not
occurring on particular modules like sha512_ssse3 and sm4-generic.

systemd usually crashes and restarts when its journal becomes full
from all the tcrypt printk messages. Without the patches, that
triggered more RCU stall reports and soft lockups; with the patches,
only userspace seems perturbed.
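For readers who have not seen the earlier series, the pattern of
yielding FPU context periodically during a long loop can be sketched
in userspace as below. This is only an illustration under stated
assumptions, not the code from the patches: every fake_* function,
CHUNK_BYTES, and checksum_with_yield() are hypothetical stand-ins,
and the real kernel_fpu_yield() added by this series may differ in
signature and behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions) for kernel primitives. */
static int fpu_sections;         /* times the FPU context was acquired */
static int fake_resched_pending; /* simulates the scheduler wanting the CPU */

static void fake_kernel_fpu_begin(void) { fpu_sections++; }
static void fake_kernel_fpu_end(void) { }
static int fake_need_resched(void) { return fake_resched_pending; }

/*
 * Hypothetical yield helper: release and reacquire the FPU context
 * only when a reschedule is pending, so the common case stays cheap.
 */
static void fake_kernel_fpu_yield(void)
{
	if (fake_need_resched()) {
		fake_kernel_fpu_end();
		/* In a kernel, preemption could happen at this point. */
		fake_resched_pending = 0;
		fake_kernel_fpu_begin();
	}
}

#define CHUNK_BYTES 4096u /* assumed chunk size, chosen for illustration */

/* Stand-in for a SIMD-accelerated routine: a trivial byte sum. */
static uint32_t fake_simd_update(uint32_t state, const uint8_t *data,
				 size_t len)
{
	for (size_t i = 0; i < len; i++)
		state += data[i];
	return state;
}

/* Process a long buffer in chunks, offering to yield between chunks. */
static uint32_t checksum_with_yield(const uint8_t *data, size_t len)
{
	uint32_t state = 0;

	assert(data != NULL || len == 0);

	fake_kernel_fpu_begin();
	while (len) {
		size_t n = len < CHUNK_BYTES ? len : CHUNK_BYTES;

		state = fake_simd_update(state, data, n);
		data += n;
		len -= n;
		if (len)
			fake_kernel_fpu_yield();
	}
	fake_kernel_fpu_end();
	return state;
}
```

In this sketch the FPU context is taken once per call when nothing
else wants the CPU, and is dropped and retaken only at a chunk
boundary where a reschedule is pending.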

Robert Elliott (13):
  x86: protect simd.h header file
  x86: add yield FPU context utility function
  crypto: x86/sha - yield FPU context during long loops
  crypto: x86/crc - yield FPU context during long loops
  crypto: x86/sm3 - yield FPU context during long loops
  crypto: x86/ghash - use u8 rather than char
  crypto: x86/ghash - restructure FPU context saving
  crypto: x86/ghash - yield FPU context during long loops
  crypto: x86/poly - yield FPU context only when needed
  crypto: x86/aegis - yield FPU context during long loops
  crypto: x86/blake - yield FPU context only when needed
  crypto: x86/chacha - yield FPU context only when needed
  crypto: x86/aria - yield FPU context only when needed

 arch/x86/crypto/aegis128-aesni-glue.c      |  49 ++++++---
 arch/x86/crypto/aria_aesni_avx_glue.c      |   7 +-
 arch/x86/crypto/blake2s-glue.c             |  41 +++----
 arch/x86/crypto/chacha_glue.c              |  22 ++--
 arch/x86/crypto/crc32-pclmul_glue.c        |  49 +++++----
 arch/x86/crypto/crc32c-intel_glue.c        | 118 ++++++++++++++------
 arch/x86/crypto/crct10dif-pclmul_glue.c    |  65 ++++++++---
 arch/x86/crypto/ghash-clmulni-intel_asm.S  |   6 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c |  37 +++++--
 arch/x86/crypto/nhpoly1305-avx2-glue.c     |  22 ++--
 arch/x86/crypto/nhpoly1305-sse2-glue.c     |  22 ++--
 arch/x86/crypto/poly1305_glue.c            |  47 ++++----
 arch/x86/crypto/polyval-clmulni_glue.c     |  46 +++++---
 arch/x86/crypto/sha1_avx2_x86_64_asm.S     |   6 +-
 arch/x86/crypto/sha1_ni_asm.S              |   8 +-
 arch/x86/crypto/sha1_ssse3_glue.c          | 120 +++++++++++++++++----
 arch/x86/crypto/sha256_ni_asm.S            |   8 +-
 arch/x86/crypto/sha256_ssse3_glue.c        | 115 ++++++++++++++++----
 arch/x86/crypto/sha512_ssse3_glue.c        |  89 ++++++++++++---
 arch/x86/crypto/sm3_avx_glue.c             |  34 +++++-
 arch/x86/include/asm/simd.h                |  23 ++++
 include/crypto/internal/blake2s.h          |   8 +-
 lib/crypto/blake2s-generic.c               |  12 +--
 23 files changed, 687 insertions(+), 267 deletions(-)

-- 
2.38.1