Vakul reports a considerable performance hit when running the accelerated
arm64 crypto routines with CONFIG_PREEMPT=y configured, now that they
have been updated to take the TIF_NEED_RESCHED flag into account.

The issue appears to be caused by the fact that Cortex-A53, the core in
question, has a high-end implementation of the Crypto Extensions, and
has a shallow pipeline, which means that even sequential algorithms that
may be held back by pipeline stalls on high-end, out-of-order cores run
at maximum speed. This means SHA-1, SHA-2, GHASH and AES in GCM and CCM
modes run at a speed on the order of 2 to 4 cycles per byte, and are
currently implemented to check the TIF_NEED_RESCHED flag after each
iteration, which may process as little as 16 bytes (for GHASH).

Obviously, every cycle of overhead hurts in this context, and given that
the A53's load/store unit is not quite high end, any delays caused by
memory accesses that occur in the inner loop of the algorithms are going
to be quite significant, hence the performance regression.

So reduce the frequency at which the NEON yield checks are performed, so
that they occur roughly once every 1000 cycles, which is hopefully a
reasonable tradeoff between throughput and worst case scheduling latency.

Ard Biesheuvel (4):
  crypto/arm64: ghash - reduce performance impact of NEON yield checks
  crypto/arm64: aes-ccm - reduce performance impact of NEON yield checks
  crypto/arm64: sha1 - reduce performance impact of NEON yield checks
  crypto/arm64: sha2 - reduce performance impact of NEON yield checks

 arch/arm64/crypto/aes-ce-ccm-core.S |  3 +++
 arch/arm64/crypto/ghash-ce-core.S   | 12 +++++++++---
 arch/arm64/crypto/sha1-ce-core.S    |  3 +++
 arch/arm64/crypto/sha2-ce-core.S    |  3 +++
 4 files changed, 18 insertions(+), 3 deletions(-)

-- 
2.11.0
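
P.S. For readers unfamiliar with the yield-check pattern: the idea of the
series is to amortize the cost of polling the reschedule flag by only
checking it once per batch of blocks, rather than after every block. The
C sketch below illustrates that amortization in plain (userspace) code;
`should_yield()`, `process_blocks()` and `YIELD_INTERVAL` are hypothetical
stand-ins, not the kernel's actual API, and the real patches implement
the equivalent logic directly in the NEON assembly loops.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TIF_NEED_RESCHED test; in the kernel this
 * would inspect the current thread_info flags. */
static bool need_resched_flag;

static bool should_yield(void)
{
	return need_resched_flag;
}

/*
 * Process up to `nblocks` 16-byte blocks, polling the reschedule flag
 * only once every YIELD_INTERVAL blocks. At 2-4 cycles per byte, a
 * batch of 16 blocks (256 bytes) keeps the poll to roughly one check
 * per ~1000 cycles. Returns the number of blocks processed before
 * yielding; a kernel caller would drop and re-take the NEON unit
 * (kernel_neon_end()/kernel_neon_begin()) when fewer than `nblocks`
 * blocks were consumed.
 */
#define YIELD_INTERVAL	16

static size_t process_blocks(const uint8_t *in, size_t nblocks)
{
	size_t done = 0;

	(void)in;	/* the per-block work is elided in this sketch */

	while (done < nblocks) {
		size_t chunk = nblocks - done;

		if (chunk > YIELD_INTERVAL)
			chunk = YIELD_INTERVAL;

		/* ... process `chunk` blocks with NEON instructions ... */
		done += chunk;

		if (should_yield())
			break;
	}
	return done;
}
```

With the flag clear, the whole input is consumed in one call; with the
flag set, the loop still completes the current batch before yielding, so
the worst-case scheduling latency is bounded by one batch of work.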