Only perform the NEON yield check for every 8 blocks of input, to prevent
taking a considerable performance hit on cores with very fast crypto
instructions and comparatively slow memory accesses, such as the
Cortex-A53.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index 88f5aef7934c..627710cdc220 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -208,6 +208,9 @@ CPU_LE(	rev	x26, x26			)	/* keep swabbed ctr in reg */
 	st1	{v1.16b}, [x19], #16		/* write output block */
 	beq	5f
 
+	tst	w21, #(0x7 * 16)		/* yield every 8 blocks */
+	b.ne	0b
+
 	if_will_cond_yield_neon
 	st1	{v0.16b}, [x24]			/* store mac */
 	do_cond_yield_neon
-- 
2.11.0