On 25 July 2018 at 11:48, Dave Martin <Dave.Martin@xxxxxxx> wrote: > On Wed, Jul 25, 2018 at 10:11:42AM +0100, Ard Biesheuvel wrote: >> On 25 July 2018 at 11:05, Dave Martin <Dave.Martin@xxxxxxx> wrote: >> > On Tue, Jul 24, 2018 at 06:12:21PM +0100, Ard Biesheuvel wrote: >> >> As reported by Vakul, checking the TIF_NEED_RESCHED flag after every >> >> iteration of the GHASH and AES-GCM core routines is having a considerable >> >> performance impact on cores such as the Cortex-A53 with Crypto Extensions >> >> implemented. >> >> >> >> GHASH performance is down by 22% for large block sizes, and AES-GCM is >> >> down by 16% for large block sizes and 128 bit keys. This appears to be >> >> a result of the high performance of the crypto instructions on the one >> >> hand (2.0 cycles per byte for GHASH, 3.0 cpb for AES-GCM), combined with >> >> the relatively poor load/store performance of this simple core. >> >> >> >> So let's reduce this performance impact by only doing the yield check >> >> once every 32 blocks for GHASH (or 4 when using the version based on >> >> 8-bit polynomial multiplication), and once every 16 blocks for AES-GCM. >> >> This way, we recover most of the performance while still limiting the >> >> duration of scheduling blackouts due to disabling preemption to ~1000 >> >> cycles. >> >> >> >> Cc: Vakul Garg <vakul.garg@xxxxxxx> >> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> >> >> --- >> >> arch/arm64/crypto/ghash-ce-core.S | 12 +++++++++--- >> >> 1 file changed, 9 insertions(+), 3 deletions(-) >> >> >> >> diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S >> >> index dcffb9e77589..9c14beaabeee 100644 >> >> --- a/arch/arm64/crypto/ghash-ce-core.S >> >> +++ b/arch/arm64/crypto/ghash-ce-core.S >> >> @@ -212,7 +212,7 @@ >> >> ushr XL.2d, XL.2d, #1 >> >> .endm >> >> >> >> - .macro __pmull_ghash, pn >> >> + .macro __pmull_ghash, pn, yield_count >> >> frame_push 5 >> >> >> >> mov x19, x0 >> >> @@ -259,6 +259,9 @@ CPU_LE( rev64 T1.16b, T1.16b ) >> >> eor T2.16b, T2.16b, XH.16b >> >> eor XL.16b, XL.16b, T2.16b >> >> >> >> + tst w19, #(\yield_count - 1) >> > >> > This should perhaps be (\yield_count) - 1. >> > >> > It would be a bit silly to pass a non-trivial expression for yield_count >> > though. >> > >> >> + b.ne 1b >> >> + >> > >> > Is it worth having a build-time check that \yield_count is a power of two? >> > (i.e., (\yield_count) & ((\yield_count) - 1) != 0). We could have a >> > generic macro for that. >> > >> > Otherwise this code may poll more often than expected. Not the end of >> > the world, though. >> > >> >> Thanks for taking a look. >> >> Given that the macro in question is used in exactly two places in the >> same file, and is unlikely to be reused unless the architecture gains >> support for another optional instruction that can be used as a drop-in >> replacement, I don't think any of this truly matters tbh. > > Fair enough. A comment on the macro definition might help, but beyond > that it is probably overkill. > Sure. I'll add that in v2.