Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25 July 2018 at 11:05, Dave Martin <Dave.Martin@xxxxxxx> wrote:
> On Tue, Jul 24, 2018 at 06:12:21PM +0100, Ard Biesheuvel wrote:
>> As reported by Vakul, checking the TIF_NEED_RESCHED flag after every
>> iteration of the GHASH and AES-GCM core routines is having a considerable
>> performance impact on cores such as the Cortex-A53 with Crypto Extensions
>> implemented.
>>
>> GHASH performance is down by 22% for large block sizes, and AES-GCM is
>> down by 16% for large block sizes and 128 bit keys. This appears to be
>> a result of the high performance of the crypto instructions on the one
>> hand (2.0 cycles per byte for GHASH, 3.0 cpb for AES-GCM), combined with
>> the relatively poor load/store performance of this simple core.
>>
>> So let's reduce this performance impact by only doing the yield check
>> once every 32 blocks for GHASH (or 4 when using the version based on
>> 8-bit polynomial multiplication), and once every 16 blocks for AES-GCM.
>> This way, we recover most of the performance while still limiting the
>> duration of scheduling blackouts due to disabling preemption to ~1000
>> cycles.
>>
>> Cc: Vakul Garg <vakul.garg@xxxxxxx>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>> ---
>>  arch/arm64/crypto/ghash-ce-core.S | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
>> index dcffb9e77589..9c14beaabeee 100644
>> --- a/arch/arm64/crypto/ghash-ce-core.S
>> +++ b/arch/arm64/crypto/ghash-ce-core.S
>> @@ -212,7 +212,7 @@
>>       ushr            XL.2d, XL.2d, #1
>>       .endm
>>
>> -     .macro          __pmull_ghash, pn
>> +     .macro          __pmull_ghash, pn, yield_count
>>       frame_push      5
>>
>>       mov             x19, x0
>> @@ -259,6 +259,9 @@ CPU_LE(   rev64           T1.16b, T1.16b  )
>>       eor             T2.16b, T2.16b, XH.16b
>>       eor             XL.16b, XL.16b, T2.16b
>>
>> +     tst             w19, #(\yield_count - 1)
>
> This should perhaps be (\yield_count) - 1.
>
> It would be a bit silly to pass a non-trivial expression for yield_count
> though.
>
>> +     b.ne            1b
>> +
>
> Is it worth having a build-time check that \yield_count is a power of two?
> (i.e., (\yield_count) & ((\yield_count) - 1) != 0).  We could have a
> generic macro for that.
>
> Otherwise this code may poll more often than expected.  Not the end of
> the world, though.
>

Thanks for taking a look.

Given that the macro in question is used in exactly two places in the
same file, and is unlikely to be reused unless the architecture gains
support for another optional instruction that can be used as a drop-in
replacement, I don't think any of this truly matters tbh.



[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]

  Powered by Linux