Re: [PATCH 1/4] crypto/arm64: ghash - reduce performance impact of NEON yield checks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25 July 2018 at 11:48, Dave Martin <Dave.Martin@xxxxxxx> wrote:
> On Wed, Jul 25, 2018 at 10:11:42AM +0100, Ard Biesheuvel wrote:
>> On 25 July 2018 at 11:05, Dave Martin <Dave.Martin@xxxxxxx> wrote:
>> > On Tue, Jul 24, 2018 at 06:12:21PM +0100, Ard Biesheuvel wrote:
>> >> As reported by Vakul, checking the TIF_NEED_RESCHED flag after every
>> >> iteration of the GHASH and AES-GCM core routines is having a considerable
>> >> performance impact on cores such as the Cortex-A53 with Crypto Extensions
>> >> implemented.
>> >>
>> >> GHASH performance is down by 22% for large block sizes, and AES-GCM is
>> >> down by 16% for large block sizes and 128 bit keys. This appears to be
>> >> a result of the high performance of the crypto instructions on the one
>> >> hand (2.0 cycles per byte for GHASH, 3.0 cpb for AES-GCM), combined with
>> >> the relatively poor load/store performance of this simple core.
>> >>
>> >> So let's reduce this performance impact by only doing the yield check
>> >> once every 32 blocks for GHASH (or 4 when using the version based on
>> >> 8-bit polynomial multiplication), and once every 16 blocks for AES-GCM.
>> >> This way, we recover most of the performance while still limiting the
>> >> duration of scheduling blackouts due to disabling preemption to ~1000
>> >> cycles.
>> >>
>> >> Cc: Vakul Garg <vakul.garg@xxxxxxx>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>> >> ---
>> >>  arch/arm64/crypto/ghash-ce-core.S | 12 +++++++++---
>> >>  1 file changed, 9 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
>> >> index dcffb9e77589..9c14beaabeee 100644
>> >> --- a/arch/arm64/crypto/ghash-ce-core.S
>> >> +++ b/arch/arm64/crypto/ghash-ce-core.S
>> >> @@ -212,7 +212,7 @@
>> >>       ushr            XL.2d, XL.2d, #1
>> >>       .endm
>> >>
>> >> -     .macro          __pmull_ghash, pn
>> >> +     .macro          __pmull_ghash, pn, yield_count
>> >>       frame_push      5
>> >>
>> >>       mov             x19, x0
>> >> @@ -259,6 +259,9 @@ CPU_LE(   rev64           T1.16b, T1.16b  )
>> >>       eor             T2.16b, T2.16b, XH.16b
>> >>       eor             XL.16b, XL.16b, T2.16b
>> >>
>> >> +     tst             w19, #(\yield_count - 1)
>> >
>> > This should perhaps be (\yield_count) - 1.
>> >
>> > It would be a bit silly to pass a non-trivial expression for yield_count
>> > though.
>> >
>> >> +     b.ne            1b
>> >> +
>> >
>> > Is it worth having a build-time check that \yield_count is a power of two?
>> > (i.e., (\yield_count) & ((\yield_count) - 1) != 0).  We could have a
>> > generic macro for that.
>> >
>> > Otherwise this code may poll more often than expected.  Not the end of
>> > the world, though.
>> >
>>
>> Thanks for taking a look.
>>
>> Given that the macro in question is used in exactly two places in the
>> same file, and is unlikely to be reused unless the architecture gains
>> support for another optional instruction that can be used as a drop-in
>> replacement, I don't think any of this truly matters tbh.
>
> Fair enough.  A comment on the macro definition might help, but beyond
> that it is probably overkill.
>

Sure. I'll add that in v2.



[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]

  Powered by Linux