Re: [PATCH] crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

On Fri, Aug 31, 2018 at 06:51:34PM +0200, Ard Biesheuvel wrote:
> >>
> >> +       adr             ip, .Lrol8_table
> >>         mov             r3, #10
> >>
> >>  .Ldoubleround4:
> >> @@ -238,24 +268,25 @@ ENTRY(chacha20_4block_xor_neon)
> >>         // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> >>         // x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> >>         // x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> >> +       vld1.8          {d16}, [ip, :64]
> 
> Also, would it perhaps be more efficient to keep the rotation vector
> in a pair of GPRs, and use something like
> 
> vmov d16, r4, r5
> 
> here?
> 

I tried that, but it doesn't help on either Cortex-A7 or Cortex-A53.
In fact it's very slightly worse.
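
The quoted hunk loads `.Lrol8_table` into `d16`. The point of the table is that a 32-bit rotate by 8 only moves whole bytes, so it can be done as a per-byte shuffle (NEON `vtbl.8`) instead of a shift/insert pair. The C model below is a minimal sketch under stated assumptions. It assumes little-endian 32-bit lanes. It also assumes the table holds the byte indices {3, 0, 1, 2, 7, 4, 5, 6}, because the table's contents are not quoted in this excerpt. The names `vtbl8_model`, `rol8_table` and `rol8_pair` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference rotate, as used in the ChaCha20 quarter-round. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
	assert(n > 0 && n < 32);	/* avoid undefined shift by 32 */
	return (x << n) | (x >> (32 - n));
}

/*
 * Scalar model of vtbl.8 with a single 64-bit table register:
 * out[i] = in[idx[i]], and out-of-range indices (>= 8) yield 0.
 */
static void vtbl8_model(uint8_t out[8], const uint8_t in[8],
			const uint8_t idx[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = idx[i] < 8 ? in[idx[i]] : 0;
}

/* Assumed table contents: rotate each 32-bit lane left by 8 bits. */
static const uint8_t rol8_table[8] = { 3, 0, 1, 2, 7, 4, 5, 6 };

/* Rotate both 32-bit lanes of one d register left by 8 via the shuffle. */
static void rol8_pair(uint32_t lanes[2])
{
	uint8_t in[8], out[8];

	/* Split each lane into bytes, least significant byte first. */
	for (int l = 0; l < 2; l++)
		for (int b = 0; b < 4; b++)
			in[4 * l + b] = (uint8_t)(lanes[l] >> (8 * b));

	vtbl8_model(out, in, rol8_table);

	/* Reassemble the shuffled bytes into 32-bit lanes. */
	for (int l = 0; l < 2; l++) {
		lanes[l] = 0;
		for (int b = 0; b < 4; b++)
			lanes[l] |= (uint32_t)out[4 * l + b] << (8 * b);
	}
}
```

For example, a lane holding `0x11223344` becomes `0x22334411`, which is exactly `rotl32(0x11223344, 8)`. The same reasoning is why only the byte-aligned rotations in ChaCha20 (by 8 and 16) can be done as byte shuffles; the rotations by 12 and 7 are not byte-aligned.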

- Eric
