Re: [PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jun 17, 2018 at 01:10:41PM +0200, Ard Biesheuvel wrote:
> >>>>> +
> >>>>> +     // One-time XTS preparation
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Allocate stack space to store 128 bytes worth of tweaks.  For
> >>>>> +      * performance, this space is aligned to a 16-byte boundary so that we
> >>>>> +      * can use the load/store instructions that declare 16-byte alignment.
> >>>>> +      */
> >>>>> +     sub             sp, #128
> >>>>> +     bic             sp, #0xf
> >>>>
> >>>>
> >>>> This fails here when building with CONFIG_THUMB2_KERNEL=y
> >>>>
> >>>>   AS      arch/arm/crypto/speck-neon-core.o
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S: Assembler messages:
> >>>>
> >>>> arch/arm/crypto/speck-neon-core.S:419: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:423: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:427: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>> arch/arm/crypto/speck-neon-core.S:431: Error: r13 not allowed here --
> >>>> `bic sp,#0xf'
> >>>>
> >>>> In a quick hack this change seems to address it:
> >>>>
> >>>>
> >>>> -       sub             sp, #128
> >>>> -       bic             sp, #0xf
> >>>> +       mov             r6, sp
> >>>> +       sub             r6, #128
> >>>> +       bic             r6, #0xf
> >>>> +       mov             sp, r6
> >>>>
> >>>> But there is probably a better solution to address this.
> >>>>
> >>>
> >>> Given that there is no NEON on M class cores, I recommend we put something like
> >>>
> >>> THUMB(bx pc)
> >>> THUMB(nop.w)
> >>> THUMB(.arm)
> >>>
> >>> at the beginning and be done with it.
> >>
> >> I mean nop.n or just nop, of course, and we may need a '.align 2' at
> >> the beginning as well.
> >
> > Wouldn't it be preferable to have it assemble it in Thumb2 too? It seems
> > that bic sp,#0xf is the only issue...
> >
> 
> Well, in general, yes. In the case of NEON code, not really, since the
> resulting code will not be smaller anyway, because the Thumb2 NEON
> opcodes are all 4 bytes. Also, Thumb2-only cores don't have NEON
> units, so all cores that this code can run on will be able to run in
> ARM mode.
> 
> So from a maintainability pov, having code that only assembles in one
> way is better than having code that must compile both to ARM and to
> Thumb2 opcodes.
> 
> Just my 2 cents, anyway.

I don't have too much of a preference, though Stefan's suggested 4 instructions
can be reduced to 3, which also matches what aes-neonbs-core.S does:

        sub             r12, sp, #128
        bic             r12, #0xf
        mov             sp, r12

Ard, is the following what you're suggesting instead?

diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
index 3c1e203e53b9..c989ce3dc057 100644
--- a/arch/arm/crypto/speck-neon-core.S
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -8,6 +8,7 @@
  */
 
 #include <linux/linkage.h>
+#include <asm/assembler.h>
 
 	.text
 	.fpu		neon
@@ -233,6 +234,12 @@
  * nonzero multiple of 128.
  */
 .macro _speck_xts_crypt	n, decrypting
+
+	.align		2
+	THUMB(bx pc)
+	THUMB(nop)
+	THUMB(.arm)
+
 	push		{r4-r7}
 	mov		r7, sp
 
@@ -413,6 +420,8 @@
 	mov		sp, r7
 	pop		{r4-r7}
 	bx		lr
+
+	THUMB(.thumb)
 .endm
 
 ENTRY(speck128_xts_encrypt_neon)



[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]

  Powered by Linux