Re: [PATCH v2 05/20] crypto: mips/chacha - import accelerated 32r2 code from Zinc

On Fri, 4 Oct 2019 at 17:15, René van Dorst <opensource@xxxxxxxxxx> wrote:
>
> Hi Jason,
>
> Quoting "Jason A. Donenfeld" <Jason@xxxxxxxxx>:
>
> > On Fri, Oct 4, 2019 at 4:44 PM Ard Biesheuvel
> > <ard.biesheuvel@xxxxxxxxxx> wrote:
> >> The round count is passed via the fifth function parameter, so it is
> >> already on the stack. Reloading it for every block doesn't sound like
> >> a huge deal to me.
> >
> > Please benchmark it to indicate that, if it really isn't a big deal. I
> > recall finding that memory accesses on common mips32r2 commodity
> > router hardware were extremely inefficient. The whole thing is designed
> > to minimize memory accesses, which are the primary bottleneck on that
> > platform.
>
> I also think it isn't a big deal, but I shall benchmark it this weekend.
> If I am correct, a memory write is first placed in the cache. So if you
> read it again while it is still cached, it is very fast: 1 or 2 clock cycles.
> Also, the value isn't used directly after it is read,
> so the CPU doesn't have to stall on this read.
>

Thanks René.

Note that the round count is not being spilled. I [re]load it from the
stack as a function parameter.

So instead of

li $at, 20

I do

lw $at, 16($sp)


Thanks a lot for taking the time to double check this. I think it
would be nice to be able to expose xchacha12 like we do on other
architectures.

Note that for xchacha, I also added a hchacha_block() routine based on
your code (with the round count as the third argument) [0]. Please let
me know if you see anything wrong with that.


+.globl hchacha_block
+.ent hchacha_block
+hchacha_block:
+ .frame $sp, STACK_SIZE, $ra
+
+ addiu $sp, -STACK_SIZE
+
+ /* Save s0-s7 */
+ sw $s0, 0($sp)
+ sw $s1, 4($sp)
+ sw $s2, 8($sp)
+ sw $s3, 12($sp)
+ sw $s4, 16($sp)
+ sw $s5, 20($sp)
+ sw $s6, 24($sp)
+ sw $s7, 28($sp)
+
+ lw X0, 0(STATE)
+ lw X1, 4(STATE)
+ lw X2, 8(STATE)
+ lw X3, 12(STATE)
+ lw X4, 16(STATE)
+ lw X5, 20(STATE)
+ lw X6, 24(STATE)
+ lw X7, 28(STATE)
+ lw X8, 32(STATE)
+ lw X9, 36(STATE)
+ lw X10, 40(STATE)
+ lw X11, 44(STATE)
+ lw X12, 48(STATE)
+ lw X13, 52(STATE)
+ lw X14, 56(STATE)
+ lw X15, 60(STATE)
+
+.Loop_hchacha_xor_rounds:
+ addiu $a2, -2
+ AXR( 0, 1, 2, 3, 4, 5, 6, 7, 12,13,14,15, 16);
+ AXR( 8, 9,10,11, 12,13,14,15, 4, 5, 6, 7, 12);
+ AXR( 0, 1, 2, 3, 4, 5, 6, 7, 12,13,14,15, 8);
+ AXR( 8, 9,10,11, 12,13,14,15, 4, 5, 6, 7, 7);
+ AXR( 0, 1, 2, 3, 5, 6, 7, 4, 15,12,13,14, 16);
+ AXR(10,11, 8, 9, 15,12,13,14, 5, 6, 7, 4, 12);
+ AXR( 0, 1, 2, 3, 5, 6, 7, 4, 15,12,13,14, 8);
+ AXR(10,11, 8, 9, 15,12,13,14, 5, 6, 7, 4, 7);
+ bnez $a2, .Loop_hchacha_xor_rounds
+
+ sw X0, 0(OUT)
+ sw X1, 4(OUT)
+ sw X2, 8(OUT)
+ sw X3, 12(OUT)
+ sw X12, 16(OUT)
+ sw X13, 20(OUT)
+ sw X14, 24(OUT)
+ sw X15, 28(OUT)
+
+ /* Restore used registers */
+ lw $s0, 0($sp)
+ lw $s1, 4($sp)
+ lw $s2, 8($sp)
+ lw $s3, 12($sp)
+ lw $s4, 16($sp)
+ lw $s5, 20($sp)
+ lw $s6, 24($sp)
+ lw $s7, 28($sp)
+
+ addiu $sp, STACK_SIZE
+ jr $ra
+.end hchacha_block
+.set at
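
As a cross-check on the assembly above, here is a minimal C sketch of what the
routine is expected to compute, assuming the standard HChaCha construction: run
the requested number of ChaCha rounds on the 16-word input state and emit words
0-3 and 12-15, without the feed-forward addition that the block function does.
The names rotl32, QR and hchacha_block_ref are made up for illustration and are
not part of the patch; the argument order (state, output, round count) mirrors
the one described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on words a, b, c, d of the working state x. */
#define QR(x, a, b, c, d) do {					\
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);		\
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);		\
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);		\
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);		\
} while (0)

/*
 * Hypothetical reference model, not the code from the patch.
 * nrounds must be positive and even (e.g. 20 or 12), because each
 * loop iteration performs one column round plus one diagonal round.
 */
static void hchacha_block_ref(const uint32_t state[16], uint32_t out[8],
			      int nrounds)
{
	uint32_t x[16];
	int i;

	assert(nrounds > 0 && nrounds % 2 == 0);

	memcpy(x, state, sizeof(x));

	for (i = 0; i < nrounds; i += 2) {
		/* Column round: corresponds to the first four AXR lines. */
		QR(x, 0, 4,  8, 12);
		QR(x, 1, 5,  9, 13);
		QR(x, 2, 6, 10, 14);
		QR(x, 3, 7, 11, 15);
		/* Diagonal round: corresponds to the last four AXR lines. */
		QR(x, 0, 5, 10, 15);
		QR(x, 1, 6, 11, 12);
		QR(x, 2, 7,  8, 13);
		QR(x, 3, 4,  9, 14);
	}

	/* Output is the first and last row of the permuted state. */
	memcpy(&out[0], &x[0], 4 * sizeof(uint32_t));
	memcpy(&out[4], &x[12], 4 * sizeof(uint32_t));
}
```

Calling hchacha_block_ref(state, out, 20) models the 20-round case and passing
12 models what xchacha12 would need. Comparing its output with the assembly
routine on the same input state is one way to spot a wrong register or rotate
count.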


[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=wireguard-crypto-library-api-v3&id=cc74a037f8152d52bd17feaf8d9142b61761484f



