Re: [PATCH v3 19/29] crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation

Andy Polyakov <appro@xxxxxxxxxxxxxx> · Fri, 11 Oct 2019 20:49:08 +0200

Hi,

On 10/11/2019 7:21 PM, René van Dorst wrote:
>
> ...
>
> I also wonder if we can also replace the "li $x, -4" and "and $x" with
> "sll $x"
> combination on other places like [0], also on line 1169?
>
> Replace this on line 1169, works on my device.
>
> -       li      $in0,-4
>         srl     $ctx,$tmp4,2
> -       and     $in0,$in0,$tmp4
>         andi    $tmp4,$tmp4,3
> +       sll     $in0, $ctx, 2
>         addu    $ctx,$ctx,$in0

The reason I chose to keep 'li $in0,-4' in poly1305_emit is that the
original sequence has higher instruction-level parallelism. Yes, it's
one extra instruction, but if all of them get paired, they will execute
faster. Yes, it doesn't help single-issue processors such as yours, but
the thing is that the next instruction depends on the last one, so
*formally* it's more appropriate to aim for higher ILP as a general
rule. Just in case, poly1305_blocks is different, because there the
dependent instruction does not immediately follow the one that computes
the residue.
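
To make the trade-off concrete, here is a rough C model of the two
sequences from the quoted hunk. It is only a sketch: the function names
are made up for illustration, 32-bit registers are assumed (the hunk
uses addu), and the 'andi $tmp4,$tmp4,3' is left out because it only
updates $tmp4 and does not feed the addu. The point is that both
variants compute the same value, but in the first one the shift and the
mask both read $tmp4 directly, while in the second the sll has to wait
for the srl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original sequence: srl and and are independent of each other, so a
 * dual-issue core can pair them; the chain from tmp4 to the addu result
 * is two instructions long (and -> addu). */
static uint32_t reduce_with_mask(uint32_t tmp4)
{
    uint32_t in0 = (uint32_t)-4;   /* li   $in0,-4         */
    uint32_t ctx = tmp4 >> 2;      /* srl  $ctx,$tmp4,2    */
    in0 &= tmp4;                   /* and  $in0,$in0,$tmp4 */
    return ctx + in0;              /* addu $ctx,$ctx,$in0  */
}

/* Proposed sequence: one instruction fewer, but the chain from tmp4 to
 * the addu result is three instructions long (srl -> sll -> addu). */
static uint32_t reduce_with_shift(uint32_t tmp4)
{
    uint32_t ctx = tmp4 >> 2;      /* srl  $ctx,$tmp4,2    */
    uint32_t in0 = ctx << 2;       /* sll  $in0,$ctx,2     */
    return ctx + in0;              /* addu $ctx,$ctx,$in0  */
}

/* Returns 1 if both variants agree on a handful of sample inputs. */
static int sequences_agree(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 3u, 4u, 7u, 0x12345678u, 0x80000003u, 0xFFFFFFFFu
    };
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        if (reduce_with_mask(samples[i]) != reduce_with_shift(samples[i]))
            return 0;
    }
    return 1;
}
```

Clearing the low two bits with a mask and shifting right then left by
two are the same operation, so the choice between the two is purely
about scheduling, not about the result.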

>> As for multiply-by-1-n-add.
>>
>
> I wonder how many devices do exist with the "poor man" version.

Well, it's not just how many devices, but more specifically how many of
those will end up running the code in question. I would guess a
poor-man's unit would be found in ultra-low-power microcontrollers,
so... As implied, it's probably sufficient to keep this in mind just in
case :-)

Cheers.