Re: [PATCH] x86, crypto: ported aes-ni implementation to x86

On 30.10.2010, 00:15 Herbert Xu wrote:
> Mathias Krause <minipli@xxxxxxxxxxxxxx> wrote:
>> The AES-NI instructions are also available in legacy mode so the x86
>> architecture may profit from those, too.
>> 
>> To illustrate the performance gain here's a short summary of the tcrypt
>> speed test on a Core i5 M 520 running at 2.40GHz comparing both
>> assembler implementations:
>> 
>>                            aes-i586   aes-ni-i586   delta
>> 256 bit, 8kB blocks, ECB:  46.81 MB/s   164.46 MB/s   +251%
>> 256 bit, 8kB blocks, CBC:  43.89 MB/s    62.18 MB/s    +41%
>> 384 bit, 8kB blocks, LRW:  42.24 MB/s   142.90 MB/s   +238%
>> 512 bit, 8kB blocks, XTS:  43.41 MB/s   148.67 MB/s   +242%
>> 
>> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx>
> 
> Nice work :)
> 
> I have to say though that I'll love this even more if we could
> avoid duplicating those assembly files somehow.  Is this possible?

I thought about that too but found it easier to split those files.
The different calling conventions of the two architectures and the
limited register set of the 32-bit version forced me to make changes
that don't lend themselves well to #ifdefs, so that the code works
with fewer registers.

> Oh and those CBC numbers look out of whack.  I'd expect CBC to be
> way faster as it's done directly by the hardware unlike the
> other modes.

Well, actually the 32-bit assembler implementation has specialized
algorithms for ECB and CBC. But the latter had to be implemented a
little differently from the 64-bit version because I didn't have
enough xmm registers to make a 1:1 port. So I reused some registers
for loading memory values and used direct memory references to make
aesni_cbc_dec() work with the limited number of registers.
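
For illustration, the data flow of CBC decryption can be sketched in C
as below. This is not the code from the patch: toy_encrypt_block() and
toy_decrypt_block() are made-up stand-ins for the AES rounds (they are
not AES and not secure), and all other names are assumptions too. The
point is only that every block in flight needs its state plus a saved
copy of its ciphertext, in addition to the IV and the key, which is
what makes the eight xmm registers of 32-bit mode tight.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16 /* AES block size in bytes */

/* Hypothetical stand-ins for the block cipher. NOT AES, NOT secure. */
static void toy_encrypt_block(uint8_t b[BLK], const uint8_t key[BLK])
{
    for (int i = 0; i < BLK; i++)
        b[i] = (uint8_t)((b[i] ^ key[i]) + 1);
}

static void toy_decrypt_block(uint8_t b[BLK], const uint8_t key[BLK])
{
    for (int i = 0; i < BLK; i++)
        b[i] = (uint8_t)((b[i] - 1) ^ key[i]);
}

/* In-place CBC encryption: C_i = E(P_i ^ C_{i-1}), with C_0 = IV. */
static void cbc_encrypt(uint8_t *buf, size_t nblocks,
                        const uint8_t key[BLK], const uint8_t iv_in[BLK])
{
    uint8_t prev[BLK];

    memcpy(prev, iv_in, BLK);
    for (size_t n = 0; n < nblocks; n++, buf += BLK) {
        for (int i = 0; i < BLK; i++)
            buf[i] ^= prev[i];
        toy_encrypt_block(buf, key);
        memcpy(prev, buf, BLK);
    }
}

/* In-place CBC decryption: P_i = D(C_i) ^ C_{i-1}, with C_0 = IV. */
static void cbc_decrypt(uint8_t *buf, size_t nblocks,
                        const uint8_t key[BLK], const uint8_t iv_in[BLK])
{
    uint8_t iv[BLK];    /* chaining value C_{i-1} */
    uint8_t saved[BLK]; /* copy of C_i, needed after buf is overwritten */

    memcpy(iv, iv_in, BLK);
    for (size_t n = 0; n < nblocks; n++, buf += BLK) {
        memcpy(saved, buf, BLK);     /* keep C_i for the next block */
        toy_decrypt_block(buf, key); /* state = D(C_i) */
        for (int i = 0; i < BLK; i++)
            buf[i] ^= iv[i];         /* P_i = D(C_i) ^ C_{i-1} */
        memcpy(iv, saved, BLK);      /* C_i becomes the next chaining value */
    }
}

/* Round-trip check on two blocks; returns 1 if decryption restores the input. */
static int cbc_selftest(void)
{
    uint8_t key[BLK], iv[BLK], buf[2 * BLK], ref[2 * BLK];

    for (int i = 0; i < BLK; i++) {
        key[i] = (uint8_t)(i * 7 + 3);
        iv[i] = (uint8_t)(255 - i);
    }
    for (int i = 0; i < 2 * BLK; i++)
        ref[i] = buf[i] = (uint8_t)i;

    cbc_encrypt(buf, 2, key, iv);
    cbc_decrypt(buf, 2, key, iv);
    return memcmp(buf, ref, sizeof(buf)) == 0;
}
```

In register terms, buf, saved, iv and the key each correspond to one
16-byte value per block. Decrypting several blocks in parallel
multiplies the first two, so with only eight xmm registers some of
them have to be reloaded from memory instead.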

I'll look into whether we can do better, but if not, leaving this
one out of the 32-bit version might be the best option. Doing so may
even make it easier to combine the two assembler files again.

Btw., because of the limited register set I wasn't able to port the
CTR mode version yet. It uses even more registers, both xmm and
general purpose. :(

>  What numbers do you get in 64-bit before/after
> your patch?

I haven't built a 64-bit kernel yet but will try that tomorrow.

> If the hardware CBC is really so much slower then maybe we should
> stop using it.

This must be related to the changes I made to the code. I would guess 
it doesn't like the additional memory loads.

There's even more potential for optimization since I still have a
general purpose register left. ;)

Consider this a first version to get feedback, especially from
Huang Ying. But it's already quite fast. :)


Regards,
Mathias

> 
> Thanks,
> -- 
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
