On 8/20/20 12:56 AM, Ard Biesheuvel wrote:
On Thu, 20 Aug 2020 at 09:54, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
On Thu, Aug 20, 2020 at 09:48:02AM +0200, Ard Biesheuvel wrote:
Or are you saying on Ben's machine cbc-aesni would have worse
performance vs. aes-generic?
Yes, given the pathological overhead of FPU preserve/restore for every
block of 16 bytes processed by the cbcmac wrapper.
I'm sceptical. Do we have numbers showing this? You can get them
from tcrypt with my patch:
https://patchwork.kernel.org/patch/11701343/
Just do
modprobe tcrypt mode=400 alg='cmac(aes-aesni)' klen=16
modprobe tcrypt mode=400 alg='cmac(aes-generic)' klen=16
cmac() is not really relevant for performance, afaict. Only cbcmac()
is used for bulk data.
Sure but it's trivial to extend my cmac patch to support cbcmac.
Sure.
Ben, care to have a go at the above on your hardware? It would help us
get to the bottom of this issue.
Here's a run on an: Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz
testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[ 259.397756] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 244 cycles/operation, 15 cycles/byte
[ 259.397759] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 1052 cycles/operation, 16 cycles/byte
[ 259.397765] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 641 cycles/operation, 10 cycles/byte
[ 259.397768] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 3909 cycles/operation, 15 cycles/byte
[ 259.397786] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 2602 cycles/operation, 10 cycles/byte
[ 259.397797] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 2211 cycles/operation, 8 cycles/byte
[ 259.397807] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 15453 cycles/operation, 15 cycles/byte
[ 259.397872] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 8863 cycles/operation, 8 cycles/byte
[ 259.397910] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 8442 cycles/operation, 8 cycles/byte
[ 259.397946] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 43542 cycles/operation, 21 cycles/byte
[ 259.398110] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 17649 cycles/operation, 8 cycles/byte
[ 259.398184] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 21255 cycles/operation, 10 cycles/byte
[ 259.398267] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 16322 cycles/operation, 7 cycles/byte
[ 259.398335] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 60301 cycles/operation, 14 cycles/byte
[ 259.398585] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 34413 cycles/operation, 8 cycles/byte
[ 259.398728] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 32894 cycles/operation, 8 cycles/byte
[ 259.398865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 32521 cycles/operation, 7 cycles/byte
[ 259.399000] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 120415 cycles/operation, 14 cycles/byte
[ 259.399550] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 68635 cycles/operation, 8 cycles/byte
[ 259.399834] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 83770 cycles/operation, 10 cycles/byte
[ 259.400157] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 65075 cycles/operation, 7 cycles/byte
[ 259.400427] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 65085 cycles/operation, 7 cycles/byte
[ 294.171336]
testing speed of async cmac(aes-generic) (cmac(aes-generic))
[ 294.171340] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 275 cycles/operation, 17 cycles/byte
[ 294.171343] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 1191 cycles/operation, 18 cycles/byte
[ 294.171350] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 738 cycles/operation, 11 cycles/byte
[ 294.171354] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 4386 cycles/operation, 17 cycles/byte
[ 294.171374] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 2915 cycles/operation, 11 cycles/byte
[ 294.171387] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 2464 cycles/operation, 9 cycles/byte
[ 294.171398] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 17558 cycles/operation, 17 cycles/byte
[ 294.171472] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 14022 cycles/operation, 13 cycles/byte
[ 294.171530] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 9022 cycles/operation, 8 cycles/byte
[ 294.171569] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 38107 cycles/operation, 18 cycles/byte
[ 294.171722] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 18083 cycles/operation, 8 cycles/byte
[ 294.171798] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 17260 cycles/operation, 8 cycles/byte
[ 294.171870] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 17415 cycles/operation, 8 cycles/byte
[ 294.171943] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 66005 cycles/operation, 16 cycles/byte
[ 294.172217] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 36035 cycles/operation, 8 cycles/byte
[ 294.172366] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 42812 cycles/operation, 10 cycles/byte
[ 294.172533] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 53415 cycles/operation, 13 cycles/byte
[ 294.172745] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 133326 cycles/operation, 16 cycles/byte
[ 294.173297] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 90271 cycles/operation, 11 cycles/byte
[ 294.173646] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 68703 cycles/operation, 8 cycles/byte
[ 294.173931] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 67951 cycles/operation, 8 cycles/byte
[ 294.174213] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 68370 cycles/operation, 8 cycles/byte
On my slow apu2 board with processor: AMD GX-412TC SOC
testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[ 51.750514] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 600 cycles/operation, 37 cycle
[ 51.750532] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 2063 cycles/operation, 32 cycle
[ 51.750582] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 1326 cycles/operation, 20 cycle
[ 51.750619] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 11190 cycles/operation, 43 cycle
[ 51.750775] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 4935 cycles/operation, 19 cycle
[ 51.750840] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 8652 cycles/operation, 33 cycle
[ 51.750948] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 43430 cycles/operation, 42 cycle
[ 51.751488] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 23589 cycles/operation, 23 cycle
[ 51.751810] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 18759 cycles/operation, 18 cycle
[ 51.752027] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 79699 cycles/operation, 38 cycle
[ 51.753035] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 39900 cycles/operation, 19 cycle
[ 51.753559] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 38390 cycles/operation, 18 cycle
[ 51.754057] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 40888 cycles/operation, 19 cycle
[ 51.754615] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 143019 cycles/operation, 34 cycle
[ 51.756369] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 89046 cycles/operation, 21 cycle
[ 51.757527] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 77992 cycles/operation, 19 cycle
[ 51.758526] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 76021 cycles/operation, 18 cycle
[ 51.759442] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 312260 cycles/operation, 38 cycle
[ 51.763195] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 176472 cycles/operation, 21 cycle
[ 51.765255] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 169565 cycles/operation, 20 cycle
[ 51.767321] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 164968 cycles/operation, 20 cycle
[ 51.769256] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 165096 cycles/operation, 20 cycle
testing speed of async cmac(aes-generic) (cmac(aes-generic))
[ 97.835925] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 665 cycles/operation, 41 cycle
[ 97.835945] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 2430 cycles/operation, 37 cycle
[ 97.836016] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 1656 cycles/operation, 25 cycle
[ 97.836044] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 9014 cycles/operation, 35 cycle
[ 97.836259] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 13444 cycles/operation, 52 cycle
[ 97.836399] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 8960 cycles/operation, 35 cycle
[ 97.836515] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 51594 cycles/operation, 50 cycle
[ 97.837151] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 28105 cycles/operation, 27 cycle
[ 97.837497] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 31365 cycles/operation, 30 cycle
[ 97.837865] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 86111 cycles/operation, 42 cycle
[ 97.838927] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 60021 cycles/operation, 29 cycle
[ 97.839628] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 56311 cycles/operation, 27 cycle
[ 97.840308] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 50877 cycles/operation, 24 cycle
[ 97.840943] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 174028 cycles/operation, 42 cycle
[ 97.843205] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 103243 cycles/operation, 25 cycle
[ 97.844524] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 99960 cycles/operation, 24 cycle
[ 97.845865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 121735 cycles/operation, 29 cycle
[ 97.847355] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 387559 cycles/operation, 47 cycle
[ 97.851930] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 223662 cycles/operation, 27 cycle
[ 97.854617] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 226131 cycles/operation, 27 cycle
[ 97.857385] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 203840 cycles/operation, 24 cycle
[ 97.859888] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 220232 cycles/operation, 26 cycle
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com