Re: [PATCH 0/5] crypto: Implement cmac based on cbc skcipher

On 8/20/20 12:56 AM, Ard Biesheuvel wrote:
On Thu, 20 Aug 2020 at 09:54, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:

On Thu, Aug 20, 2020 at 09:48:02AM +0200, Ard Biesheuvel wrote:

Or are you saying on Ben's machine cbc-aesni would have worse
performance vs. aes-generic?


Yes, given the pathological overhead of FPU preserve/restore for every
block of 16 bytes processed by the cbcmac wrapper.
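To make the claim concrete, here is a userspace sketch of the two call shapes. One enters an FPU section for every 16-byte block. The other enters one section for the whole update. The `fake_*` functions are hypothetical placeholders that only count sections; in the kernel the real pair is `kernel_fpu_begin()`/`kernel_fpu_end()`, and the actual per-section cost is what is being debated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_SIZE 16

/*
 * Hypothetical stand-ins for the x86 kernel_fpu_begin()/kernel_fpu_end()
 * pair. Each begin/end section models one FPU state preserve/restore;
 * this sketch only counts them.
 */
static unsigned long fpu_sections;

static void fake_fpu_begin(void) { fpu_sections++; }
static void fake_fpu_end(void) { }

/* Hypothetical placeholder for one AES-NI CBC-MAC step on a 16-byte block. */
static void fake_aesni_mac_block(uint8_t state[AES_BLOCK_SIZE],
				 const uint8_t in[AES_BLOCK_SIZE])
{
	for (int i = 0; i < AES_BLOCK_SIZE; i++)
		state[i] ^= in[i];	/* real code would then encrypt state */
}

/* Per-block pattern: one FPU section for every 16-byte block. */
static void mac_update_per_block(uint8_t state[AES_BLOCK_SIZE],
				 const uint8_t *p, size_t len)
{
	assert(len % AES_BLOCK_SIZE == 0);	/* whole blocks only */
	for (size_t off = 0; off < len; off += AES_BLOCK_SIZE) {
		fake_fpu_begin();
		fake_aesni_mac_block(state, p + off);
		fake_fpu_end();
	}
}

/* Batched alternative: one FPU section around the whole update. */
static void mac_update_batched(uint8_t state[AES_BLOCK_SIZE],
			       const uint8_t *p, size_t len)
{
	assert(len % AES_BLOCK_SIZE == 0);
	fake_fpu_begin();
	for (size_t off = 0; off < len; off += AES_BLOCK_SIZE)
		fake_aesni_mac_block(state, p + off);
	fake_fpu_end();
}
```

For a 1024-byte update, the per-block shape enters 64 FPU sections and the batched shape enters 1. Roughly speaking, a cbc skcipher interface allows the batched shape because it is handed many blocks per call.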

I'm sceptical.  Do we have numbers showing this? You can get them
from tcrypt with my patch:

         https://patchwork.kernel.org/patch/11701343/

Just do

         modprobe tcrypt mode=400 alg='cmac(aes-aesni)' klen=16
         modprobe tcrypt mode=400 alg='cmac(aes-generic)' klen=16

cmac() is not really relevant for performance, afaict. Only cbcmac()
is used for bulk data.
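CBC-MAC needs one cipher call per 16 bytes because of its chaining. Each block is XORed into the running state, and that state is encrypted before the next block can be processed. A minimal sketch, assuming a generic single-block encrypt callback. `toy_encrypt` is a hypothetical, deliberately insecure placeholder, not AES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Single-block cipher interface: encrypt 16 bytes in place. */
typedef void (*block_encrypt_fn)(const void *key, uint8_t blk[BLK]);

/*
 * CBC-MAC over whole 16-byte blocks: XOR each block into the running
 * state, then encrypt the state. Block i+1 depends on the result for
 * block i, so the blocks cannot be processed in parallel.
 */
static void cbcmac(block_encrypt_fn enc, const void *key,
		   const uint8_t *msg, size_t len, uint8_t mac[BLK])
{
	assert(len % BLK == 0);	/* caller passes whole blocks only */
	memset(mac, 0, BLK);
	for (size_t off = 0; off < len; off += BLK) {
		for (int i = 0; i < BLK; i++)
			mac[i] ^= msg[off + i];
		enc(key, mac);
	}
}

/* Hypothetical toy "cipher" for illustration only. NOT secure. */
static void toy_encrypt(const void *key, uint8_t blk[BLK])
{
	const uint8_t *k = key;
	uint8_t tmp[BLK];

	for (int i = 0; i < BLK; i++)
		tmp[i] = (uint8_t)(blk[(i + 1) % BLK] ^ k[i]);
	memcpy(blk, tmp, BLK);
}
```

For a two-block message, the result equals E(E(m0) ^ m1). That is one encryption per block, in strict order.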

Sure but it's trivial to extend my cmac patch to support cbcmac.
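CMAC (RFC 4493, NIST SP 800-38B) differs from CBC-MAC only in the final block. Two subkeys are derived by doubling L = E_K(0^128) in GF(2^128). The last block is XORed with K1 if it is complete. Otherwise it is padded with 0x80 00.. and XORed with K2. A sketch of those two steps follows. Computing L itself needs a real AES implementation and is left out, and the function names are hypothetical. The subkey values in the test are the RFC 4493 section 4 example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Multiply a 128-bit big-endian value by x in GF(2^128), reducing with
 * the CMAC constant 0x87 (x^128 + x^7 + x^2 + x + 1).
 */
static void cmac_dbl(const uint8_t in[16], uint8_t out[16])
{
	uint8_t msb = in[0] & 0x80;

	for (int i = 0; i < 15; i++)
		out[i] = (uint8_t)((in[i] << 1) | (in[i + 1] >> 7));
	out[15] = (uint8_t)(in[15] << 1);
	if (msb)
		out[15] ^= 0x87;
}

/* K1 = dbl(L), K2 = dbl(K1), where L = E_K(0^128) is computed elsewhere. */
static void cmac_subkeys(const uint8_t L[16], uint8_t k1[16], uint8_t k2[16])
{
	cmac_dbl(L, k1);
	cmac_dbl(k1, k2);
}

/*
 * Prepare CMAC's last block before it goes through the same CBC chain:
 * a complete block is XORed with K1; a partial (or empty) one is padded
 * with 0x80 0x00... and XORed with K2.
 */
static void cmac_last_block(const uint8_t *tail, size_t len,
			    const uint8_t k1[16], const uint8_t k2[16],
			    uint8_t out[16])
{
	assert(len <= 16);
	memset(out, 0, 16);
	memcpy(out, tail, len);
	if (len == 16) {
		for (int i = 0; i < 16; i++)
			out[i] ^= k1[i];
	} else {
		out[len] = 0x80;
		for (int i = 0; i < 16; i++)
			out[i] ^= k2[i];
	}
}
```

Everything before the last block is plain CBC-MAC. That is why cbcmac and cmac can share most of their code.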



Sure.

Ben, care to have a go at the above on your hardware? It would help us
get to the bottom of this issue.

Here's a run on an Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz:

               testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[  259.397756] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    244 cycles/operation,   15 cycles/byte
[  259.397759] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   1052 cycles/operation,   16 cycles/byte
[  259.397765] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):    641 cycles/operation,   10 cycles/byte
[  259.397768] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   3909 cycles/operation,   15 cycles/byte
[  259.397786] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   2602 cycles/operation,   10 cycles/byte
[  259.397797] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   2211 cycles/operation,    8 cycles/byte
[  259.397807] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  15453 cycles/operation,   15 cycles/byte
[  259.397872] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):   8863 cycles/operation,    8 cycles/byte
[  259.397910] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):   8442 cycles/operation,    8 cycles/byte
[  259.397946] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  43542 cycles/operation,   21 cycles/byte
[  259.398110] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  17649 cycles/operation,    8 cycles/byte
[  259.398184] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  21255 cycles/operation,   10 cycles/byte
[  259.398267] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  16322 cycles/operation,    7 cycles/byte
[  259.398335] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):  60301 cycles/operation,   14 cycles/byte
[  259.398585] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):  34413 cycles/operation,    8 cycles/byte
[  259.398728] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  32894 cycles/operation,    8 cycles/byte
[  259.398865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):  32521 cycles/operation,    7 cycles/byte
[  259.399000] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 120415 cycles/operation,   14 cycles/byte
[  259.399550] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  68635 cycles/operation,    8 cycles/byte
[  259.399834] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):  83770 cycles/operation,   10 cycles/byte
[  259.400157] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):  65075 cycles/operation,    7 cycles/byte
[  259.400427] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates):  65085 cycles/operation,    7 cycles/byte
[  294.171336]
               testing speed of async cmac(aes-generic) (cmac(aes-generic))
[  294.171340] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    275 cycles/operation,   17 cycles/byte
[  294.171343] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   1191 cycles/operation,   18 cycles/byte
[  294.171350] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):    738 cycles/operation,   11 cycles/byte
[  294.171354] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   4386 cycles/operation,   17 cycles/byte
[  294.171374] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   2915 cycles/operation,   11 cycles/byte
[  294.171387] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   2464 cycles/operation,    9 cycles/byte
[  294.171398] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  17558 cycles/operation,   17 cycles/byte
[  294.171472] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):  14022 cycles/operation,   13 cycles/byte
[  294.171530] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):   9022 cycles/operation,    8 cycles/byte
[  294.171569] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  38107 cycles/operation,   18 cycles/byte
[  294.171722] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  18083 cycles/operation,    8 cycles/byte
[  294.171798] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  17260 cycles/operation,    8 cycles/byte
[  294.171870] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  17415 cycles/operation,    8 cycles/byte
[  294.171943] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):  66005 cycles/operation,   16 cycles/byte
[  294.172217] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):  36035 cycles/operation,    8 cycles/byte
[  294.172366] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  42812 cycles/operation,   10 cycles/byte
[  294.172533] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):  53415 cycles/operation,   13 cycles/byte
[  294.172745] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 133326 cycles/operation,   16 cycles/byte
[  294.173297] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  90271 cycles/operation,   11 cycles/byte
[  294.173646] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):  68703 cycles/operation,    8 cycles/byte
[  294.173931] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):  67951 cycles/operation,    8 cycles/byte
[  294.174213] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates):  68370 cycles/operation,    8 cycles/byte
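One way to estimate the per-update cost discussed in the thread is to compare rows with the same total length but different update sizes. This assumes a simple linear model, cycles ≈ base + n_updates × per_update, so (cycles for n updates − cycles for 1 update) / (n − 1) estimates per_update. This is a rough sketch. The model is an assumption, not something tcrypt reports, and single runs are noisy. For example, cmac(aes-aesni) test 11 (2 updates of 1024 bytes) costs more than test 10 (8 updates of 256 bytes) above.

```c
#include <assert.h>

/*
 * Estimate the extra cycles per additional update() call, assuming
 * cycles(n) = base + n * per_update for a fixed total length.
 */
static double per_update_overhead(double cycles_n_updates,
				  double cycles_one_update,
				  unsigned int n_updates)
{
	assert(n_updates > 1);
	return (cycles_n_updates - cycles_one_update) /
	       (double)(n_updates - 1);
}
```

Applied to the 1024-byte rows above (test 6 vs test 8), this gives about 111 cycles per extra update for cmac(aes-aesni) and about 135 for cmac(aes-generic).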


On my slow apu2 board, with an AMD GX-412TC SOC processor:

              testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[   51.750514] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    600 cycles/operation,   37 cycle
[   51.750532] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   2063 cycles/operation,   32 cycle
[   51.750582] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   1326 cycles/operation,   20 cycle
[   51.750619] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):  11190 cycles/operation,   43 cycle
[   51.750775] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   4935 cycles/operation,   19 cycle
[   51.750840] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   8652 cycles/operation,   33 cycle
[   51.750948] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  43430 cycles/operation,   42 cycle
[   51.751488] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):  23589 cycles/operation,   23 cycle
[   51.751810] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  18759 cycles/operation,   18 cycle
[   51.752027] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  79699 cycles/operation,   38 cycle
[   51.753035] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  39900 cycles/operation,   19 cycle
[   51.753559] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  38390 cycles/operation,   18 cycle
[   51.754057] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  40888 cycles/operation,   19 cycle
[   51.754615] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates): 143019 cycles/operation,   34 cycle
[   51.756369] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):  89046 cycles/operation,   21 cycle
[   51.757527] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  77992 cycles/operation,   19 cycle
[   51.758526] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):  76021 cycles/operation,   18 cycle
[   51.759442] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 312260 cycles/operation,   38 cycle
[   51.763195] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates): 176472 cycles/operation,   21 cycle
[   51.765255] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates): 169565 cycles/operation,   20 cycle
[   51.767321] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates): 164968 cycles/operation,   20 cycle
[   51.769256] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 165096 cycles/operation,   20 cycle

              testing speed of async cmac(aes-generic) (cmac(aes-generic))
[   97.835925] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    665 cycles/operation,   41 cycle
[   97.835945] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   2430 cycles/operation,   37 cycle
[   97.836016] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   1656 cycles/operation,   25 cycle
[   97.836044] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   9014 cycles/operation,   35 cycle
[   97.836259] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):  13444 cycles/operation,   52 cycle
[   97.836399] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   8960 cycles/operation,   35 cycle
[   97.836515] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  51594 cycles/operation,   50 cycle
[   97.837151] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):  28105 cycles/operation,   27 cycle
[   97.837497] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  31365 cycles/operation,   30 cycle
[   97.837865] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  86111 cycles/operation,   42 cycle
[   97.838927] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  60021 cycles/operation,   29 cycle
[   97.839628] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  56311 cycles/operation,   27 cycle
[   97.840308] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  50877 cycles/operation,   24 cycle
[   97.840943] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates): 174028 cycles/operation,   42 cycle
[   97.843205] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates): 103243 cycles/operation,   25 cycle
[   97.844524] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  99960 cycles/operation,   24 cycle
[   97.845865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 121735 cycles/operation,   29 cycle
[   97.847355] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 387559 cycles/operation,   47 cycle
[   97.851930] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates): 223662 cycles/operation,   27 cycle
[   97.854617] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates): 226131 cycles/operation,   27 cycle
[   97.857385] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates): 203840 cycles/operation,   24 cycle
[   97.859888] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 220232 cycles/operation,   26 cycle
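To compare the two apu2 runs row by row, the sketch below divides aes-generic cycles/operation by aes-aesni cycles/operation for each test. A ratio below 1.0 means the aesni-backed cmac was slower on that row. The arrays are copied from the cycles/operation column of the logs above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* cycles/operation for tests 0..21 on the apu2, copied from the runs above */
static const unsigned long apu2_aesni[] = {
	600, 2063, 1326, 11190, 4935, 8652, 43430, 23589, 18759, 79699,
	39900, 38390, 40888, 143019, 89046, 77992, 76021, 312260, 176472,
	169565, 164968, 165096,
};
static const unsigned long apu2_generic[] = {
	665, 2430, 1656, 9014, 13444, 8960, 51594, 28105, 31365, 86111,
	60021, 56311, 50877, 174028, 103243, 99960, 121735, 387559, 223662,
	226131, 203840, 220232,
};

static_assert(sizeof(apu2_aesni) == sizeof(apu2_generic),
	      "one entry per tcrypt test in both runs");

#define NTESTS (sizeof(apu2_aesni) / sizeof(apu2_aesni[0]))

/* Ratio > 1.0: aes-generic needed more cycles, i.e. aesni was faster. */
static double speedup(size_t test)
{
	assert(test < NTESTS);
	return (double)apu2_generic[test] / (double)apu2_aesni[test];
}

/* Count tests in which the aesni-backed cmac took more cycles. */
static size_t rows_where_aesni_slower(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NTESTS; i++)
		if (apu2_aesni[i] > apu2_generic[i])
			n++;
	return n;
}
```

On these numbers only test 3 (256-byte blocks, 16 bytes per update) has the aesni-backed cmac slower, at a ratio of about 0.81. The 8192-byte single-update row (test 21) comes out at about 1.33.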

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com


