On Fri, Sep 16, 2022 at 12:57:33PM +0000, Taehee Yoo wrote:
> The purpose of this patchset is to support the implementation of ARIA-AVX.
> Many of the ideas in this implementation are from Camellia-avx,
> especially byte slicing.
> Like Camellia, ARIA also uses a 16way strategy.
>
> The ARIA cipher algorithm is similar to AES.
> There are four s-boxes in the ARIA spec and the first and second s-boxes
> are the same as AES's s-boxes.
> Almost all functions are based on the aria-generic code, except for the
> s-box related functions.
> aria-avx doesn't implement the key expansion function;
> it supports only encrypt() and decrypt().
>
> Encryption and decryption are actually the same operation, but they use
> separate keys (an encryption key and a decryption key).
> The en/decryption steps are as follows:
> 1. Add-Round-Key
> 2. S-box.
> 3. Diffusion Layer.
>
> There is nothing special in the Add-Round-Key step.
>
> There are some notable things in the s-box step.
> Like Camellia, it doesn't use a lookup table; instead, it uses AES-NI.
> There are 2 implementations for that.
> One is to use AES-NI and an affine transformation, which is the same as
> Camellia, sm4, and others.
> The other is to use GFNI.
> The GFNI implementation is faster than the AES-NI implementation.
> So, it uses the GFNI implementation if the running CPU supports GFNI.
>
> To calculate the first s-box (S1), it just uses aesenclast and then
> inverts shift_row. No further processing is needed for this job because
> the first s-box is the same as the AES encryption s-box.
>
> To calculate the second s-box (X1, the inverse of S1), it just uses
> aesdeclast and then inverts shift_row. No further processing is needed
> for this job because the second s-box is the same as the AES
> decryption s-box.
>
> To calculate the third s-box (S2), it uses aesenclast,
> then an affine transformation that combines the AES inverse affine and
> ARIA S2.
>
> To calculate the last s-box (X2, the inverse of S2), it uses aesdeclast,
> then an affine transformation that combines X2 and the AES forward affine.
>
> The optimized third and last s-box logic and the GFNI s-box logic are
> implemented by Jussi Kivilinna.
>
> The aria-generic implementation is based on a 32-bit implementation,
> not an 8-bit implementation.
> The aria-avx Diffusion Layer implementation is based on the aria-generic
> implementation because the 8-bit implementation is not suited to parallel
> implementation, whereas the 32-bit one is.
>
> The first patch in this series exports functions for aria-avx.
> aria-avx uses existing functions in the aria-generic code.
> The second patch implements aria-avx.
> The last patch adds an async test for aria.
>
> Benchmarks:
> tcrypt is used.
> cpu: i3-12100
>
> How to test:
>   modprobe aria-generic
>   tcrypt mode=610 num_mb=8192
>
> Result:
>     testing speed of multibuffer ecb(aria) (ecb(aria-generic)) encryption
>     test 0 (128 bit key, 16 byte blocks): 1 operation in 534 cycles
>     test 2 (128 bit key, 128 byte blocks): 1 operation in 2006 cycles
>     test 3 (128 bit key, 256 byte blocks): 1 operation in 3674 cycles
>     test 6 (128 bit key, 4096 byte blocks): 1 operation in 52374 cycles
>     test 7 (256 bit key, 16 byte blocks): 1 operation in 608 cycles
>     test 9 (256 bit key, 128 byte blocks): 1 operation in 2586 cycles
>     test 10 (256 bit key, 256 byte blocks): 1 operation in 4707 cycles
>     test 13 (256 bit key, 4096 byte blocks): 1 operation in 69794 cycles
>
>     testing speed of multibuffer ecb(aria) (ecb(aria-generic)) decryption
>     test 0 (128 bit key, 16 byte blocks): 1 operation in 545 cycles
>     test 2 (128 bit key, 128 byte blocks): 1 operation in 1995 cycles
>     test 3 (128 bit key, 256 byte blocks): 1 operation in 3673 cycles
>     test 6 (128 bit key, 4096 byte blocks): 1 operation in 52359 cycles
>     test 7 (256 bit key, 16 byte blocks): 1 operation in 615 cycles
>     test 9 (256 bit key, 128 byte blocks): 1 operation in 2588 cycles
>     test 10 (256 bit key, 256 byte blocks): 1 operation in 4712 cycles
>     test 13 (256 bit key, 4096 byte blocks): 1 operation in 69916 cycles
>
> How to test:
>   modprobe aria
>   tcrypt mode=610 num_mb=8192
>
> AVX with AES-NI:
>     testing speed of multibuffer ecb(aria) (ecb-aria-avx) encryption
>     test 0 (128 bit key, 16 byte blocks): 1 operation in 629 cycles
>     test 2 (128 bit key, 128 byte blocks): 1 operation in 2060 cycles
>     test 3 (128 bit key, 256 byte blocks): 1 operation in 1223 cycles
>     test 6 (128 bit key, 4096 byte blocks): 1 operation in 11931 cycles
>     test 7 (256 bit key, 16 byte blocks): 1 operation in 686 cycles
>     test 9 (256 bit key, 128 byte blocks): 1 operation in 2616 cycles
>     test 10 (256 bit key, 256 byte blocks): 1 operation in 1439 cycles
>     test 13 (256 bit key, 4096 byte blocks): 1 operation in 15488 cycles
>
>     testing speed of multibuffer ecb(aria) (ecb-aria-avx) decryption
>     test 0 (128 bit key, 16 byte blocks): 1 operation in 609 cycles
>     test 2 (128 bit key, 128 byte blocks): 1 operation in 2027 cycles
>     test 3 (128 bit key, 256 byte blocks): 1 operation in 1211 cycles
>     test 6 (128 bit key, 4096 byte blocks): 1 operation in 12040 cycles
>     test 7 (256 bit key, 16 byte blocks): 1 operation in 684 cycles
>     test 9 (256 bit key, 128 byte blocks): 1 operation in 2614 cycles
>     test 10 (256 bit key, 256 byte blocks): 1 operation in 1445 cycles
>     test 13 (256 bit key, 4096 byte blocks): 1 operation in 15478 cycles
>
> AVX with GFNI:
>     testing speed of multibuffer ecb(aria) (ecb-aria-avx) encryption
>     test 0 (128 bit key, 16 byte blocks): 1 operation in 730 cycles
>     test 2 (128 bit key, 128 byte blocks): 1 operation in 2056 cycles
>     test 3 (128 bit key, 256 byte blocks): 1 operation in 1028 cycles
>     test 6 (128 bit key, 4096 byte blocks): 1 operation in 9223 cycles
>     test 7 (256 bit key, 16 byte blocks): 1 operation in 685 cycles
>     test 9 (256 bit key, 128 byte blocks): 1 operation in 2603 cycles
>     test 10 (256 bit key, 256 byte blocks): 1 operation in 1179 cycles
>     test 13 (256 bit key, 4096 byte blocks): 1 operation in 11728 cycles
>
>     testing speed of multibuffer ecb(aria) (ecb-aria-avx) decryption
>     test 0 (128 bit key, 16 byte blocks): 1 operation in 617 cycles
>     test 2 (128 bit key, 128 byte blocks): 1 operation in 2057 cycles
>     test 3 (128 bit key, 256 byte blocks): 1 operation in 1020 cycles
>     test 6 (128 bit key, 4096 byte blocks): 1 operation in 9280 cycles
>     test 7 (256 bit key, 16 byte blocks): 1 operation in 687 cycles
>     test 9 (256 bit key, 128 byte blocks): 1 operation in 2599 cycles
>     test 10 (256 bit key, 256 byte blocks): 1 operation in 1176 cycles
>     test 13 (256 bit key, 4096 byte blocks): 1 operation in 11909 cycles
>
> v4:
>  - Fix sparse warning.
>  - Remove .align statement for .text
>    - https://lkml.kernel.org/r/20220915111144.248229966@xxxxxxxxxxxxx
>
> v3:
>  - Use ECB macro instead of opencode.
>  - Implement ctr(aria-avx).
>  - Improve performance (20% ~ 30%) with combined affine transformation
>    for S2 and X2.
>    - Implemented by Jussi Kivilinna.
>  - Improve performance (~ 55%) with GFNI.
>    - Implemented by Jussi Kivilinna.
>  - Add aria-ctr async speed test.
>  - Add aria-gcm multi buffer speed test
>  - Rebase and fix Kconfig
>
> v2:
>  - Do not call non-FPU functions (aria_{encrypt | decrypt}()) in the
>    FPU context.
>  - Do not acquire FPU context for too long.
>
> Taehee Yoo (3):
>   crypto: aria: prepare generic module for optimized implementations
>   crypto: aria-avx: add AES-NI/AVX/x86_64/GFNI assembler implementation
>     of aria cipher
>   crypto: tcrypt: add async speed test for aria cipher
>
>  arch/x86/crypto/Kconfig                 |   18 +
>  arch/x86/crypto/Makefile                |    3 +
>  arch/x86/crypto/aria-aesni-avx-asm_64.S | 1303 +++++++++++++++++++++++
>  arch/x86/crypto/aria-avx.h              |   16 +
>  arch/x86/crypto/aria_aesni_avx_glue.c   |  213 ++++
>  crypto/Makefile                         |    2 +-
>  crypto/{aria.c => aria_generic.c}       |   39 +-
>  crypto/tcrypt.c                         |   30 +
>  include/crypto/aria.h                   |   17 +-
>  9 files changed, 1623 insertions(+), 18 deletions(-)
>  create mode 100644 arch/x86/crypto/aria-aesni-avx-asm_64.S
>  create mode 100644 arch/x86/crypto/aria-avx.h
>  create mode 100644 arch/x86/crypto/aria_aesni_avx_glue.c
>  rename crypto/{aria.c => aria_generic.c} (86%)
>
> --
> 2.17.1

All applied.  Thanks.
--
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
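
For readers following the s-box discussion in the quoted cover letter, here is a
minimal Python sketch of why S1 and X1 need no extra processing beyond the AES
instructions. It is not the kernel code and does not use aesenclast or
aesdeclast. As an assumption for illustration, it builds the AES encryption
s-box from its textbook definition (inversion in GF(2^8) modulo
x^8 + x^4 + x^3 + x + 1, followed by the AES affine map) and derives X1 as the
inverse table. All helper names (gf_mul, gf_inv, aes_affine) are hypothetical
and do not correspond to symbols in the patch.

```python
# Illustrative sketch only; helper names are hypothetical, not kernel symbols.

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial 0x11b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce when the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(x):
    """Multiplicative inverse as x^254; 0 maps to 0 by convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def aes_affine(x):
    """AES forward affine map applied after the field inversion."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63

# S1: per the cover letter, identical to the AES encryption s-box.
S1 = [aes_affine(gf_inv(x)) for x in range(256)]

# X1: the inverse of S1, i.e. the AES decryption s-box.
X1 = [0] * 256
for x, y in enumerate(S1):
    X1[y] = x

# Sanity checks against well-known AES s-box entries.
assert S1[0x00] == 0x63 and S1[0x01] == 0x7C and S1[0x53] == 0xED
assert all(X1[S1[x]] == x for x in range(256))
```

S2 and X2 use a different affine map on top of a field power, which is why the
cover letter describes an additional combined affine step for them. Their
constants are not reproduced here.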