Hi Will:

I will post the updated version (v2) of this patch today or tomorrow. Sorry for the delay.

> -----Original Message-----
> From: Will Deacon <will@xxxxxxxxxx>
> Sent: Tuesday, December 14, 2021 2:29 AM
> To: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Cc: Eric Biggers <ebiggers@xxxxxxxxxx>; Xiaokang Qian
> <Xiaokang.Qian@xxxxxxx>; Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>;
> David S. Miller <davem@xxxxxxxxxxxxx>; Catalin Marinas
> <Catalin.Marinas@xxxxxxx>; nd <nd@xxxxxxx>; Linux Crypto Mailing List
> <linux-crypto@xxxxxxxxxxxxxxx>; Linux ARM
> <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>; Linux Kernel Mailing List
> <linux-kernel@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way
> interleave of aes and ghash
>
> On Tue, Sep 28, 2021 at 11:04:03PM +0200, Ard Biesheuvel wrote:
> > On Tue, 28 Sept 2021 at 08:27, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Sep 23, 2021 at 06:30:25AM +0000, XiaokangQian wrote:
> > > > To improve performance on cores with deep pipelines such as A72 and N1,
> > > > implement gcm(aes) using a 4-way interleave of aes and ghash
> > > > (8 blocks in parallel in total), which can make full use of the
> > > > pipelines rather than the 4-way interleave we used currently. It can
> > > > gain about 20% for big data sizes such as 8k.
> > > >
> > > > This is a completely new version of the GCM part of the combined
> > > > GCM/GHASH driver; it will co-exist with the old driver and only serve
> > > > big data sizes. Instead of interleaving four invocations of
> > > > AES where each chunk of 64 bytes is encrypted first and then
> > > > ghashed, the new version uses a more coarse-grained approach where
> > > > a chunk of 64 bytes is encrypted and, at the same time, one chunk
> > > > of 64 bytes is ghashed (or ghashed and decrypted in the converse case).
> > > >
> > > > The table below compares the performance of the old driver and the
> > > > new one on various micro-architectures and running in various
> > > > modes with various data sizes.
> > > >
> > > >        |      AES-128       |      AES-192       |      AES-256       |
> > > > #bytes | 1024 | 1420 |  8k  | 1024 | 1420 |  8k  | 1024 | 1420 |  8k  |
> > > > -------+------+------+------+------+------+------+------+------+------+
> > > > A72    | 5.5% |  12% |  25% | 2.2% | 9.5% |  23% |  -1% | 6.7% |  19% |
> > > > A57    |-0.5% | 9.3% |  32% |  -3% | 6.3% |  26% |  -6% | 3.3% |  21% |
> > > > N1     | 0.4% | 7.6% |24.5% |  -2% |   5% |  22% |  -4% | 2.7% |  20% |
> > > >
> > > > Signed-off-by: XiaokangQian <xiaokang.qian@xxxxxxx>
> > >
> > > Does this pass the self-tests, including the fuzz tests which are
> > > enabled by CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
> > >
> >
> > Please test both little-endian and big-endian. (Note that you don't
> > need a big-endian user space for this - the self tests are executed
> > before the rootfs is mounted)
> >
> > Also, you will have to rebase this onto the latest cryptodev tree,
> > which carries some changes I made recently to this driver.
>
> XiaokangQian -- did you post an updated version of this? It would end up
> going via Herbert, but I was keeping half an eye on it and it all seems to have
> gone quiet.
>
> Thanks,
>
> Will