I'm changing the subject title, as the original series has been merged.

On Mon, Dec 23, 2019 at 6:46 AM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
>
> On Fri, 20 Dec 2019 at 20:02, Eneas U de Queiroz <cotequeiroz@xxxxxxxxx> wrote:
> >
> > I've been trying to make the Qualcomm Crypto Engine work with GCM-mode
> > AES. I fixed some bugs, and added an option to build only hashes or
> > skciphers, as the VPN performance increases if you leave some of that to
> > the CPU.
> >
> > A discussion about this can be found here:
> > https://github.com/openwrt/openwrt/pull/2518
> >
> > I'm using openwrt to test this, and there's no support for kernel 5.x
> > yet. So I have backported the recent skcipher updates, and tested this
> > with 4.19. I don't have the hardware with me, but I have run-tested
> > everything, working remotely.
> >
> > All of the skciphers directly implemented by the driver work. They pass
> > the tcrypt tests, and also some tests from userspace using AF_ALG:
> > https://github.com/cotequeiroz/afalg_tests
> >
> > However, I can't get gcm(aes) to work. When setting the gcm-mode key,
> > it sets the ctr(aes) key, then encrypts a block of zeroes, and uses that
> > as the ghash key. The driver fails to perform that encryption. I've
> > dumped the input and output data, and they apparently are not touched by
> > the QCE. The IV, which is written to a buffer appended to the results sg
> > list, gets updated, but the results themselves are not. I'm not sure
> > what goes wrong, if it is a DMA/cache problem, memory alignment, or
> > whatever.
> >
>
> This does sound like a DMA problem. I assume the accelerator is not
> cache coherent?
>
> In any case, it is dubious whether the round trip to the accelerator
> is worth it when encrypting the GHASH key. Just call aes_encrypt()
> instead, and do it in software.

ipsec still fails, even if I use software for every single-block
operation. I can perhaps leave that as an optimization, but it won't
fix the main issue.

> > If I take 'be128 hash' out of the 'data' struct, and kzalloc them
> > separately in crypto_gcm_setkey (crypto/gcm.c), it encrypts the data
> > just fine--perhaps the payload and the request struct can't be in the
> > same page?
> >
>
> Non-cache coherent DMA involves cache invalidation on inbound data. So
> if both the device and the CPU write to the same cacheline while the
> buffer is mapped for DMA from device to memory, one of the updates
> gets lost.

Can you give me any pointers/examples of how I can make this work?

Thanks,

Eneas
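
For illustration only, the cacheline-sharing hazard described in the quoted
reply can be sketched in plain userspace C. This is not the gcm.c or QCE
driver code: the struct names, the fields, and the 64-byte line size are all
assumptions made up for the example. In the kernel, the usual tools for the
same idea are a separately kmalloc'ed buffer or alignment to
ARCH_DMA_MINALIGN / ____cacheline_aligned. The sketch only checks layout; it
performs no DMA.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Assumed cacheline size for illustration; the real value is per-platform. */
#define CACHELINE 64

/*
 * Hypothetical layout with the problem: the buffer the device writes to
 * sits next to state the CPU updates, so both land in one cacheline
 * (assuming the object itself starts on a line boundary).
 */
struct shared_layout {
	uint8_t hash[16];  /* written by the device (DMA from device) */
	uint8_t iv[16];
	int cpu_state;     /* written by the CPU while the DMA is in flight */
};

/* Hypothetical fix: each member starts on its own cacheline. */
struct isolated_layout {
	alignas(CACHELINE) uint8_t hash[16];
	alignas(CACHELINE) int cpu_state;
};

/* True if two byte offsets fall in the same line of a line-aligned object. */
static inline int shares_cacheline(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE == off_b / CACHELINE;
}

/* Compile-time checks that the isolated layout really separates the two. */
static_assert(offsetof(struct isolated_layout, cpu_state) >= CACHELINE,
	      "cpu_state must not share the first cacheline with hash");
static_assert(sizeof(struct isolated_layout) % CACHELINE == 0,
	      "struct must not leave a partial trailing cacheline");
```

With these definitions, shares_cacheline() reports that hash and cpu_state
collide in shared_layout and do not in isolated_layout. Padding the struct to
a whole number of lines also matters, so that whatever is allocated after it
cannot share the DMA buffer's last cacheline.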