This series is related to Catalin's work [0] on reducing the memory
overhead of supporting non-coherent DMA, which requires memory buffers
to be rounded up to avoid corruption by the cache invalidation that is
needed for inbound DMA.

In the crypto subsystem, every skcipher, aead or ahash request requires
a request struct to be allocated, and these are usually backed by
kmalloc(). Such request buffers are sized dynamically, based on the
requirements of the implementation of the algorithm, and the surplus is
made available to the driver via an opaque context pointer. Since some
drivers may perform inbound non-coherent DMA into that buffer, it must
not share any cachelines with adjacent allocations, and this is why the
context pointer is rounded up to ARCH_KMALLOC_MINALIGN, which takes the
minimum DMA alignment into account on architectures where this is
needed.

This means that even crypto drivers that don't do DMA to begin with
(which includes synchronous skciphers, aeads and ahashes based on CPU
instructions), or that only do coherent DMA, are forced to perform this
padding and alignment, which may affect the memory footprint
substantially: on arm64, the compile time minimum DMA alignment is 128
bytes.

So instead, require drivers to set a new flag
CRYPTO_ALG_NEED_DMA_ALIGNMENT if they perform DMA into the context
buffers, and only take DMA alignment into account if the flag is set.
Initially, we set this flag for all asynchronous accelerator drivers in
drivers/crypto, which simply preserves the status quo for these systems
once the subsequent patches get rid of this overhead. Future patches
could drop the flag from drivers that don't actually need it, or only
need it when running in non-coherent mode.

Note that the new approach proposed here still uses the compile time
value for DMA alignment, but this can be updated to use the runtime
value (which is usually lower and therefore less wasteful) after
Catalin's changes land.
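
To illustrate the footprint difference described above, here is a rough
userspace model of the sizing logic. It is a sketch under stated
assumptions, not the code in this series: every demo_* and DEMO_* name
is hypothetical, the flag value is a placeholder rather than the real
value of CRYPTO_ALG_NEED_DMA_ALIGNMENT, and the 8 byte natural
alignment is assumed. Only the 128 byte figure comes from the text
above.

```c
#include <assert.h>
#include <stddef.h>

/* Compile time minimum DMA alignment on arm64, as quoted above. */
#define DEMO_DMA_MINALIGN		128u
/* Assumed natural CPU alignment for a context buffer (illustrative). */
#define DEMO_CPU_MINALIGN		8u
/* Placeholder bit standing in for CRYPTO_ALG_NEED_DMA_ALIGNMENT. */
#define DEMO_ALG_NEED_DMA_ALIGNMENT	0x1u

/* Round n up to the next multiple of align, a nonzero power of two. */
static size_t demo_round_up(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/* Only take DMA alignment into account if the driver asked for it. */
static size_t demo_ctx_align(unsigned int alg_flags)
{
	return (alg_flags & DEMO_ALG_NEED_DMA_ALIGNMENT) ?
		DEMO_DMA_MINALIGN : DEMO_CPU_MINALIGN;
}

/*
 * Bytes needed for one request: the header is padded so the context
 * starts on an aligned boundary, and the context is padded so that it
 * does not share its last aligned block with whatever follows.
 */
static size_t demo_request_size(size_t hdr_size, size_t ctx_size,
				unsigned int alg_flags)
{
	size_t align = demo_ctx_align(alg_flags);

	return demo_round_up(hdr_size, align) +
	       demo_round_up(ctx_size, align);
}
```

With a hypothetical 80 byte request header and 48 byte driver context,
demo_request_size() gives 128 bytes without the flag and 256 bytes with
it, which is the kind of per-request saving this series aims for on
implementations that never DMA into the context.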

[0] https://lore.kernel.org/linux-arm-kernel/20220405135758.774016-1-catalin.marinas@xxxxxxx/

Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
Cc: Gilad Ben-Yossef <gilad@xxxxxxxxxxxxx>
Cc: Corentin Labbe <clabbe@xxxxxxxxxxxx>
Cc: Saravana Kannan <saravanak@xxxxxxxxxx>

Ard Biesheuvel (8):
  crypto: add flag for algos that need DMA aligned context buffers
  crypto: safexcel - take request size after setting TFM
  crypto: drivers - set CRYPTO_ALG_NEED_DMA_ALIGNMENT where needed
  crypto: drivers - avoid setting skcipher TFM reqsize directly
  crypto: skcipher - avoid rounding up request size to DMA alignment
  crypto: aead - avoid DMA alignment for request structures unless needed
  crypto: ahash - avoid DMA alignment for request structures unless needed
  crypto: safexcel - reduce alignment of stack buffer

 drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c |  6 +-
 drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c   | 20 +++----
 drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c |  6 +-
 drivers/crypto/amcc/crypto4xx_core.c                |  8 +++
 drivers/crypto/amlogic/amlogic-gxl-cipher.c         |  5 +-
 drivers/crypto/amlogic/amlogic-gxl-core.c           |  2 +
 drivers/crypto/atmel-aes.c                          |  2 +-
 drivers/crypto/atmel-sha.c                          |  2 +-
 drivers/crypto/atmel-tdes.c                         |  2 +-
 drivers/crypto/axis/artpec6_crypto.c                |  8 +++
 drivers/crypto/bcm/cipher.c                         | 23 +++++++-
 drivers/crypto/caam/caamalg.c                       |  4 +-
 drivers/crypto/caam/caamalg_qi.c                    |  2 +
 drivers/crypto/caam/caamalg_qi2.c                   |  4 +-
 drivers/crypto/caam/caamhash.c                      |  3 +-
 drivers/crypto/cavium/cpt/cptvf_algs.c              |  6 ++
 drivers/crypto/cavium/nitrox/nitrox_aead.c          |  6 +-
 drivers/crypto/cavium/nitrox/nitrox_skcipher.c      | 24 +++++---
 drivers/crypto/ccree/cc_aead.c                      |  3 +
 drivers/crypto/ccree/cc_cipher.c                    |  3 +
 drivers/crypto/ccree/cc_hash.c                      |  6 ++
 drivers/crypto/chelsio/chcr_algo.c                  |  5 +-
 drivers/crypto/gemini/sl3516-ce-cipher.c            |  5 +-
 drivers/crypto/gemini/sl3516-ce-core.c              |  1 +
 drivers/crypto/hifn_795x.c                          |  4 +-
 drivers/crypto/hisilicon/sec/sec_algs.c             |  8 +++
 drivers/crypto/hisilicon/sec2/sec_crypto.c          |  2 +
 drivers/crypto/inside-secure/safexcel.h             | 17 +++---
 drivers/crypto/inside-secure/safexcel_cipher.c      | 55 ++++++++++++++++--
 drivers/crypto/inside-secure/safexcel_hash.c        | 26 +++++++++
 drivers/crypto/ixp4xx_crypto.c                      |  2 +
 drivers/crypto/keembay/keembay-ocs-aes-core.c       | 12 ++++
 drivers/crypto/keembay/keembay-ocs-hcu-core.c       | 30 ++++++----
 drivers/crypto/marvell/cesa/cipher.c                |  6 ++
 drivers/crypto/marvell/cesa/hash.c                  |  6 ++
 drivers/crypto/marvell/octeontx/otx_cptvf_algs.c    | 60 +++++++++++++++-----
 drivers/crypto/marvell/octeontx2/otx2_cptvf_algs.c  | 52 ++++++++++++-----
 drivers/crypto/mxs-dcp.c                            |  8 ++-
 drivers/crypto/n2_core.c                            |  1 +
 drivers/crypto/omap-aes.c                           |  5 ++
 drivers/crypto/omap-des.c                           |  4 ++
 drivers/crypto/omap-sham.c                          | 12 ++++
 drivers/crypto/qce/aead.c                           |  1 +
 drivers/crypto/qce/sha.c                            |  3 +-
 drivers/crypto/qce/skcipher.c                       |  1 +
 drivers/crypto/rockchip/rk3288_crypto_ahash.c       |  3 +
 drivers/crypto/rockchip/rk3288_crypto_skcipher.c    | 18 ++++--
 drivers/crypto/s5p-sss.c                            |  6 ++
 drivers/crypto/sa2ul.c                              |  9 +++
 drivers/crypto/sahara.c                             | 14 +++--
 drivers/crypto/stm32/stm32-cryp.c                   | 27 ++++++---
 drivers/crypto/stm32/stm32-hash.c                   |  8 +++
 drivers/crypto/talitos.c                            | 40 +++++++++++++
 drivers/crypto/ux500/cryp/cryp_core.c               | 21 ++++---
 drivers/crypto/ux500/hash/hash_core.c               | 12 ++--
 drivers/crypto/xilinx/zynqmp-aes-gcm.c              |  1 +
 include/crypto/aead.h                               |  2 +-
 include/crypto/hash.h                               |  5 +-
 include/crypto/internal/aead.h                      | 13 ++++-
 include/crypto/internal/hash.h                      | 10 +++-
 include/crypto/internal/skcipher.h                  | 13 ++++-
 include/crypto/skcipher.h                           |  8 +--
 include/linux/crypto.h                              | 21 +++++++
 63 files changed, 568 insertions(+), 134 deletions(-)

-- 
2.30.2