On Thu, 7 Apr 2022 at 06:32, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Apr 06, 2022 at 04:27:09PM +0200, Ard Biesheuvel wrote:
> >
> > +#define EIP197_SKCIPHER_REQ_SIZE	(ALIGN(sizeof(struct skcipher_request), \
> > +					       CRYPTO_MINALIGN) + \
>
> The whole point of CRYPTO_MINALIGN is that it comes for free
> via kmalloc.
>

Yes, but kmalloc allocates the entire block at once, and so all the
concatenated structures need to use the same alignment, which results
in substantial padding overhead.

> If you need alignment over and above that of kmalloc, you need
> to do it explicitly in the driver.
>

But none of the drivers currently do so, and they rely on the API to
produce ctx struct pointers that are suitably aligned for DMA.

Note that the main issue is not the alignment itself, but the rounding
up of the sizes that it causes. For instance, looking at
crypto/adiantum.c:

struct adiantum_request_ctx {

	... other fields ...

	/* Sub-requests, must be last */
	union {
		struct shash_desc hash_desc;
		struct skcipher_request streamcipher_req;
	} u;
};

So on arm64, every skcipher_request that gets allocated will take:

  128 bytes for the outer skcipher_request + padding
  128 bytes for the adiantum request context + padding
  128 bytes for the inner skcipher_request + padding
  n bytes for the inner context

Given that a skcipher_request only needs 72 bytes on 64-bit
architectures, and Adiantum's reqctx is only ~50 bytes, this results
in ~200 bytes of overhead for every allocation, which is rather
wasteful.

I think permitting DMA directly into these buffers was a mistake, but
it is the situation we are in today. I am only trying to reduce the
memory overhead for the cases where it is not needed.
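
To put numbers on that, here is a quick userspace sketch of the
arithmetic (not kernel code; it assumes CRYPTO_MINALIGN == 128, i.e.,
ARCH_DMA_MINALIGN on arm64, and uses the approximate struct sizes
quoted above):

#include <stdio.h>
#include <stddef.h>

#define CRYPTO_MINALIGN	128	/* assumed: ARCH_DMA_MINALIGN on arm64 */
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((size_t)(a) - 1))

int main(void)
{
	size_t outer_req = 72;	/* approx. sizeof(struct skcipher_request) */
	size_t reqctx = 50;	/* approx. adiantum request ctx proper */
	size_t inner_req = 72;	/* the inner skcipher_request */

	/* what we allocate today: each piece rounded up to CRYPTO_MINALIGN */
	size_t padded = ALIGN(outer_req, CRYPTO_MINALIGN) +
			ALIGN(reqctx, CRYPTO_MINALIGN) +
			ALIGN(inner_req, CRYPTO_MINALIGN);

	/* what the data actually needs */
	size_t packed = outer_req + reqctx + inner_req;

	/* prints: padded 384, packed 194, overhead 190 */
	printf("padded %zu, packed %zu, overhead %zu\n",
	       padded, packed, padded - packed);
	return 0;
}

That is where the ~200 bytes per allocation come from: three pieces
that each get rounded up to the next 128-byte boundary even though
none of them comes close to needing that much.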