On Tue, Apr 19, 2022 at 11:50 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> On Mon, 18 Apr 2022 at 18:44, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > On Mon, Apr 18, 2022 at 04:37:17PM +0800, Herbert Xu wrote:
> > BTW before you have a go at this, there's also Linus' idea that does not
> > change the crypto code (at least not functionally). Of course, you and
> > Ard can still try to figure out how to reduce the padding but if we go
> > with Linus' idea of a new GFP_NODMA flag, there won't be any changes to
> > the crypto code as long as it doesn't pass such flag. So, the options:
> >
> > 1. Change ARCH_KMALLOC_MINALIGN to 8 (or ARCH_SLAB_MINALIGN if higher)
> >    while keeping ARCH_DMA_MINALIGN to 128. By default kmalloc() will
> >    honour the 128-byte alignment, unless GFP_NODMA is passed. This still
> >    requires changing CRYPTO_MINALIGN to ARCH_DMA_MINALIGN but there is
> >    no functional change, kmalloc() without the new flag will return
> >    CRYPTO_MINALIGN-aligned pointers.
> >
> > 2. Leave ARCH_KMALLOC_MINALIGN as ARCH_DMA_MINALIGN (128) and introduce
> >    a new GFP_PACKED (I think it fits better than 'NODMA') flag that
> >    reduces the minimum kmalloc() below ARCH_KMALLOC_MINALIGN (and
> >    probably at least ARCH_SLAB_MINALIGN). It's equivalent to (1) but
> >    does not touch the crypto code at all.
> >
> > (1) and (2) are the same, just minor naming difference. Happy to go with
> > any of them. They still have the downside that we need to add the new
> > GFP_ flag to those hotspots that allocate small objects (Arnd provided
> > an idea on how to find them with ftrace) but at least we know it won't
> > inadvertently break anything.

Right, both of these seem reasonable to me.

> I'm not sure GFP_NODMA adds much here.
>
> The way I see it, the issue in the crypto code is that we are relying
> on a ARCH_KMALLOC_ALIGN aligned zero length __ctx[] array for three
> different things:
...

Right. So as long as the crypto subsystem has additional alignment
requirements, it won't benefit from GFP_NODMA. For everything else,
GFP_NODMA would however have a very direct and measurable impact
on memory consumption.

Your proposed changes to the crypto subsystem all seem helpful
as well, just mostly orthogonal to the savings elsewhere. I don't know
how much memory is permanently tied up in overaligned crypto
data structures, but my guess is that it's not a lot on most systems.

Improving the alignment for crypto would however likely help
with stack usage for on-stack structures, and with performance
when the amount of ctx memory to clear for each operation
becomes smaller.

      Arnd
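
To put rough numbers on the memory-consumption point, here is a minimal
user-space sketch (not kernel code) of the padding arithmetic. It assumes
the 128-byte and 8-byte minimums quoted above. The names DMA_MINALIGN,
PACKED_MINALIGN, round_up_pow2() and padding_bytes() are hypothetical and
exist only for this illustration. It also ignores kmalloc()'s real size
classes, so the result is at best a lower bound on the per-object waste.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only, taken from the numbers quoted in the thread. */
#define DMA_MINALIGN    128u  /* minimum alignment without the proposed flag */
#define PACKED_MINALIGN 8u    /* assumed minimum with GFP_NODMA / GFP_PACKED */

/* Round size up to the next multiple of align (align: power of two). */
static size_t round_up_pow2(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Padding added when a size-byte object is rounded up to align. */
static size_t padding_bytes(size_t size, size_t align)
{
	return round_up_pow2(size, align) - size;
}
```

In this model a 24-byte object costs padding_bytes(24, DMA_MINALIGN) = 104
bytes of padding with the 128-byte minimum, and
padding_bytes(24, PACKED_MINALIGN) = 0 bytes with the 8-byte one.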