On Thu, Jun 06, 2024 at 11:17:36AM +0100, Marc Zyngier wrote:
> On Wed, 05 Jun 2024 16:08:49 +0100,
> Steven Price <steven.price@xxxxxxx> wrote:
> > 2. Use a special (global) memory allocator that does the
> > set_memory_decrypted() dance on the pages that it allocates but allows
> > packing the allocations. I'm not aware of an existing kernel API for
> > this, so it's potentially quite a bit of code. The benefit is that it
> > reduces memory consumption in a realm guest, although fragmentation
> > still means we're likely to see a (small) growth.
> >
> > Any thoughts on what you think would be best?
>
> I would expect that something similar to kmem_cache could be of help,
> only with the ability to deal with variable object sizes (in this
> case: minimum of 256 bytes, in increments defined by the
> implementation, and with a 256 byte alignment).

Hmm, that's doable but not that easy to make generic. We'd need a new
class of kmalloc-* caches (e.g. kmalloc-decrypted-*) which use only
decrypted pages together with a GFP_DECRYPTED flag or something to get
the slab allocator to go for these pages and avoid merging with other
caches. It would actually be the page allocator parsing this gfp flag,
probably in post_alloc_hook() to set the page decrypted and re-encrypt
it in free_pages_prepare(). A slight problem here is that free_pages()
doesn't get the gfp flag, so we'd need to store some bit in the page
flags. Maybe the flag is not that bad, do we have something like for
page_to_phys() to give us the high IPA address for decrypted pages?

Similarly if we go for a kmem_cache (or a few for multiple sizes). One
can specify a constructor which could set the memory decrypted but
there's no destructor (and also the constructor is per object, not per
page, so we'd need some refcounting).
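
To make the per-page refcounting concern concrete, here is a minimal
userspace sketch, not kernel code. Every sketch_* name is hypothetical,
and the two stub functions only stand in for set_memory_decrypted() and
set_memory_encrypted(). It assumes fixed 256-byte objects packed into a
single 4K page: the page is shared on the first allocation and made
private again only when the last object is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_OBJ_SIZE      256u
#define SKETCH_OBJS_PER_PAGE (SKETCH_PAGE_SIZE / SKETCH_OBJ_SIZE)

/* Hypothetical per-page bookkeeping; not an existing kernel structure. */
struct sketch_page {
	unsigned char mem[SKETCH_PAGE_SIZE];
	bool in_use[SKETCH_OBJS_PER_PAGE];
	unsigned int refcount;	/* number of live objects in this page */
	bool decrypted;		/* models the shared/private state */
};

/* Counts state changes so the cost of each transition is visible. */
static unsigned int sketch_transitions;

/* Stand-in for set_memory_decrypted() on the whole page. */
static void sketch_set_decrypted(struct sketch_page *pg)
{
	pg->decrypted = true;
	sketch_transitions++;
}

/* Stand-in for set_memory_encrypted() on the whole page. */
static void sketch_set_encrypted(struct sketch_page *pg)
{
	pg->decrypted = false;
	sketch_transitions++;
}

/* Hand out one 256-byte slot; decrypt the page on its first user. */
static void *sketch_alloc(struct sketch_page *pg)
{
	for (unsigned int i = 0; i < SKETCH_OBJS_PER_PAGE; i++) {
		if (pg->in_use[i])
			continue;
		if (pg->refcount == 0)
			sketch_set_decrypted(pg);
		pg->in_use[i] = true;
		pg->refcount++;
		return pg->mem + (size_t)i * SKETCH_OBJ_SIZE;
	}
	return NULL;	/* page is full */
}

/* Release a slot; re-encrypt the page once its last user is gone. */
static void sketch_free(struct sketch_page *pg, void *obj)
{
	size_t idx = (size_t)((unsigned char *)obj - pg->mem) / SKETCH_OBJ_SIZE;

	assert(idx < SKETCH_OBJS_PER_PAGE && pg->in_use[idx]);
	pg->in_use[idx] = false;
	if (--pg->refcount == 0)
		sketch_set_encrypted(pg);
}
```

With this shape, two allocations followed by two frees cost two state
transitions in total rather than four, which is the saving that packing
is after. A real implementation would also need locking and a way to
find the page from the object address.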

Another approach contained within the driver is to use mempool_create()
with our own _alloc_fn/_free_fn that sets the memory
decrypted/encrypted accordingly, though sub-page allocations need
additional tracking. Also that's fairly similar to kmem_cache, fixed
size.

Yet another option would be to wire it somehow in the DMA API but the
minimum allocation is already a page size, so we don't gain anything.

What gets somewhat closer to what we need is gen_pool. It can track
different sizes, we just need to populate the chunks as needed. I don't
think this would work as a generic allocator but may be good enough
within the ITS code.

If there's a need for such generic allocations in other parts of the
kernel, my preference would be something around kmalloc caches and a
new GFP flag (first option; subject to selling it to the mm folk). But
that's more of a separate prototyping effort that may or may not
succeed.

Anyway, we could do some hacking around gen_pool as a temporary
solution (maybe as a set of patches on top of this series to be easier
to revert) and start investigating a proper decrypted page allocator in
parallel. We just need to find a victim that has the page allocator
fresh in mind (Ryan or Alexandru ;)).

--
Catalin