From: Catalin Marinas <catalin.marinas@xxxxxxx> Sent: Monday, June 10, 2024 10:46 AM
>
> On Mon, Jun 10, 2024 at 05:03:44PM +0000, Michael Kelley wrote:
> > From: Catalin Marinas <catalin.marinas@xxxxxxx> Sent: Monday, June 10, 2024 3:34 AM
> > > I wonder whether something like __GFP_DECRYPTED could be used to get
> > > shared memory from the allocation time and avoid having to change the
> > > vmalloc() ranges. This way functions like netvsc_init_buf() would get
> > > decrypted memory from the start and vmbus_establish_gpadl() would not
> > > need to call set_memory_decrypted() on a vmalloc() address.
> >
> > I would not have any conceptual objections to such an approach. But I'm
> > certainly not an expert in that area so I'm not sure what it would take
> > to make that work for vmalloc(). I presume that __GFP_DECRYPTED
> > should also work for kmalloc()?
> >
> > I've seen the separate discussion about a designated pool of decrypted
> > memory, to avoid always allocating a new page and decrypting when a
> > smaller allocation is sufficient. If such a pool could also work for page size
> > or larger allocations, it would have the additional benefit of concentrating
> > decrypted allocations in fewer 2 Meg large pages vs. scattering wherever
> > and forcing the break-up of more large page mappings in the direct map.
>
> Yeah, my quick, not fully tested hack here:
>
> https://lore.kernel.org/linux-arm-kernel/ZmNJdSxSz-sYpVgI@xxxxxxx/
>
> It's the underlying page allocator that gives back decrypted pages when
> the flag is passed, so it should work for alloc_pages() and friends. The
> kmalloc() changes only ensure that we have separate caches for this
> memory and they are not merged. It needs some more work on kmem_cache,
> maybe introducing a SLAB_DECRYPTED flag as well as not to rely on the
> GFP flag.
>
> For vmalloc(), we'd need a pgprot_decrypted() macro to ensure the
> decrypted pages are marked with the appropriate attributes (arch
> specific), otherwise it's fairly easy to wire up if alloc_pages() gives
> back decrypted memory.
>
> > I'll note that netvsc devices can be added or removed from a running VM.
> > The vmalloc() memory allocated by netvsc_init_buf() can be freed, and/or
> > additional calls to netvsc_init_buf() can be made at any time -- they aren't
> > limited to initial Linux boot. So the mechanism for getting decrypted
> > memory at allocation time must be reasonably dynamic.
>
> I think the above should work. But, of course, we'd have to get this
> past the mm maintainers, it's likely that I missed something.

Having thought about this for a few days, I like the model of telling the
memory allocators to decrypt/re-encrypt the memory, instead of the caller
having to explicitly call set_memory_decrypted()/set_memory_encrypted().
I'll add some further comments to the thread with your initial
implementation.

>
> > Rejecting vmalloc() addresses may work for the moment -- I don't know
> > when CCA guests might be tried on Hyper-V. The original SEV-SNP and TDX
> > work started that way as well. :-) Handling the vmalloc() case was added
> > later, though I think on x86 the machinery to also flip all the alias PTEs was
> > already mostly or completely in place, probably for other reasons. So
> > fixing the vmalloc() case was more about not assuming that the underlying
> > physical address range is contiguous. Instead, each page must be processed
> > independently, which was straightforward.
>
> There may be a slight performance impact but I guess that's not on a
> critical path. Walking the page tables and changing the vmalloc ptes
> should be fine but for each page, we'd have to break the linear map,
> flush the TLBs, re-create the linear map. Those TLBs may become a
> bottleneck, especially on hardware with lots of CPUs and the
> microarchitecture.
> Note that even with a __GFP_DECRYPTED attribute, we'd
> still need to go for individual pages in the linear map.

Agreed. While synthetic devices can come and go at any time, that's
pretty rare in the grand scheme of things. I guess we would have to try
it on a system with a high CPU count, but even if the code needed to
"pace itself" to avoid hammering the TLBs too hard, that would be OK.

Michael