On Mon, Mar 11, 2024 at 09:15:53AM -0700, mhkelley58@xxxxxxxxx wrote: > From: Michael Kelley <mhklinux@xxxxxxxxxxx> > > Shared (decrypted) pages should never be returned to the page allocator, > lest future usage of the pages store data that should not be exposed to > the host. They may also cause the guest to crash if the page is used in > a way disallowed by HW (i.e. for executable code or as a page table). > > Normally set_memory() call failures are rare. But in CoCo VMs > set_memory_XXcrypted() may involve calls to the untrusted host, and an > attacker could fail these calls such that: > 1. set_memory_encrypted() returns an error and leaves the pages fully > shared. > 2. set_memory_decrypted() returns an error, but the pages are actually > full converted to shared. > > This means that patterns like the below can cause problems: > void *addr = alloc(); > int fail = set_memory_decrypted(addr, 1); > if (fail) > free_pages(addr, 0); > > And: > void *addr = alloc(); > int fail = set_memory_decrypted(addr, 1); > if (fail) { > set_memory_encrypted(addr, 1); > free_pages(addr, 0); > } > > Unfortunately these patterns appear in the kernel. And what the > set_memory() callers should do in this situation is not clear either. They > shouldn’t use them as shared because something clearly went wrong, but > they also need to fully reset the pages to private to free them. But, the > kernel needs the host's help to do this and the host is already being > uncooperative around the needed operations. So this isn't guaranteed to > succeed and the caller is kind of stuck with unusable pages. > > The only choice is to panic or leak the pages. The kernel tries not to > panic if at all possible, so just leak the pages at the call sites. > Separately there is a patch[1] to warn if the guest detects strange host > behavior around this. It is stalled, so in the mean time I’m proceeding > with fixing the callers to leak the pages. No additional warnings are > added, because the plan is to warn in a single place in x86 set_memory() > code. > > This series fixes the cases in the Hyper-V code. > > This is the non-RFC/RFT version of Rick Edgecombe's previous series.[2] > Rick asked me to do this version based on my comments and the testing > I did. I've tested most of the error paths by hacking > set_memory_encrypted() to fail, and observing /proc/vmallocinfo and > /proc/buddyinfo to confirm that the memory is leaked as expected > instead of freed. > > Changes in this version: > * Expanded commit message references to "TDX" to be "CoCo VMs" since > set_memory_encrypted() could fail in other configurations, such as > Hyper-V CoCo guests running with a paravisor on SEV-SNP processors. > * Changed "Subject:" prefixes to match historical practice in Hyper-V > related source files > * Patch 1: Added handling of set_memory_decrypted() failure > * Patch 2: Changed where the "decrypted" flag is set so that > error cases not related to set_memory_encrypted() are handled > correctly > * Patch 2: Fixed the polarity of the test for set_memory_encrypted() > failing > * Added Patch 5 to the series to properly handle free'ing of > ring buffer memory > * Fixed a few typos throughout > > [1] https://lore.kernel.org/lkml/20240122184003.129104-1-rick.p.edgecombe@xxxxxxxxx/ > [2] https://lore.kernel.org/linux-hyperv/20240222021006.2279329-1-rick.p.edgecombe@xxxxxxxxx/ > > Michael Kelley (1): > Drivers: hv: vmbus: Don't free ring buffers that couldn't be > re-encrypted > > Rick Edgecombe (4): > Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails > Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl > hv_netvsc: Don't free decrypted memory > uio_hv_generic: Don't free decrypted memory Applied to hyperv-fixes. Thanks.