On 11/22/21 11:03 AM, Vlastimil Babka wrote:
On 11/22/21 16:23, Brijesh Singh wrote:
Hi Peter,
On 11/12/21 9:43 AM, Peter Gonda wrote:
Hi Brijesh,
There is one high-level discussion I'd like to have on these SNP KVM patches.
In these patches (V5), if a host userspace process writes to a guest
private page, a SIGBUS is issued to that process. If the kernel writes
to a guest private page, the kernel panics due to the unhandled RMP
page fault. This is an issue because not all writes into guest
memory come from a bug in the host. For instance, a malicious or
even buggy guest could easily point the host at a private page
during the emulation of many virtual devices (virtio, NVMe, etc.). For
example, suppose a well-behaved guest's sequence is: start up a driver,
select some pages to share with the host, ask the host to convert
them to shared, then use those pages for virtual device DMA. If a
buggy guest forgets the step that requests the conversion to
shared, it's easy to see how the host could rightfully write to private
memory. I think we can better guarantee host reliability when running
SNP guests without changing SNP’s security properties.
Here is an alternative to the current approach: On RMP violation (host
or userspace) the page fault handler converts the page from private to
shared to allow the write to continue. This borrows from s390’s error
handling, which does exactly this; see ‘arch_make_page_accessible()’.
Additionally, it adds less complexity to the SNP kernel patches and
requires no new ABI.
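To make the difference between the two behaviours concrete, here is a toy
userspace model in C. It is only a sketch: it does not touch the RMP or any
kernel code, and every name in it (toy_page, toy_host_write, the two policy
values) is hypothetical and made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies: what to do when the host writes a private page. */
enum toy_policy {
    TOY_POLICY_SIGNAL,   /* V5 behaviour for userspace: report SIGBUS   */
    TOY_POLICY_CONVERT   /* proposal: convert private -> shared, retry  */
};

enum toy_result { TOY_WRITE_OK, TOY_WRITE_SIGBUS };

/* Stand-in for one guest page plus its ownership state (not a real RMP entry). */
struct toy_page {
    bool guest_private;
    unsigned char data;
};

/* Model a host write to a guest page under the chosen policy. */
enum toy_result toy_host_write(struct toy_page *page, unsigned char value,
                               enum toy_policy policy)
{
    assert(page != NULL);

    if (page->guest_private) {
        if (policy == TOY_POLICY_SIGNAL)
            return TOY_WRITE_SIGBUS;      /* write is refused, page untouched */

        /* Proposed handling: the fault path makes the page shared. */
        page->guest_private = false;
    }

    page->data = value;                   /* the write proceeds */
    return TOY_WRITE_OK;
}
```

With TOY_POLICY_SIGNAL a write to a private page is refused and the page is
left alone; with TOY_POLICY_CONVERT the same write succeeds and the page ends
up shared.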
In the current (V5) KVM implementation if a userspace process
generates an RMP violation (writes to guest private memory) the
process receives a SIGBUS. At first glance, it would appear that
user-space shouldn’t write to private memory. However, guaranteeing
this in a generic fashion requires locking the RMP entries (via locks
external to the RMP). Otherwise, a user-space process emulating
guest device I/O may be vulnerable to having the guest memory
(maliciously or by guest bug) converted to private while user-space
emulation is happening. This results in a well-behaved userspace
process receiving a SIGBUS.
This proposal allows buggy and malicious guests to run under SNP
without jeopardizing the reliability / safety of host processes. This
is very important to a cloud service provider (CSP) since it’s common
to have host-wide daemons that read and write the memory of all guests,
e.g. a single process could manage the networking for all VMs on the
host. Crashing that singleton process kills networking for all VMs on
the system.
Thank you for starting the thread; based on the discussion, I am keeping the
current implementation as-is and *not* going with the auto conversion from
private to shared. To summarize what we are doing in the current SNP series:
- If userspace accesses guest private memory, it gets SIGBUS.
So, is there anything protecting host userspace processes from malicious guests?
Unfortunately, no.
In the future, we could look into Sean's suggestion to come up with an ABI
that userspace can use to lock the guest pages before the access and
notify the caller of the access violation. It seems that TDX may need
something similar, but I cannot tell for sure. This proposal seems good
at first glance, but the devil is in the details; once it is implemented
we also need to measure its performance implications.
Should we consider using SIGSEGV (SEGV_ACCERR) instead of SIGBUS? In
other words, treat a guest's private pages as read-only, so that writing
to them generates a standard SIGSEGV.
thanks
- If the kernel accesses[*] guest private memory, it panics.
[*] Kernel consults the RMP table for the page ownership before the access.
If the page is shared, then it uses the locking mechanism to ensure that a
guest will not be able to change the page ownership while kernel has it mapped.
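The check-then-hold behaviour described in [*] can be illustrated with a toy
model. This is a sketch under loud assumptions: the thread does not say what
the locking mechanism is, so a simple pin count stands in for it here, the
code is single-threaded with no atomicity, and all names (toy_rmp_entry,
toy_kernel_map, and so on) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for per-page ownership state; not the real RMP entry layout. */
struct toy_rmp_entry {
    bool guest_private;
    unsigned int host_pins;   /* assumed stand-in for the series' locking */
};

/*
 * Kernel wants to access the page: consult ownership first.
 * Returns 0 and pins the page if it is shared, -EFAULT if it is private
 * (the case where the V5 series would panic).
 */
int toy_kernel_map(struct toy_rmp_entry *e)
{
    assert(e != NULL);
    if (e->guest_private)
        return -EFAULT;
    e->host_pins++;
    return 0;
}

/* Kernel is done with the page: drop the pin. */
void toy_kernel_unmap(struct toy_rmp_entry *e)
{
    assert(e != NULL && e->host_pins > 0);
    e->host_pins--;
}

/*
 * Guest asks to turn the page private. Refused with -EBUSY while the
 * kernel still holds a pin, so ownership cannot change under a mapping.
 */
int toy_guest_make_private(struct toy_rmp_entry *e)
{
    assert(e != NULL);
    if (e->host_pins > 0)
        return -EBUSY;
    e->guest_private = true;
    return 0;
}
```

While the kernel holds the pin, the guest's conversion request is refused;
once the kernel unmaps, the conversion goes through and a later kernel access
hits the private-page path.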
thanks