On 11/12/21 2:37 PM, Sean Christopherson wrote:
On Fri, Nov 12, 2021, Borislav Petkov wrote:
On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
Yes, but IMO inducing a fault in the guest because of a _host_ bug is wrong.
In the automatic change proposal, both a host bug and a guest bug will
cause a guest to get the #VC and then the guest can decide whether it
wants to proceed or terminate. If it chooses to proceed, it can poison the
page and log it for future examination.
What do you suggest instead?
Let userspace decide what is mapped shared and what is mapped private. The kernel
and KVM provide the APIs/infrastructure to do the actual conversions in a thread-safe
fashion and also to enforce the current state, but userspace is the control plane.
It would require non-trivial changes in userspace if there are multiple processes
accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
solvable. The exit to userspace means all three components (guest, kernel,
and userspace) have full knowledge of what is shared and what is private. There
is zero ambiguity:
- if userspace accesses guest private memory, it gets SIGSEGV or whatever.
- if kernel accesses guest private memory, it does BUG/panic/oops[*]
- if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
This is the direction KVM TDX support is headed, though it's obviously still a WIP.
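To make the three "zero ambiguity" cases above concrete, here is a minimal userspace model of who gets punished for which access. It is only a sketch: every name in it (check_access, the enums, guest_wants_private) is hypothetical and is not KVM, SNP, or TDX API, and the outcomes are just labels for "SIGSEGV or whatever", "BUG/panic/oops", and "gets killed".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in KVM, SNP, or TDX. */
enum page_state { PAGE_PRIVATE, PAGE_SHARED };
enum accessor   { ACC_USERSPACE, ACC_KERNEL, ACC_GUEST };
enum outcome    { ACCESS_OK, USER_SIGSEGV, KERNEL_BUG, GUEST_KILLED };

/*
 * Decide what happens when 'who' touches a page currently in 'state'.
 * guest_wants_private models the C/SHARED bit the guest used for the
 * access; it is ignored for the two host-side accessors.
 */
enum outcome check_access(enum accessor who, enum page_state state,
                          bool guest_wants_private)
{
    switch (who) {
    case ACC_USERSPACE:
        /* Userspace touching guest private memory is a userspace bug. */
        return state == PAGE_PRIVATE ? USER_SIGSEGV : ACCESS_OK;
    case ACC_KERNEL:
        /* The kernel touching guest private memory is a kernel bug. */
        return state == PAGE_PRIVATE ? KERNEL_BUG : ACCESS_OK;
    case ACC_GUEST:
        /* The guest must use the bit that matches the tracked state. */
        return guest_wants_private == (state == PAGE_PRIVATE)
                   ? ACCESS_OK : GUEST_KILLED;
    }
    assert(!"unknown accessor");
    return GUEST_KILLED;
}
```

The point of the model is that the tracked state is the single source of truth: a mismatch is always charged to the component that made the access, and never silently converts the page.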
Just curious, in this approach, how do you propose handling the host
kexec/kdump? If a kexec/kdump occurs while the VM is still active, the
new kernel will encounter the #PF (RMP violation) because some pages are
still marked 'private' in the RMP table.
And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
define that:
a) All convertible memory, i.e. RAM, starts as private.
b) Conversions between private and shared must be done via explicit hypercall.
Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
type as implicit conversions.
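Rules (a) and (b) can be sketched as a tiny state table. This is an illustration under assumed names (guest_mem, guest_mem_convert, NR_PAGES, and so on are all made up here), not how any vendor ABI or KVM actually tracks page state: every page starts private, exactly one function changes a page's type, and an access with the wrong type is reported as a mismatch rather than treated as a conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8  /* arbitrary size for the sketch */

/* Hypothetical types; not part of any real ABI. */
enum mem_type { MEM_PRIVATE, MEM_SHARED };

struct guest_mem {
    enum mem_type type[NR_PAGES];
};

/* Rule (a): all convertible memory starts as private. */
void guest_mem_init(struct guest_mem *m)
{
    for (size_t i = 0; i < NR_PAGES; i++)
        m->type[i] = MEM_PRIVATE;
}

/*
 * Rule (b): the only path that changes a page's type. It stands in for
 * the handler of an explicit conversion hypercall.
 * Returns 0 on success, -1 if gfn is out of range.
 */
int guest_mem_convert(struct guest_mem *m, size_t gfn, enum mem_type to)
{
    if (gfn >= NR_PAGES)
        return -1;
    m->type[gfn] = to;
    return 0;
}

/*
 * An access never converts a page. It only reports whether the type the
 * guest used matches the tracked type.
 */
bool guest_mem_access_ok(const struct guest_mem *m, size_t gfn,
                         enum mem_type as)
{
    assert(m != NULL);
    return gfn < NR_PAGES && m->type[gfn] == as;
}
```

Without rule (b), guest_mem_access_ok() would have to update the table on a mismatch, which is exactly the implicit conversion described above.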
[*] Sadly, fully preventing kernel access to guest private is not possible with
TDX, especially if the direct map is left intact. But maybe in the future
TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
thanks