Re: [RFC PATCH 0/3] KVM: Introduce "VM bugged" concept

Cornelia Huck <cohuck@xxxxxxxxxx> · Tue, 29 Sep 2020 11:27:10 +0200

On Wed, 23 Sep 2020 15:45:27 -0700
Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:

> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.

I think this makes a lot of sense.
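
To check my reading of the basic idea, here is a minimal userspace
sketch of the flow described above: warn on the first hit, request that
all vCPUs exit, and reject later ioctls. This is not code from the
series. All toy_* names are made up for illustration, the warning is
once per VM rather than once per call site as with WARN_ONCE, and the
-EIO return value is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_VCPUS 4

/* Hypothetical stand-in for the per-VM state; not the series' struct. */
struct toy_vm {
	bool bugged;                      /* sticky, never cleared */
	int nr_vcpus;
	bool kick_pending[TOY_MAX_VCPUS]; /* "exit to userspace" requests */
};

/* Mark the VM as bugged and request that every vCPU exits. */
static void toy_vm_mark_bugged(struct toy_vm *vm)
{
	assert(vm->nr_vcpus <= TOY_MAX_VCPUS);
	vm->bugged = true;
	for (int i = 0; i < vm->nr_vcpus; i++)
		vm->kick_pending[i] = true;
}

/*
 * Returns cond so callers can write "if (toy_bug_on(...)) return;".
 * Warns only on the first hit per VM, a simplification of WARN_ONCE.
 */
static bool toy_bug_on(struct toy_vm *vm, bool cond, const char *what)
{
	if (!cond)
		return false;
	if (!vm->bugged)
		fprintf(stderr, "toy WARN: %s\n", what);
	toy_vm_mark_bugged(vm);
	return true;
}

/* Every VM and vCPU entry point checks the flag first. */
static int toy_vm_ioctl(struct toy_vm *vm, unsigned int cmd)
{
	(void)cmd;
	return vm->bugged ? -EIO : 0; /* -EIO is an assumption */
}

static int toy_vcpu_ioctl(struct toy_vm *vm, int vcpu, unsigned int cmd)
{
	(void)cmd;
	if (vcpu < 0 || vcpu >= vm->nr_vcpus)
		return -EINVAL;
	return vm->bugged ? -EIO : 0;
}
```

In this model a failed check on any path leaves the flag set, so later
calls fail fast instead of triggering further warnings.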

Are there other user space interactions where we want to generate an
error for a bugged VM, e.g. via eventfd?

And can we make the 'bugged' information available to user space in a
structured way?

> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.
> 
> Sean Christopherson (3):
>   KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
>     bugged
>   KVM: Add infrastructure and macro to mark VM as bugged
>   KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
>     VM
> 
>  arch/x86/kvm/svm/svm.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c   | 23 ++++++++++++--------
>  arch/x86/kvm/x86.c       |  4 ++++
>  include/linux/kvm_host.h | 45 ++++++++++++++++++++++++++++++++--------
>  virt/kvm/kvm_main.c      | 11 +++++-----
>  5 files changed, 61 insertions(+), 24 deletions(-)
>