Re: [RFC PATCH 0/3] KVM: Introduce "VM bugged" concept

Christian Borntraeger <borntraeger@xxxxxxxxxx> · Thu, 24 Sep 2020 08:37:10 +0200



On 24.09.20 00:45, Sean Christopherson wrote:
> This series introduces a concept we've discussed a few times in x86 land.
> The crux of the problem is that x86 has a few cases where KVM could
> theoretically encounter a software or hardware bug deep in a call stack
> without any sane way to propagate the error out to userspace.
> 
> Another use case would be for scenarios where letting the VM live will
> do more harm than good, e.g. we've been using KVM_BUG_ON for early TDX
> enabling as botching anything related to secure paging all but guarantees
> there will be a flood of WARNs and error messages because lower level PTE
> operations will fail if an upper level operation failed.
> 
> The basic idea is to WARN_ONCE if a bug is encountered, kick all vCPUs out
> to userspace, and mark the VM as bugged so that no ioctls() can be issued
> on the VM or its devices/vCPUs.
> 
> RFC as I've done nowhere near enough testing to verify that rejecting the
> ioctls(), evicting running vCPUs, etc... works as intended.

I like the idea, especially if we add a common "understanding" in QEMU
across all platforms. That would then even allow us to propagate an error.
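
To make the quoted "basic idea" concrete, here is a rough userspace sketch of
the flow: warn once, ask all vCPUs to exit, mark the VM as bugged, and reject
later ioctls. It is only a toy model and not the code from the patches. Every
name in it (toy_vm, toy_vcpu, TOY_VM_BUG_ON, toy_vm_ioctl) is made up for
illustration, the -EIO return value is an assumption, and the macro relies on
GNU C statement expressions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_VCPUS 4

/* Hypothetical userspace model; none of these names come from the patches. */
struct toy_vcpu {
	bool exit_requested;	/* set when the vCPU must return to userspace */
};

struct toy_vm {
	bool bugged;		/* once set, this sketch never clears it */
	int nr_vcpus;
	struct toy_vcpu vcpus[MAX_VCPUS];
};

/* Mark the VM as bugged and ask every vCPU to exit to userspace. */
static void toy_vm_mark_bugged(struct toy_vm *vm)
{
	assert(vm->nr_vcpus <= MAX_VCPUS);
	vm->bugged = true;
	for (int i = 0; i < vm->nr_vcpus; i++)
		vm->vcpus[i].exit_requested = true;
}

/*
 * Warn once per call site, mark the VM as bugged, and evaluate to the
 * condition so the caller can bail out. GNU C statement expression.
 */
#define TOY_VM_BUG_ON(cond, vm)						\
({									\
	static bool toy_warned_;					\
	bool toy_failed_ = !!(cond);					\
	if (toy_failed_) {						\
		if (!toy_warned_) {					\
			toy_warned_ = true;				\
			fprintf(stderr, "VM bug at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
		toy_vm_mark_bugged(vm);					\
	}								\
	toy_failed_;							\
})

/* Every ioctl-like entry point rejects a bugged VM up front. */
static int toy_vm_ioctl(struct toy_vm *vm, int value)
{
	if (vm->bugged)
		return -EIO;	/* assumed error code, for illustration only */

	/* Pretend a negative value is an "impossible" internal state. */
	if (TOY_VM_BUG_ON(value < 0, vm))
		return -EIO;

	return 0;
}
```

With something like this, the first toy_vm_ioctl() call that trips the check
marks the VM, and every later call fails early without touching any further
state. A consistent error of that kind is what a common handler in userspace
could key on.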
> 
> Sean Christopherson (3):
>   KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
>     bugged
>   KVM: Add infrastructure and macro to mark VM as bugged
>   KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
>     VM
> 
>  arch/x86/kvm/svm/svm.c   |  2 +-
>  arch/x86/kvm/vmx/vmx.c   | 23 ++++++++++++--------
>  arch/x86/kvm/x86.c       |  4 ++++
>  include/linux/kvm_host.h | 45 ++++++++++++++++++++++++++++++++--------
>  virt/kvm/kvm_main.c      | 11 +++++-----
>  5 files changed, 61 insertions(+), 24 deletions(-)
>