Live-migrations under qemu result in guest corruption when the following three conditions are all met: * The source host CPU has capabilities that itself extend that of the guest CPU fpstate->user_xfeatures * The source kernel emits guest_fpu->user_xfeatures with respect to the host CPU (i.e. it *does not* have the "Fixes:" commit) * The destination kernel enforces that the xfeatures in the buffer given to KVM_SET_IOCTL are compatible with the guest CPU (i.e., it *does* have the "Fixes:" commit) When these conditions are met, the semantical changes to fpstate->user_features trigger a subtle bug in qemu that results in qemu failing to put the XSAVE architectural state into KVM. qemu then both ceases to put the remaining (non-XSAVE) x86 architectural state into KVM and makes the fateful mistake of resuming the guest anyways. This usually results in immediate guest corruption, silent or not. Due to the grave nature of this qemu bug, attempt to retain behavior of old kernels by clamping the xfeatures specified in the buffer given to KVM_SET_IOCTL such that it aligns with the guests fpstate->user_xfeatures instead of returning an error. Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") Cc: stable@xxxxxxxxxxxxxxx Cc: Leonardo Bras <leobras@xxxxxxxxxx> Signed-off-by: Tyler Stachecki <tstachecki@xxxxxxxxxxxxx> --- arch/x86/kvm/x86.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6c9c81e82e65..baad160b592f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5407,11 +5407,21 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { + union fpregs_state *ustate = (union fpregs_state *) guest_xsave->region; + u64 user_xfeatures = vcpu->arch.guest_fpu.fpstate->user_xfeatures; + if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) return 0; - return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu, - guest_xsave->region, + /* + * In previous kernels, kvm_arch_vcpu_create() set the guest's fpstate + * based on what the host CPU supported. Recent kernels changed this + * and only accept ustate containing xfeatures that the guest CPU is + * capable of supporting. + */ + ustate->xsave.header.xfeatures &= user_xfeatures; + + return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu, ustate, kvm_caps.supported_xcr0, &vcpu->arch.pkru); } -- 2.30.2