Hi Thomas and Paolo,

Thanks for your thoughts and suggestions. After reading the emails and looking at the code, we'd like to explain our thoughts on AMX KVM support, based on the latest kernel and the code from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu-kvm

AMX support based on existing design concepts

One of our objectives is to have a simple and clean KVM implementation by utilizing the new dynamic extended-features handling in the FPU core.

Dynamic reallocation and "lazy passthrough"

The new code allows us to implement "lazy passthrough" of the XFD MSRs by coupling it with the buffer reallocation request, which a vcpu makes indirectly (through a VM exit). With "lazy passthrough" of the XFD MSR, we can avoid unnecessary save/restore of the MSR and allocation of the extended features until the guest actually needs them and is allowed to use them. Until that point, the XFD MSR is virtual, and thus we do not need to save/restore the actual MSR at VM entry/exit time. The vcpu also does not have an extended state until that point. Once the guest starts using an XFD-controlled feature (e.g. AMX) and is permitted to use it, we allow the guest to modify the MSR directly (passthrough) to avoid (potentially frequent) VM exits.

Triggering of a reallocation request and error handling

First, we want to avoid weird guest failures at runtime caused by a failed reallocation request, most likely a permission failure. We therefore check the permissions of the vcpu (for the extended features) at kvm_vcpu_ioctl_set_cpuid2() time, when QEMU wants to advertise the extended features (e.g. AMX) for the first time. We cannot do this check earlier, because at vcpu_create() time we have no idea whether QEMU wants to enable AMX. If kvm_vcpu_ioctl_set_cpuid2() succeeds, there is no need to check permission again in the reallocation path.
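To make the set_cpuid2-time check more concrete, here is a rough userspace sketch. It is not the actual KVM code: struct vcpu_sketch, permitted_dyn, sketch_set_cpuid2() and the -EPERM return value are all hypothetical, chosen only for illustration. The one fixed fact it relies on is that AMX tile data is XSAVE state component 18.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* AMX tile data is XSAVE state component 18. */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Hypothetical stand-in for the per-vcpu state relevant to this check. */
struct vcpu_sketch {
	uint64_t permitted_dyn;   /* dynamic features the VMM is permitted to use */
	uint64_t guest_xfeatures; /* features advertised to the guest via CPUID */
};

/*
 * Hypothetical model of the check at set_cpuid2 time: refuse to advertise
 * a dynamic feature that the vcpu has no permission for.
 * Returns 0 on success, -EPERM (an assumed error code) otherwise.
 */
static inline int sketch_set_cpuid2(struct vcpu_sketch *v, uint64_t requested)
{
	uint64_t dyn = requested & XFEATURE_MASK_XTILE_DATA;

	if (dyn & ~v->permitted_dyn)
		return -EPERM;	/* fail early, before the guest ever runs */

	v->guest_xfeatures = requested;
	return 0;
}

/* Example: without permission, advertising AMX is rejected up front. */
static inline void sketch_set_cpuid2_example(void)
{
	struct vcpu_sketch v = { .permitted_dyn = 0, .guest_xfeatures = 0 };

	assert(sketch_set_cpuid2(&v, XFEATURE_MASK_XTILE_DATA) == -EPERM);
	v.permitted_dyn = XFEATURE_MASK_XTILE_DATA;
	assert(sketch_set_cpuid2(&v, XFEATURE_MASK_XTILE_DATA) == 0);
}
```

The point of the sketch is only the ordering: the failure is reported to userspace when the feature is advertised, so a later reallocation request does not need its own permission check.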
Upon detection (interception) of an attempt by a vcpu to write to XCR0 (XSETBV) or XFD (WRMSR), we check whether the write is valid, and we start passthrough of the XFD MSRs if a dynamic feature[i] meets the condition XCR0[i]=1 && XFD[i]=0. We also make a reallocation request to the FPU core at that point.

We simplify the KVM implementation by assuming that the reallocation request was successful when the vcpu comes back to KVM. For VM exit handling that requires a buffer-reallocation request, we don't resume the guest immediately. Instead, we go back to userspace and rely on the userspace VMM (e.g. QEMU) to handle error cases. The actual reallocation happens when control is transferred from KVM to the kernel (FPU core). If there is no error, QEMU will come back to KVM by repeating vcpu_ioctl_run(). Failures at that point would be due to lack of memory. Those are not interesting cases, though; the host would have more serious resource problems by then.

Additional KVM-specific and virtualization requirements

KVM needs to virtualize the XFD features, and this brings additional requirements.

XFD reset value

The XFD reset value needs to be 0.

KVM-specific XFD handling in XSAVES/XRSTORS

Once we start passing through the XFD MSR, we need to save/restore it at VM exit/entry time. If we immediately resume the guest without enabling interrupts/preemption (exit fast-path), we have no issues; we don't need to save the MSR. The question is how the host XFD MSR is restored while control is in KVM.

When XFD[x] != 0, the XSAVE(S) instruction either saves the (guest) state component[x] as 0 or doesn't save it. Accordingly, XRSTOR(S) cannot restore that guest state. And it is possible that XFD != 0 while the guest is using the extended feature at VM exit; we can check the XINUSE state-component bitmap with XGETBV(1). By adding more meaning to the existing fpstate->in_use field, KVM could use it to record the XINUSE value.
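The two bit conditions discussed so far (when passthrough starts, and when live guest state would be missed by XSAVES) can be written down as a small userspace sketch. The helper names and the DYN_FEATURES mask are hypothetical and not taken from the kernel tree; xinuse stands for the value XGETBV(1) would return and is passed in as a plain argument so the sketch runs anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* AMX tile data is XSAVE state component 18; tile config is component 17. */
#define XFEATURE_MASK_XTILE_CFG  (1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Hypothetical mask of all dynamic (XFD-controlled) features. */
#define DYN_FEATURES XFEATURE_MASK_XTILE_DATA

/*
 * Dynamic features the guest has fully enabled: XCR0[i] = 1 && XFD[i] = 0.
 * A non-zero result is the point where passthrough would start and a
 * reallocation would be requested.
 */
static inline uint64_t dyn_features_enabled(uint64_t xcr0, uint64_t xfd)
{
	return xcr0 & ~xfd & DYN_FEATURES;
}

/*
 * Dynamic features whose live register state XSAVES would not save:
 * XFD[i] != 0 while XINUSE[i] = 1.
 */
static inline uint64_t dyn_features_at_risk(uint64_t xfd, uint64_t xinuse)
{
	return xfd & xinuse & DYN_FEATURES;
}

/* Example: AMX enabled in XCR0, first with XFD[18] set, then cleared. */
static inline void dyn_features_example(void)
{
	uint64_t xcr0 = 0x7 | XFEATURE_MASK_XTILE_CFG | XFEATURE_MASK_XTILE_DATA;

	assert(dyn_features_enabled(xcr0, XFEATURE_MASK_XTILE_DATA) == 0);
	assert(dyn_features_enabled(xcr0, 0) == XFEATURE_MASK_XTILE_DATA);
}
```

A non-zero dyn_features_at_risk() result corresponds to the "XFD != 0 and the guest is using the extended feature" situation described above.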
The usual VM exit handling in KVM, however, is done with interrupts/preemption enabled. If a guest has a non-zero XFD and AMX is in use at VM exit, the host and KVM need to maintain the guest state. There are two cases where the host and KVM may lose the state:

a). KVM is scheduled out and the kernel context switch does XSAVES.
b). KVM is interrupted and the softirq path calls kernel_fpu_begin_mask(), which may execute XSAVES.

One crude way (Option 1) would be to clear XFD temporarily at VM exit time if the extended feature (AMX) is in use (XINUSE). This causes unnecessary overhead, though, because interrupts/preemption may not always happen.

Given the new unified handling of the XFD state management and guest awareness in the FPU core, we think it might be better to defer this to the host (Option 2):

a). Before the host kernel executes XSAVES, it clears XFD after checking that this is a KVM guest fpu and that guest AMX is in use (XINUSE). KVM can convey the condition by using fpstate->is_guest and fpstate->in_use, for example. We need to add more meaning (and code changes) to those fields.
b). Same for XRSTORS.

One potential drawback of Option 2 is the additional checks in the host, although we can minimize the impact by having CONFIG_KVM_TBD. We believe that the case "XFD != 0 and XINUSE != 0" should be very infrequent.

Propagation of reallocation errors

As noted above, a reallocation request can fail, and we need to propagate the error code to userspace (e.g. QEMU) so that it can handle the failure properly. Since we do not want to terminate a guest that is already running because of permission errors ("weird failure"), we think we should check the permission at set_cpuid2 time and return failure if there is no permission.

Looking forward to your comments.

Thanks,
Jing