> From: Tian, Kevin
> Sent: Thursday, December 30, 2021 3:05 PM
>
> the new change is like below.
>
> static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
> {
>	/*
>	 * Save xfd_err to guest_fpu before interrupts are enabled, so the
>	 * guest value is not clobbered by host activity before the guest
>	 * has a chance to consume it.
>	 *
>	 * Since trapping #NM starts when xfd write interception is
>	 * disabled, use that flag to guard the saving operation. This
>	 * implies a no-op for a non-xfd #NM due to L1 interception.
>	 *
>	 * Queuing the exception is done in vmx_handle_exit.
>	 */
>	if (vcpu->arch.xfd_no_write_intercept)
>		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
> }
>
> In the final series it will first check vcpu->arch.guest_fpu.fpstate->xfd
> before the disable-interception patch is applied, and then take the above
> form, similar to your suggestion on vmx_update_exception_bitmap().
>
> Whether to check msr_bitmap vs. an extra flag is an orthogonal open.
>
> Then:
>
> static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> {
>	...
>	if (is_machine_check(intr_info) || is_nmi(intr_info))
>		return 1; /* handled by handle_exception_nmi_irqoff() */
>
>	/*
>	 * Queue the exception here instead of in handle_nm_fault_irqoff().
>	 * This ensures the nested_vmx check is not skipped, so the vmexit
>	 * can be reflected to L1 (when it intercepts #NM) before reaching
>	 * this point.
>	 */
>	if (is_nm_fault(intr_info)) {
>		kvm_queue_exception(vcpu, NM_VECTOR);
>		return 1;
>	}
>
>	...
> }
>
> Then, regarding testing the non-AMX nested #NM usage: it might be
> difficult to trigger from a modern OS. As commented in the Linux #NM
> handler, it is expected only for XFD, or for math emulation when the FPU
> is missing. So we plan to run a selftest in L1 which sets CR0.TS and then
> touches the FPU registers, and for the L1 kernel we will run two
> binaries, one trapping #NM and the other not.
> We have verified this scenario and didn't find a problem.
Basically the selftest is like below:

guest_code()
{
	cr0 = read_cr0();
	cr0 |= X86_CR0_TS;
	write_cr0(cr0);
	asm volatile("fnop");
}

guest_nm_handler()
{
	cr0 = read_cr0();
	cr0 &= ~X86_CR0_TS;
	write_cr0(cr0);
}

We run the selftest in L1 to create a nested scenario.

When L1 intercepts #NM:

	(L2) fnop
	(L0) #NM vmexit
	(L0) reflect a virtual vmexit (reason #NM) to L1
	(L1) #NM vmexit
	(L1) queue #NM exception to L2
	(L2) guest_nm_handler()
	(L2) fnop (succeeds)

When L1 doesn't intercept #NM:

	(L2) fnop
	(L0) #NM vmexit
	(L0) queue #NM exception to L2
	(L2) guest_nm_handler()
	(L2) fnop (succeeds)

Please suggest if any more tests are necessary here.

Thanks,
Kevin