> So we have recently discovered an overlooked interaction with VT-x. > Immediately before VMENTER and after VMEXIT, CR2 is live with the > *guest* CR2. Regardless of if the guest uses FRED or not, this is guest > state and SHOULD NOT be corrupted. Furthermore, host state MUST NOT leak > into the guest. > > NMIs are blocked on VMEXIT if the cause was an NMI, but not for other > reasons, so a NMI coming in during this window that then #PFs could > corrupt the guest CR2. I add a comment to vmx_vcpu_enter_exit() in https://lore.kernel.org/kvm/20231108183003.5981-1-xin3.li@xxxxxxxxx/T/#m29616c02befc04305085b1cbac64df916364626a for this. > > Intel is exploring ways to close this hole, but for schedule reasons, it > will not be available at the same time as the first implementation of > FRED. Therefore, as a workaround, it turns out that the FRED NMI stub > *will*, unfortunately, have to save and restore CR2 after all when (at > least) Intel KVM is in use. > > Note that this is airtight: it does add a performance penalty to the NMI > path (two read CR2 in the common case of no #PF), but there is no gap > during which a bad CR2 value could be introduced in the guest, no matter > in which sequence the events happen. > > In theory the performance penalty could be further reduced by > conditionalizing this on the NMI happening in the critical region in the > KVM code, but it seems to be pretty far from necessary to me. We should keep the following code in the FRED NMI handler, right? { ... this_cpu_write(nmi_cr2, read_cr2()); ... if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) write_cr2(this_cpu_read(nmi_cr2)); ... } > This obviously was an unfortunate oversight on our part, but the > workaround is simple and doesn't affect any non-NMI paths. > > > + > > + if (IS_ENABLED(CONFIG_SMP) && > arch_cpu_is_offline(smp_processor_id())) > > + return; > > + > > This is cut & paste from elsewhere in the NMI code, but I believe the > IS_ENABLED() is unnecessary (not to mention ugly): smp_processor_id() > should always return zero on UP, and arch_cpu_is_offline() reduces to > !(cpu == 0), so this is a statically true condition on UP. Ah, good point!