On Tue, Feb 11, 2020 at 07:48:12PM -0800, Andy Lutomirski wrote:
> > On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
> >
> > * Putting some NMI-load on the guest will make it crash usually
> >   within a minute
>
> Suppose you do CPUID or some MMIO and get #VC. You fill in the GHCB to
> ask for help. Some time between when you start filling it out and when
> you do VMGEXIT, you get NMI. If the NMI does its own GHCB access [0],
> it will clobber the outer #VC’s state, resulting in a failure when
> VMGEXIT happens. There’s a related failure mode if the NMI is after
> the VMGEXIT but before the result is read.
>
> I suspect you can fix this by saving the GHCB at the beginning of
> do_nmi and restoring it at the end. This has the major caveat that it
> will not work if do_nmi comes from user mode and schedules, but I
> don’t believe this can happen.
>
> [0] Due to the NMI_COMPLETE catastrophe, there is a 100% chance that
> this happens.

Very true, thank you! You probably saved me a few hours of debugging
this further :)

I will implement better handling for nested #VC exceptions, which
hopefully solves the NMI crashes.

Thanks again,

	Joerg
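
For readers following the thread, the save/restore idea quoted above can be
sketched as a small user-space simulation. This is not kernel code and not
the fix that was eventually implemented. The struct layout, the exit-code
values and every function name below are hypothetical and exist only to
illustrate the race: an NMI that touches the GHCB between "fill in" and
"VMGEXIT" clobbers the outer request unless the NMI path saves the GHCB on
entry and restores it on exit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the real GHCB page. */
struct ghcb {
	uint64_t exit_code;
	uint64_t rax;
};

/* One shared page; the outer #VC handler and the NMI both use it. */
static struct ghcb ghcb_page;

/* Placeholder exit codes, chosen only for this illustration. */
#define FAKE_EXIT_CPUID		0x1111
#define FAKE_EXIT_NMI_COMPLETE	0x2222

/* The NMI path performs its own GHCB access and overwrites the page. */
static void nmi_uses_ghcb(void)
{
	ghcb_page.exit_code = FAKE_EXIT_NMI_COMPLETE;
	ghcb_page.rax = 0;
}

/*
 * do_nmi-like wrapper. With save_restore != 0 it copies the GHCB on
 * entry and puts it back on exit, as suggested in the quoted mail.
 */
static void do_nmi_sketch(int save_restore)
{
	struct ghcb saved;

	if (save_restore)
		memcpy(&saved, &ghcb_page, sizeof(saved));

	nmi_uses_ghcb();

	if (save_restore)
		memcpy(&ghcb_page, &saved, sizeof(saved));
}

/*
 * Outer #VC handler: fills in the GHCB, then an NMI arrives before the
 * (simulated) VMGEXIT. Returns 1 if the request is still intact at the
 * point where VMGEXIT would be issued, 0 if it was clobbered.
 */
static int vc_handler_with_nmi(int save_restore)
{
	ghcb_page.exit_code = FAKE_EXIT_CPUID;
	ghcb_page.rax = 0x1;

	do_nmi_sketch(save_restore);	/* NMI lands here */

	return ghcb_page.exit_code == FAKE_EXIT_CPUID &&
	       ghcb_page.rax == 0x1;
}
```

Calling vc_handler_with_nmi(0) models the crash scenario (the outer request
is lost), while vc_handler_with_nmi(1) models the save/restore variant. The
sketch does not cover the caveat raised in the quoted mail about do_nmi
entered from user mode, nor the second window after VMGEXIT but before the
result is read, although the same save/restore would apply there.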