On Thu, Jun 04, 2020 at 12:00:33PM -0700, Jim Mattson wrote:
> On Thu, Jun 4, 2020 at 11:47 AM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > On Wed, Jun 03, 2020 at 01:18:31PM -0700, Jim Mattson wrote:
> > > On Tue, Jun 2, 2020 at 7:24 PM Sean Christopherson
> > > <sean.j.christopherson@xxxxxxxxx> wrote:
> > > > As an alternative to storing the last run/attempted CPU, what about moving
> > > > the "bad VM-Exit" detection into handle_exit_irqoff, or maybe a new hook
> > > > that is called after IRQs are enabled but before preemption is enabled, e.g.
> > > > detect_bad_exit or something?  All of the paths in patch 4/4 can easily be
> > > > moved out of handle_exit.  VMX would require a little bit of refactoring for
> > > > its "no handler" check, but that should be minor.
> > >
> > > Given the alternatives, I'm willing to compromise my principles wrt
> > > emulation_required. :-) I'll send out v4 soon.
> >
> > What do you dislike about the alternative approach?
>
> Mainly, I wanted to stash this in a common location so that I could
> print it out in our local version of dump_vmcs(). Ideally, we'd like
> to be able to identify the bad part(s) just from the kernel logs.

But this would also move dump_vmcs() to before preemption is enabled, i.e.
your version could read the CPU directly.

And actually, if we're talking about ferreting out hardware issues, you
really do want this happening before preemption is enabled so that the VMCS
dump comes from the failing CPU.  If the vCPU is migrated, the VMCS will be
dumped after a VMCLEAR->VMPTRLD, i.e. will be written to memory and pulled
back into the VMCS cache on a different CPU, and will also have been written
to by the new CPU to update host state.  Odds are that wouldn't affect the
dump in a meaningful way, but never say never.

Tangentially related, what about adding an option to do VMCLEAR at the end
of dump_vmcs(), followed by a dump of raw memory?  It'd be useless for
debugging software issues, but might be useful/interesting for triaging
hardware problems.

> That, and I wouldn't have been as comfortable with the refactoring
> without a lot more testing.
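
For illustration only, the ordering argument above (dump before preemption
is enabled vs. dump after the vCPU has migrated) can be sketched as a toy
userspace model.  This is not KVM code and does not execute VMCLEAR or
VMPTRLD; struct toy_vmcs and every function name below are hypothetical
and exist only to show which CPU the dumped data would come from.

```c
#include <assert.h>

/* Toy userspace model; NOT KVM code.  All names are hypothetical. */
struct toy_vmcs {
	int cached_on_cpu;	/* CPU whose VMCS cache holds the data, -1 = memory only */
	int host_state_cpu;	/* CPU that last wrote the host-state fields */
};

/* Models VMCLEAR: cached VMCS data is written back to memory. */
static void toy_vmclear(struct toy_vmcs *v)
{
	v->cached_on_cpu = -1;
}

/* Models VMPTRLD on a CPU, followed by that CPU updating host state. */
static void toy_vmptrld(struct toy_vmcs *v, int cpu)
{
	assert(cpu >= 0);
	v->cached_on_cpu = cpu;
	v->host_state_cpu = cpu;
}

/* Returns the CPU whose VMCS cache a dump would be read from. */
static int toy_dump_source_cpu(const struct toy_vmcs *v)
{
	return v->cached_on_cpu;
}

/* Dump while still pinned to the CPU that saw the bad exit. */
int dump_before_preemption(int failing_cpu)
{
	struct toy_vmcs v = { failing_cpu, failing_cpu };

	return toy_dump_source_cpu(&v);
}

/* Dump after the vCPU was preempted and migrated to new_cpu. */
int dump_after_migration(int failing_cpu, int new_cpu)
{
	struct toy_vmcs v = { failing_cpu, failing_cpu };

	toy_vmclear(&v);		/* data leaves the failing CPU's cache */
	toy_vmptrld(&v, new_cpu);	/* and is reloaded/modified elsewhere */
	return toy_dump_source_cpu(&v);
}
```

In this model, dump_before_preemption(3) reports CPU 3 as the source of the
dump, while dump_after_migration(3, 5) reports CPU 5, i.e. the data has
already made a round trip through memory and been touched by another CPU
before it is printed.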