On Fri, Feb 28, 2020 at 11:01 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: > > On 28/02/20 19:42, Andy Lutomirski wrote: > > KVM overloads #PF to indicate two types of not-actually-page-fault > > events. Right now, the KVM guest code intercepts them by modifying > > the IDT and hooking the #PF vector. This makes the already fragile > > fault code even harder to understand, and it also pollutes call > > traces with async_page_fault and do_async_page_fault for normal page > > faults. > > > > Clean it up by moving the logic into do_page_fault() using a static > > branch. This gets rid of the platform trap_init override mechanism > > completely. > > > > Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx> > > Acked-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > Just one thing: > > > @@ -1505,6 +1506,25 @@ do_page_fault(struct pt_regs *regs, unsigned long hw_error_code, > > unsigned long address) > > { > > prefetchw(¤t->mm->mmap_sem); > > + /* > > + * KVM has two types of events that are, logically, interrupts, but > > + * are unfortunately delivered using the #PF vector. > > At least the not-present case isn't entirely an interrupt because it > must be delivered precisely. Regarding the page-ready case you're > right, it could be an interrupt. However, generally speaking this is not > a problem. Using something in memory rather than overloading the error > code was the mistake. > > > + * These events are > > + * "you just accessed valid memory, but the host doesn't have it right > > + * not, so I'll put you to sleep if you continue" and "that memory > > + * you tried to access earlier is available now." > > + * > > + * We are relying on the interrupted context being sane (valid > > + * RSP, relevant locks not held, etc.), which is fine as long as > > + * the the interrupted context had IF=1. > > This is not about IF=0/IF=1; the KVM code is careful about taking > spinlocks only with IRQs disabled, and async PF is not delivered if the > interrupted context had IF=0. The problem is that the memory location > is not reentrant if an NMI is delivered in the wrong window, as you hint > below. If an async PF is delivered with IF=0, then, unless something else clever happens to make it safe, we are toast. The x86 entry code cannot handle #PF (or most other entries) at arbitrary places. I'll improve the comment in v2.