On 28/02/20 19:42, Andy Lutomirski wrote: > KVM overloads #PF to indicate two types of not-actually-page-fault > events. Right now, the KVM guest code intercepts them by modifying > the IDT and hooking the #PF vector. This makes the already fragile > fault code even harder to understand, and it also pollutes call > traces with async_page_fault and do_async_page_fault for normal page > faults. > > Clean it up by moving the logic into do_page_fault() using a static > branch. This gets rid of the platform trap_init override mechanism > completely. > > Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx> Acked-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> Just one thing: > @@ -1505,6 +1506,25 @@ do_page_fault(struct pt_regs *regs, unsigned long hw_error_code, > unsigned long address) > { > prefetchw(¤t->mm->mmap_sem); > + /* > + * KVM has two types of events that are, logically, interrupts, but > + * are unfortunately delivered using the #PF vector. At least the not-present case isn't entirely an interrupt because it must be delivered precisely. Regarding the page-ready case you're right, it could be an interrupt. However, generally speaking this is not a problem. Using something in memory rather than overloading the error code was the mistake. > + * These events are > + * "you just accessed valid memory, but the host doesn't have it right > + * not, so I'll put you to sleep if you continue" and "that memory > + * you tried to access earlier is available now." > + * > + * We are relying on the interrupted context being sane (valid > + * RSP, relevant locks not held, etc.), which is fine as long as > + * the the interrupted context had IF=1. This is not about IF=0/IF=1; the KVM code is careful about taking spinlocks only with IRQs disabled, and async PF is not delivered if the interrupted context had IF=0. The problem is that the memory location is not reentrant if an NMI is delivered in the wrong window, as you hint below. Paolo > We are also relying on > + * the KVM async pf type field and CR2 being read consistently > + * instead of getting values from real and async page faults > + * mixed up. > + * > + * Fingers crossed. > + */ > + if (kvm_handle_async_pf(regs, hw_error_code, address)) > + return; > + > trace_page_fault_entries(regs, hw_error_code, address); > > if (unlikely(kmmio_fault(regs, address))) >