On Sat, Mar 7, 2020 at 7:52 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > Andy Lutomirski <luto@xxxxxxxxxx> writes: > > On Sat, Mar 7, 2020 at 2:01 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > >> > What’s the local_irq_disable() here for? I would believe a > >> > lockdep_assert_irqs_disabled() somewhere in here would make sense. > >> > (Yes, I see you copied this from the old code. It’s still nonsense.) > >> > >> native_safe_halt() does: > >> > >> STI > >> HLT > >> > >> So the irq disable is required as the loop should exit with interrupts > >> disabled. > > > > Oops, should have looked at what native_safe_halt() does. > > > >> > >> > I also find it truly bizarre that hlt actually works in this context. > >> > Does KVM in fact wake a HLTed guest that HLTed with IRQs off when a > >> > pending async pf is satisfied? This would make sense if the wake > >> > event were an interrupt, but it’s not according to Paolo. > >> > >> See above. safe halt enables interrupts, which means IF == 1 and the > >> host sanity check for IF == 1 is satisfied. > >> > >> In fact, if e.g. some regular interrupt arrives before the page becomes > >> available and the guest entered the halt loop because the fault happened > >> inside a RCU read side critical section with preemption enabled, then > >> the interrupt might wake up another task, set need resched and this > >> other task can run. > > > > Now I'm confused again. Your patch is very careful not to schedule if > > we're in an RCU read-side critical section, but the regular preemption > > code (preempt_schedule_irq, etc) seems to be willing to schedule > > inside an RCU read-side critical section. Why is the latter okay but > > not the async pf case? > > Preemption is fine, but voluntary schedule not. voluntary schedule might > end up in idle if this is the last runnable task. I guess something horrible happens if we try to go idle while in an RCU read-side critical section. Like perhaps it's considered a grace period. Hmm. > > > Ignoring that, this still seems racy: > > > > STI > > nested #PF telling us to wake up > > #PF returns > > HLT > > You will say Ooops, should have looked .... when I tell you that the > above cannot happen. From the SDM: > > If IF = 0, maskable hardware interrupts remain inhibited on the > instruction boundary following an execution of STI. > > Otherwise safe_halt would not work at all :) Ooops, should have looked. :) > > Sadly, wrmsr to turn off async pf will inject wakeups even if IF == 0. > > WHAT? That's fundamentally broken. Can you point me to the code in > question? I think Paolo said so in a different thread, but I can't Let me see if I can find it: kvm_pv_enable_async_pf() kvm_clear_async_pf_completion_queue() but that doesn't actually seem to send #PF. So maybe I'm wrong. I will admit that, even after reading the host code a few times, I'm also not convinced that wakeups don't get swallowed on occasion if they would have been delivered at times when it's illegal.