On Sat, Mar 7, 2020 at 7:10 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Sat, Mar 7, 2020 at 2:01 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > Andy Lutomirski <luto@xxxxxxxxxx> writes:
> Now I'm confused again.  Your patch is very careful not to schedule if
> we're in an RCU read-side critical section, but the regular preemption
> code (preempt_schedule_irq, etc) seems to be willing to schedule
> inside an RCU read-side critical section.  Why is the latter okay but
> not the async pf case?

I read more docs. I guess the relevant situation is CONFIG_PREEMPT_RCU, in which case it is legal to preempt an RCU read-side critical section and obviously legal to put the whole CPU to sleep, but it's illegal to explicitly block in an RCU read-side critical section. So I have a question for Paul: is it, in fact, entirely illegal to block, or merely illegal to block for an excessively long time, e.g. while waiting for user space or network traffic? In this situation, we cannot make progress until the host says we can, so we are, in effect, blocking until the host tells us to stop blocking.

Regardless, I agree that turning IRQs on is reasonable, and allowing those IRQs to preempt us is reasonable. As it stands in your patch, the situation is rather odd: we'll run another task if that task *preempts* us (e.g. we block long enough to run out of our time slice), but we won't run another task if we aren't preempted. This seems bizarre.

>
> Ignoring that, this still seems racy:
>
> STI
> nested #PF telling us to wake up
> #PF returns
> HLT
>
> doesn't this result in putting the CPU asleep for no good reason until
> the next interrupt hits?

I think this issue still stands and is actually a fairly easy race to hit.
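The quoted STI / nested #PF / HLT window is a classic lost-wakeup race. The C sketch below is a toy, single-threaded model of it, not kernel code: `struct vcpu`, `deliver_wakeup()`, `hlt()`, `naive_wait()` and `irqs_off_wait()` are hypothetical names. It assumes that a wakeup latched while IRQs are off still ends HLT, and it does not model the x86 STI interrupt shadow or how KVM actually injects the notification. `naive_wait()` follows the quoted sequence; `irqs_off_wait()` follows the HLT-with-IRQs-off loop proposed at the end of this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one guest vCPU waiting for an async page fault.
 * Hypothetical names throughout; this is not kernel code. */
struct vcpu {
    bool irqs_on;       /* models EFLAGS.IF */
    bool page_ready;    /* set once the "page ready" wakeup is handled */
    bool event_pending; /* wakeup latched while IRQs were off */
};

/* Where the host's wakeup lands relative to the guest's HLT. */
enum when { IN_WINDOW, DURING_HLT };

/* Host sends the wakeup: handled at once if IRQs are on, else latched. */
static void deliver_wakeup(struct vcpu *v) {
    assert(!v->page_ready && !v->event_pending); /* sent exactly once */
    if (v->irqs_on)
        v->page_ready = true;
    else
        v->event_pending = true;
}

static void irq_enable(struct vcpu *v) {
    v->irqs_on = true;
    if (v->event_pending) { /* a latched event is handled on enable */
        v->event_pending = false;
        v->page_ready = true;
    }
}

static void irq_disable(struct vcpu *v) { v->irqs_on = false; }

/* Models HLT. Assumption: a latched event ends HLT even with IRQs off.
 * Returns false if the CPU halts although the page is already ready and
 * nothing is queued to wake it, i.e. a lost wakeup. */
static bool hlt(struct vcpu *v) {
    if (v->event_pending)
        return true;
    if (v->page_ready)
        return false;
    deliver_wakeup(v); /* the wakeup arrives while we are halted */
    return true;
}

/* Quoted sequence: STI, check, (window), HLT. Returns true if woken. */
static bool naive_wait(struct vcpu *v, enum when w) {
    irq_enable(v);            /* STI */
    if (v->page_ready)
        return true;
    if (w == IN_WINDOW)
        deliver_wakeup(v);    /* nested #PF wakeup is handled here */
    return hlt(v);
}

/* Non-preemptible variant: check with IRQs off, HLT, then briefly
 * enable IRQs so that a queued event gets handled. */
static bool irqs_off_wait(struct vcpu *v, enum when w) {
    irq_disable(v);
    while (!v->page_ready) {
        if (w == IN_WINDOW) {
            deliver_wakeup(v); /* latched, because IRQs are off */
            w = DURING_HLT;    /* deliver only once */
        }
        if (!hlt(v))
            return false;
        irq_enable(v);         /* handle anything that was queued */
        irq_disable(v);
    }
    return true;
}
```

With the wakeup landing in the window, `naive_wait()` reports a lost wakeup while `irqs_off_wait()` does not, because the "still need to sleep" check runs with IRQs off and the event stays latched until the explicit enable step. In a real guest the window is easy to open, for example: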
STI
IRQ happens and we get preempted
another task runs and gets the #PF "async pf wakeup" event
reschedule, back to original task
HLT

The only particularly unusual thing here is that an IRQ (timer or otherwise) needs to be queued up between when the #PF "async pf sleep" event happens and STI so that it gets delivered before HLT.

ISTM the way to fully address this is to make the logic something like:

if (preemptible) {
  actually go to sleep.  do not HLT.  Do this even in an RCU read-side critical section.
} else {
  /* ok, we have to wait, but it's still legal to handle IRQs. */
  if (choice A) {
    keep IRQs off.  Spin until we wake up.
  } else {
    while (still need to sleep) {
      HLT (with IRQs off!)
      local_irq_enable();
      /* if an interrupt was queued, handle it. */
      local_irq_disable();
    }
  }