On Fri, 8 Sept 2023 at 15:50, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > which actually makes me worry about the nested irq case, because this
> > would *not* be ok:
> >
> >     allow_resched();
> >        -> irq happens
> >           -> *nested* irq happens
> >           <- nested irq return (and preemption)
> >
> > ie the allow_resched() needs to still honor the irq count, and a
> > nested irq return obviously must not cause any preemption.
>
> I think we killed nested interrupts a fair number of years ago, but I'll
> recheck -- but not today, sleep is imminent.

I don't think it has to be an interrupt. I think the TIF_ALLOW_RESCHED
thing needs to look out for any nested exception (ie only ever trigger
if it's returning to the kernel "task" stack).

Because I could easily see us wanting to do "I'm doing a big user
copy, it should do TIF_ALLOW_RESCHED, and I don't have preemption on",
and then instead of that first "irq happens", you have "page fault
happens" instead.

And inside that page fault handling you may well have critical
sections (like a spinlock) that are fine - but the fact that the
"process context" had TIF_ALLOW_RESCHED most certainly does *not* mean
that the page fault handler can reschedule.

Maybe it already does. As mentioned, I lost sight of the patch series,
even though I saw it originally (and liked it - only realizing on your
complaint that it might be more dangerous than I thought).

Basically, the "allow resched" should be a marker for a single context
level only. Kind of like a register state bit that gets saved on the
exception stack. Not an "anything happening within this process is now
preemptible".

I'm hoping Ankur will just pipe in and say "of course I already
implemented it that way, see XYZ".

             Linus
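
To make the "marker for a single context level only" idea concrete, here is a
minimal user-space sketch. It is not the TIF_ALLOW_RESCHED patch series and
not kernel code: struct ctx_stack, ctx_init(), exception_enter(),
exception_exit() and may_resched() are hypothetical names invented for this
illustration, and only allow_resched() borrows its name from the discussion.
The assumption is that each exception entry (irq, page fault, ...) starts a
new level with the bit cleared, and that only a return to level 0 (the
"task" stack) may preempt.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8	/* arbitrary nesting limit for this model */

/* Hypothetical model: one "allow resched" bit per context level. */
struct ctx_stack {
	bool allow[MAX_DEPTH];	/* per-level bit, like saved register state */
	int depth;		/* 0 == task context */
};

static void ctx_init(struct ctx_stack *s)
{
	s->depth = 0;
	for (int i = 0; i < MAX_DEPTH; i++)
		s->allow[i] = false;
}

/* Mark only the *current* level as reschedulable. */
static void allow_resched(struct ctx_stack *s)
{
	s->allow[s->depth] = true;
}

/* Preemption is permitted only at task level with the bit set. */
static bool may_resched(const struct ctx_stack *s)
{
	return s->depth == 0 && s->allow[0];
}

/* Exception entry: the new level never inherits the bit. */
static void exception_enter(struct ctx_stack *s)
{
	assert(s->depth + 1 < MAX_DEPTH);
	s->depth++;
	s->allow[s->depth] = false;
}

/*
 * Exception return: drop back one level and report whether the
 * context being returned to may be preempted.
 */
static bool exception_exit(struct ctx_stack *s)
{
	assert(s->depth > 0);
	s->depth--;
	return may_resched(s);
}

/*
 * Scenario from the mail: task sets the bit, takes a page fault,
 * and an irq nests inside the fault. Returns true if only the
 * outermost return was allowed to preempt.
 */
static bool demo_page_fault_with_nested_irq(void)
{
	struct ctx_stack s;
	bool in_fault, nested_ret, outer_ret;

	ctx_init(&s);
	allow_resched(&s);		/* "big user copy" in task context */
	exception_enter(&s);		/* page fault happens */
	in_fault = may_resched(&s);	/* fault handler may hold a spinlock */
	exception_enter(&s);		/* nested irq happens */
	nested_ret = exception_exit(&s);	/* returns into the fault handler */
	outer_ret = exception_exit(&s);		/* returns to the task stack */

	return !in_fault && !nested_ret && outer_ret;
}
```

In this model the page fault handler and the nested irq return both see the
bit as clear, so neither can reschedule; only the final return to the task
stack can. A real implementation would also have to honor the preempt and
irq counts, which this sketch leaves out.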