On Wed, 2 Mar 2022, Michal Hocko wrote:
>
> I might be really missing something but I really do not see how is this
> any different from the page allocator path which only does cond_resched
> as well (well, except for throttling but that might just not trigger).
> Or other paths which just do cond_resched while waiting for a progress
> somewhere else.
>
> Not that I like this situation but !PREEMPT kernel with RT priority
> tasks is rather limited and full of potential problems IMHO.

As I said in previous mail, I have really not given this as much thought
this time as I did in the 2018 mail thread linked there; but have seen
that it behaves more badly than I had imagined, in any preemptive kernel -
no need for RT. We just don't have the stats to show when this code here
spins waiting on code elsewhere that is sleeping.

I think the difference from most cond_resched() places is that swapin is
trying to collect together several factors with minimal locking, and we
should have added preempt_disable()s when preemption was invented.
But it's only swap so we didn't notice.

Hugh