On Mon, 28 Feb 2022 20:07:33 -0800 (PST) Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > --- a/mm/swap_state.c > > > +++ b/mm/swap_state.c > > > @@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, > > > * __read_swap_cache_async(), which has set SWAP_HAS_CACHE > > > * in swap_map, but not yet added its page to swap cache. > > > */ > > > - cond_resched(); > > > + schedule_timeout_uninterruptible(1); > > > } > > > > > > /* > > > > Sigh. I guess yes, we should do this, at least in a short-term, > > backportable-to-stable way. > > > > But busy-waiting while hoping that someone else will save us isn't an > > attractive design. Hugh, have you ever thought about something more > > deterministic in there? > > Not something more deterministic, no: I think that would entail > heavier locking, perhaps slowing down hotter paths, just to avoid > this swap oddity. > > This loop was written long before there was a preemptive kernel: > it was appropriate then, and almost never needed more than one retry > to complete; but preemption changed the story without us realizing. > > Sigh here too. I commend the thread on it from July 2018: > https://lore.kernel.org/linux-mm/2018072514403228778860@xxxxxxxxxxxx/ > > There the 4.9-stable user proposed preempt_disable(), I agreed but > found the patch provided insufficient, and offered another 4.9 patch > further down the thread. Your preference at the time was msleep(1). > > I was working on a similar patch for 4.18, but have not completed it > yet ;) and don't remember how satisfied or not I was with that one; > and wonder if I'm any more likely to get it finished by 2026. It's > clear that I put much more thought into it back then than just now. > > Maybe someone else would have a go: my 4.9 patch in that thread > shows most of it, but might need a lot of work to update to 5.17. > > And it also gathered some temporary debug stats on how often this > happens: I'm not conscious of using RT at all, but was disturbed to see > how long an ordinary preemptive kernel was sometimes spinning there. > So I think I agree with you more than Michal on that: RT just makes > the bad behaviour more obvious. Thanks as always. Using msleep() seems pretty pointless so I plan to go ahead with patch as-is, with a cc:stable. None of it is pretty, but it's better than what we have now, yes?