On Mon, 30 Sept 2024 at 12:25, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
>
> I’m being told that I’m somewhat of a truffle pig for dirty code …
> how long ago does “old old” refer to, btw?

It's basically been that way forever. The code has changed many times,
but we've basically always had that "wait on bit will wait not until
the next wakeup, but until it actually sees the bit being clear".

And by "always" I mean "going back at least to before the git tree". I
didn't search further. It's not new.

The only reason I pointed at that (relatively recent) commit from 2021
is that when we rewrote the page bit waiting logic (for some unrelated
horrendous scalability issues with tens of thousands of pages on wait
queues), the rewritten code _tried_ to not do it, and instead go "we
were woken up by a bit clear op, so now we've waited enough".

And that then caused problems as explained in that commit c2407cf7d22d
("mm: make wait_on_page_writeback() wait for multiple pending
writebacks") because the wakeups aren't atomic wrt the actual bit
setting/clearing/testing.

IOW - that 2021 commit didn't _introduce_ the issue, it just went back
to the horrendous behavior that we've always had, and temporarily
tried to avoid.

Note that "horrendous behavior" is really "you probably can't hit it
under any normal load". So it's not like it's a problem in practice.

Except your load clearly triggers *something*. And maybe this is part of it.

                Linus