On Mon, 30 Sept 2024 at 10:35, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
>
> Sep 27 00:51:20 <redactedhostname>13 kernel: folio_wait_bit_common+0x13f/0x340
> Sep 27 00:51:20 <redactedhostname>13 kernel: folio_wait_writeback+0x2b/0x80

Gaah. Every single case you point to is that folio_wait_writeback() case.

And this might be an old old annoyance.

folio_wait_writeback() is insane. It does

        while (folio_test_writeback(folio)) {
                trace_folio_wait_writeback(folio, folio_mapping(folio));
                folio_wait_bit(folio, PG_writeback);
        }

and the reason that is insane is that PG_writeback isn't some kind of exclusive state. So folio_wait_bit() will return once somebody has ended writeback, but *new* writeback can easily have been started afterwards. So then we go back to wait...

And even after it eventually returns (possibly after having waited for hundreds of other processes writing back that folio - imagine lots of other threads doing writes to it and 'fdatasync()' or whatever) the caller *still* can't actually assume that the writeback bit is clear, because somebody else might have started writeback again.

Anyway, it's insane, but it's insane for a *reason*. We've tried to fix this before, long before it was a folio op. See commit c2407cf7d22d ("mm: make wait_on_page_writeback() wait for multiple pending writebacks").

IOW, this code is known-broken and might have extreme unfairness issues (although I had blissfully forgotten about it), because while the actual writeback *bit* itself is set and cleared atomically, the wakeup for the bit is asynchronous and can be delayed almost arbitrarily, so you can get basically spurious wakeups that were from a previous bit clear.

So the "wait many times" is crazy, but it's sadly a necessary crazy as things are right now.

Now, many callers hold the page lock while doing this, and in that case new writeback cases shouldn't happen, and so repeating the loop should be extremely limited.

But "many" is not "all".
For example, __filemap_fdatawait_range() very much doesn't hold the lock on the pages it waits for, so afaik this can cause that unfairness and starvation issue.

That said, while every one of your traces is for that folio_wait_writeback(), the last one is for the truncate case, and that one *does* hold the page lock and so shouldn't see this potential unfairness issue.

So the code here is questionable, and might cause some issues, but the starvation of folio_wait_writeback() can't explain _all_ the cases you see.

             Linus