Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 30 Sep 2024 13:12:37 -0700

On Mon, 30 Sept 2024 at 12:25, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
>
> I’m being told that I’m somewhat of a truffle pig for dirty code … how long ago does “old old” refer to, btw?

It's basically been that way forever. The code has changed many times,
but we've basically always had that "wait on bit will wait not until
the next wakeup, but until it actually sees the bit being clear".

And by "always" I mean "going back at least to before the git tree". I
didn't search further. It's not new.

The only reason I pointed at that (relatively recent) commit from 2021
is that when we rewrote the page bit waiting logic (for some unrelated
horrendous scalability issues with tens of thousands of pages on wait
queues), the rewritten code _tried_ to not do it, and instead go "we
were woken up by a bit clear op, so now we've waited enough".

And that then caused problems as explained in that commit c2407cf7d22d
("mm: make wait_on_page_writeback() wait for multiple pending
writebacks") because the wakeups aren't atomic wrt the actual bit
setting/clearing/testing.

IOW - that 2021 commit didn't _introduce_ the issue, it just went back
to the horrendous behavior that we've always had, and temporarily
tried to avoid.

Note that "horrendous behavior" is really "you probably can't hit it
under any normal load". So it's not like it's a problem in practice.

Except your load clearly triggers *something*. And maybe this is part of it.

                 Linus