On (24/12/12 00:37), Theodore Ts'o wrote:
> On Thu, Dec 12, 2024 at 12:58:26PM +0900, Sergey Senozhatsky wrote:
> > Hi,
> >
> > We've got two reports [1] [2] (could be the same person) which
> > suggest that ext4 may change page content while the page is under
> > write(). The particular problem here is the case when ext4 is on
> > the zram device. zram compresses every page written to it, so if
> > the page content can be modified concurrently with zram's compression
> > then we can't really use zram with ext4.
> >
> > Can you take a look please?
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=219548
> > [2] https://lore.kernel.org/linux-kernel/20241129115735.136033-1-baicaiaichibaicai@xxxxxxxxx
>
> The link in [2] is a bit busted, since the message in question wasn't
> cc'ed to LKML, but rather to mm-commits. But dropping "/linux-kernel"
> allows the link to work, and what's interesting is this message from
> that thread:

My bad.

> https://lore.kernel.org/all/20241202060632.139067-1-baicaiaichibaicai@xxxxxxxxx/

Let me Cc Yu Huabing on this:

> The blocks which are getting modified while a write is in flight are
> ext4 metadata blocks, which are in the buffer cache. Ext4 is
> modifying those blocks via bh->b_data, and ext4 isn't issuing the
> write; those are happening via the buffer cache's writeback functions.
>
> Hmmm.... was the user using an ext4 file system with the journal
> disabled, by any chance? If ext4 is using the journal (which is the
> common case), metadata blocks only get modified via jbd2 journal
> functions, and blocks only get modified when they are part of a jbd2
> transaction --- and while the transaction is active, the buffer cache
> writeback is disabled. It's only after the transaction is committed
> that the dirty blocks associated with that transaction are allowed to
> be written back. So I *think* the only way we could run into problems
> is if ext4's jbd2 journalling is disabled.
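For anyone less familiar with jbd2, the ordering rule described in the quoted paragraph can be sketched as a toy model. This is a minimal sketch under heavy assumptions: BufferHead, ToyJournal, join(), modify_no_journal() and writeback() are hypothetical names, not the real jbd2 or buffer-cache API, and concurrency is not modelled at all. It only shows the rule that writeback skips blocks held by an uncommitted transaction, and that no-journal mode has no such gate.

```python
class BufferHead:
    """Hypothetical, heavily simplified stand-in for a buffer-cache block."""

    def __init__(self, size):
        self.b_data = bytearray(size)
        self.dirty = False
        self.in_running_txn = False  # owned by an uncommitted transaction


class ToyJournal:
    """Toy model of the journal gating; NOT the real jbd2 API."""

    def __init__(self):
        self.running = []

    def join(self, bh):
        """Attach a block to the running transaction before modifying it."""
        bh.in_running_txn = True
        self.running.append(bh)

    def modify(self, bh, offset, value):
        """Metadata may only change while the block is in a transaction."""
        if not bh.in_running_txn:
            raise RuntimeError("metadata modified outside a transaction")
        bh.b_data[offset] = value
        bh.dirty = True

    def commit(self):
        """After commit, the blocks become eligible for normal writeback."""
        for bh in self.running:
            bh.in_running_txn = False
        self.running.clear()


def modify_no_journal(bh, offset, value):
    """No-journal mode: update b_data and mark it dirty, with no gate."""
    bh.b_data[offset] = value
    bh.dirty = True


def writeback(buffers):
    """Buffer-cache writeback: skip blocks held by a running transaction."""
    written = []
    for bh in buffers:
        if bh.dirty and not bh.in_running_txn:
            written.append(bytes(bh.b_data))
            bh.dirty = False
    return written
```

In this model a journalled block is never handed to writeback while it can still be modified, whereas a block updated via modify_no_journal() is immediately eligible and nothing stops a later update from racing with the write. The real jbd2 does considerably more (the journal write itself, checkpointing), none of which is modelled here.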
>
> More generally, any file system which uses the buffer cache, and
> doesn't use jbd2 to control when writeback happens, I think is going
> to be at risk with a block device which requires stable writes. The
> only way to fix this, really, is to have the buffer cache code copy
> the data to a bounce buffer, and then issue the write from the bounce
> buffer.
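To illustrate why a bounce buffer helps with a stable-writes device, here is a minimal single-threaded sketch. It is a toy model, not kernel code: zlib stands in for zram's compressor, and device_write(), write_in_place(), write_via_bounce() and update_metadata() are hypothetical names. The "concurrent" bh->b_data update is simulated by a callback that runs while the device is halfway through reading the block.

```python
import zlib

BLOCK_SIZE = 4096


def device_write(src, race=None):
    """Toy stable-write device (hypothetical stand-in for zram).

    The device compresses the block in two pieces; `race`, if given, runs
    between the pieces to simulate the file system updating the buffer
    while the write is in flight. Returns what a later read would see.
    """
    half = len(src) // 2
    comp = zlib.compressobj()
    out = comp.compress(bytes(src[:half]))   # device reads the first half
    if race is not None:
        race()                               # simulated concurrent update
    out += comp.compress(bytes(src[half:]))  # ...then the second half
    out += comp.flush()
    return zlib.decompress(out)


def write_in_place(b_data, race=None):
    """Issue the write directly from the live buffer."""
    return device_write(b_data, race)


def write_via_bounce(b_data, race=None):
    """Copy to a bounce buffer first; the device only ever sees the copy."""
    bounce = bytes(b_data)                   # snapshot at submission time
    return device_write(bounce, race)


def update_metadata(buf):
    """Return a callback that modifies both halves of the live block."""
    def race():
        buf[0] = 0xAA
        buf[-1] = 0xBB
    return race


if __name__ == "__main__":
    live = bytearray(BLOCK_SIZE)
    torn = write_in_place(live, update_metadata(live))
    print("in place:", torn[0], torn[-1])    # old first half, new second half

    live = bytearray(BLOCK_SIZE)
    clean = write_via_bounce(live, update_metadata(live))
    print("bounced: ", clean[0], clean[-1])  # consistent snapshot
```

Written in place, the device ends up storing a block that never existed in memory (the old first half glued to the new second half). With the bounce copy it stores a consistent snapshot, and the later update simply goes out with the next writeback. The cost is an extra copy and a bounce page per in-flight block, which is presumably why one would only do this for devices that actually require stable writes.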