On Thu, Dec 12, 2024 at 12:58:26PM +0900, Sergey Senozhatsky wrote:
> Hi,
>
> We've got two reports [1] [2] (could be the same person) which
> suggest that ext4 may change page content while the page is under
> write(). The particular problem here the case when ext4 is on
> the zram device. zram compresses every page written to it, so if
> the page content can be modified concurrently with zram's compression
> then we can't really use zram with ext4.
>
> Can you take a look please?
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=219548
> [2] https://lore.kernel.org/linux-kernel/20241129115735.136033-1-baicaiaichibaicai@xxxxxxxxx

The link in [2] is a bit busted, since the message in question wasn't
cc'ed to LKML, but rather to mm-commits. But dropping "/linux-kernel"
allows the link to work, and what's interesting is this message from
that thread:

https://lore.kernel.org/all/20241202060632.139067-1-baicaiaichibaicai@xxxxxxxxx/

The blocks which are getting modified while a write is in flight are
ext4 metadata blocks, which are in the buffer cache. Ext4 is modifying
those blocks via bh->b_data, and ext4 isn't issuing the writes; those
are happening via the buffer cache's writeback functions.

Hmmm.... was the user using an ext4 file system with the journal
disabled, by any chance? If ext4 is using the journal (which is the
common case), metadata blocks only get modified via jbd2 journal
functions, and blocks only get modified when they are part of a jbd2
transaction. While the transaction is active, buffer cache writeback
of those blocks is disabled. It's only after the transaction is
committed that the dirty blocks associated with that transaction are
allowed to be written back.

So I *think* the only way we could run into problems is if ext4's jbd2
journalling is disabled. More generally, any file system which uses
the buffer cache, and doesn't use jbd2 to control when writeback
happens, I think is going to be at risk with a block device which
requires stable writes.
The only way to fix this, really, is to have the buffer cache code copy
the data to a bounce buffer, and then issue the write from the bounce
buffer.

- Ted
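
To make the bounce-buffer idea concrete, here is a minimal user-space
sketch in C. It is not kernel code and not a proposed patch. The names
checksum(), struct inflight_write, submit_write() and complete_write()
are all hypothetical, and the toy checksum merely stands in for whatever
a stable-write device (compression in zram's case) computes over the
block while the write is in flight. The point is only that the "device"
never looks at the live buffer, so a later modification of the live
buffer cannot change the data underneath it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Hypothetical stand-in for work a device does on in-flight data
 * (e.g. compressing or checksumming it). */
static uint32_t checksum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/* Hypothetical bookkeeping for one write that is in flight. */
struct inflight_write {
	unsigned char *bounce;	/* private snapshot of the block */
	uint32_t csum;		/* checksum taken at submit time */
};

/* Snapshot the live buffer into a bounce buffer and "submit" the
 * snapshot. Returns 0 on success, -1 if allocation fails. */
static int submit_write(struct inflight_write *w, const unsigned char *live)
{
	w->bounce = malloc(BLOCK_SIZE);
	if (!w->bounce)
		return -1;
	memcpy(w->bounce, live, BLOCK_SIZE);
	w->csum = checksum(w->bounce, BLOCK_SIZE);
	return 0;
}

/* "Complete" the write: returns 1 if the data the device saw still
 * matches the submit-time checksum, 0 otherwise. Frees the snapshot. */
static int complete_write(struct inflight_write *w)
{
	int stable;

	assert(w->bounce != NULL);
	stable = (checksum(w->bounce, BLOCK_SIZE) == w->csum);
	free(w->bounce);
	w->bounce = NULL;
	return stable;
}
```

A caller would fill a BLOCK_SIZE buffer, call submit_write(), keep
modifying the live buffer as it likes, and then call complete_write().
Since the in-flight data is the private copy, the completion check still
passes. The cost is one extra allocation and copy per write, which is
the trade-off of doing this in the buffer cache.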