Re: [bugzilla:219548] the kernel crashes when storing an EXT4 file system in a ZRAM device

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 12 Dec 2024 00:37:39 -0500

On Thu, Dec 12, 2024 at 12:58:26PM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> We've got two reports [1] [2] (could be the same person) which
> suggest that ext4 may change page content while the page is under
> write().  The particular problem here the case when ext4 is on
> the zram device.  zram compresses every page written to it, so if
> the page content can be modified concurrently with zram's compression
> then we can't really use zram with ext4.
> 
> Can you take a look please?
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=219548
> [2] https://lore.kernel.org/linux-kernel/20241129115735.136033-1-baicaiaichibaicai@xxxxxxxxx

The link in [2] is a bit busted, since the message in question wasn't
cc'ed to LKML, but rather to mm-commits.  But dropping "/linux-kernel"
allows the link to work, and what's interesting is this message from
that thread:

https://lore.kernel.org/all/20241202060632.139067-1-baicaiaichibaicai@xxxxxxxxx/

The blocks which are getting modified while a write is in flight are
ext4 metadata blocks, which are in the buffer cache.  Ext4 is
modifying those blocks via bh->b_data, and ext4 isn't issuing the
write; those are happenig via the buffer cache's writeback functions.

Hmmm.... was the user using an ext4 file system with the journal
disabled, by any chance?  If ext4 is using the journal (which is the
common case), metadata blocks only get modified via jbd2 journal
functions, and a blocks only get modified when they are part of a jbd2
transaction --- and while the transaction is active, the buffer cache
writeback is disabled.  It's only after the transaction is committed
that are dirty blocks associated with that transaction are allowed to
be written back.  So I *think* the only way we could run into problems
is ext4's jbd2 journalling is disabled.

More generally, any file system which uses the buffer cache, and
doesn't use jbd2 to control when writeback happens, I think is going
to be at risk with a block device which requires stable writes.  The
only way to fix this, really, is to have the buffer cache code copy
the data to a bounce buffer, and then issue the write from the bounce
buffer.

						- Ted