Re: [PATCH] ext4: use private version of page_zero_new_buffers() for data=journal mode

Mateusz Guzik <mjguzik@xxxxxxxxx> · Sun, 26 Jan 2025 18:01:55 +0100

On Fri, Oct 09, 2015 at 12:01:09AM -0400, Theodore Ts'o wrote:
> If there is a error while copying data from userspace into the page
> cache during a write(2) system call, in data=journal mode, in
> ext4_journalled_write_end() were using page_zero_new_buffers() from
> fs/buffer.c.  Unfortunately, this sets the buffer dirty flag, which is
> no good if journalling is enabled.  This is a long-standing bug that
> goes back for years and years in ext3, but a combination of (a)
> data=journal not being very common, (b) in many case it only results
> in a warning message. and (c) only very rarely causes the kernel hang,
> means that we only really noticed this as a problem when commit
> 998ef75ddb caused this failure to happen frequently enough to cause
> generic/208 to fail when run in data=journal mode.
> 
> The fix is to have our own version of this function that doesn't call
> mark_dirty_buffer(), since we will end up calling
> ext4_handle_dirty_metadata() on the buffer head(s) in questions very
> shortly afterwards in ext4_journalled_write_end().
> 
> Thanks to Dave Hansen and Linus Torvalds for helping to identify the
> root cause of the problem.
> 
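
For anyone less familiar with the buffer-head side, the difference the patch describes can be modelled in plain userspace C. This is a hypothetical sketch, not kernel code. `struct model_bh`, `model_zero_new_buffers()` and the block sizes are made up for illustration, and the `journaled` flag only stands in for the later ext4_handle_dirty_metadata() call. On a short copy, both variants zero the uncopied part of any new buffers. Only the generic-style one also sets the dirty flag, which is what mark_buffer_dirty() does in fs/buffer.c.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified userspace model, NOT kernel code: one "page" split into
 * fixed-size "buffers". Sizes and names are illustrative assumptions. */
#define MODEL_PAGE_SIZE  4096u
#define MODEL_BLOCK_SIZE 1024u
#define MODEL_NR_BUFS    (MODEL_PAGE_SIZE / MODEL_BLOCK_SIZE)

struct model_bh {
    bool is_new;     /* models buffer_new(): block freshly allocated */
    bool uptodate;   /* models buffer_uptodate() */
    bool dirty;      /* models buffer_dirty(): wrong under data=journal */
    bool journaled;  /* models handing the buffer to the journal */
};

struct model_page {
    unsigned char data[MODEL_PAGE_SIZE];
    struct model_bh bh[MODEL_NR_BUFS];
};

/* Zero the uncopied range [from, to) inside every new buffer it overlaps.
 * journal_mode == false mimics the generic helper (marks the buffer dirty).
 * journal_mode == true mimics the data=journal variant, which leaves the
 * buffer clean and hands it to the journal instead. */
static void model_zero_new_buffers(struct model_page *pg, unsigned from,
                                   unsigned to, bool journal_mode)
{
    for (unsigned i = 0; i < MODEL_NR_BUFS; i++) {
        unsigned block_start = i * MODEL_BLOCK_SIZE;
        unsigned block_end = block_start + MODEL_BLOCK_SIZE;
        struct model_bh *bh = &pg->bh[i];

        /* Skip buffers that are not new or do not overlap [from, to). */
        if (!bh->is_new || block_end <= from || block_start >= to)
            continue;

        unsigned start = from > block_start ? from : block_start;
        unsigned end = to < block_end ? to : block_end;
        memset(pg->data + start, 0, end - start);
        bh->uptodate = true;
        bh->is_new = false;

        if (journal_mode)
            bh->journaled = true;  /* stand-in for ext4_handle_dirty_metadata() */
        else
            bh->dirty = true;      /* stand-in for mark_buffer_dirty() */
    }
}
```

Calling it on a page whose buffers are all new, with a copy that came up short inside [0, 3000), dirties the overlapping buffers in generic mode. In journal mode it only flags them for the journal.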

Hello there, a blast from the past.

I see this has landed in b90197b655185a11640cce3a0a0bc5d8291b8ad2

I came here after looking at the pwrite test in will-it-scale and noticing
that pre-faulting eats CPU time (over 5% on my Sapphire Rapids) due to SMAP
trips.
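
Roughly, the two write-loop strategies at stake can be sketched as a userspace model. This is a minimal sketch under stated assumptions. Everything in it is hypothetical: `model_perform_write()`, the per-chunk `resident` flags and the counters. In the kernel, the fault-in is fault_in_iov_iter_readable() and the copy is copy_page_from_iter_atomic() inside generic_perform_write(). The model only counts how often each path is taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the buffered-write loop; nothing here is kernel API.
 * The "user buffer" is represented by one residency flag per chunk. */
struct model_stats {
    unsigned prefaults;     /* explicit fault-in calls (each touches user memory) */
    unsigned atomic_copies; /* copy attempts made with page faults disabled */
};

/* The atomic copy succeeds only if the source chunk is already resident. */
static bool model_copy_atomic(const bool *resident, size_t chunk,
                              struct model_stats *st)
{
    st->atomic_copies++;
    return resident[chunk];
}

/* Fault the chunk in. In the always-prefault loop this cost is paid even
 * when the chunk is already resident. */
static void model_fault_in(bool *resident, size_t chunk,
                           struct model_stats *st)
{
    st->prefaults++;
    resident[chunk] = true;
}

/* always_prefault == true: fault in before every copy (current behaviour).
 * always_prefault == false: try the atomic copy first, and fault in and
 * retry only after it comes up short (the approach that was reverted). */
static struct model_stats model_perform_write(bool *resident, size_t nchunks,
                                              bool always_prefault)
{
    struct model_stats st = {0, 0};

    for (size_t i = 0; i < nchunks; i++) {
        if (always_prefault)
            model_fault_in(resident, i, &st);
        while (!model_copy_atomic(resident, i, &st))
            model_fault_in(resident, i, &st);
    }
    return st;
}
```

With every chunk already resident, which is the common case in a pwrite benchmark, the always-prefault loop does one fault-in per chunk and the lazy loop does none. If a chunk is not resident, the lazy loop pays one failed atomic copy plus one fault-in for it.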

It used to be that pre-faulting was avoided specifically for that
reason, but that change was temporarily reverted due to bugs in ext4. To
quote Linus (see 00a3d660cbac05af34cca149cb80fb611e916935):

>    The commit itself does not appear to be buggy per se, but it is exposing
>    a bug in ext4 (and Ted thinks ext3 too, but we solved that by getting
>    rid of it).  It's too late in the release cycle to really worry about
>    this, even if Dave Hansen has a patch that may actually fix the
>    underlying ext4 problem.  We can (and should) revisit this for the next
>    release.

Given that your patch landed, I take it this is expected to be fixed now?

It sounds like nobody bothered to revert the revert. Not the end of the
world, but it leaves a few percent on the table for (hopefully) no reason.
Of course testing will be needed, but that's what -next is for.

thanks,