On Fri, Oct 09, 2015 at 12:01:09AM -0400, Theodore Ts'o wrote:
> If there is a error while copying data from userspace into the page
> cache during a write(2) system call, in data=journal mode, in
> ext4_journalled_write_end() were using page_zero_new_buffers() from
> fs/buffer.c. Unfortunately, this sets the buffer dirty flag, which is
> no good if journalling is enabled. This is a long-standing bug that
> goes back for years and years in ext3, but a combination of (a)
> data=journal not being very common, (b) in many case it only results
> in a warning message. and (c) only very rarely causes the kernel hang,
> means that we only really noticed this as a problem when commit
> 998ef75ddb caused this failure to happen frequently enough to cause
> generic/208 to fail when run in data=journal mode.
>
> The fix is to have our own version of this function that doesn't call
> mark_dirty_buffer(), since we will end up calling
> ext4_handle_dirty_metadata() on the buffer head(s) in questions very
> shortly afterwards in ext4_journalled_write_end().
>
> Thanks to Dave Hansen and Linus Torvalds for helping to identify the
> root cause of the problem.
>

Hello there, a blast from the past.

I see this has landed in b90197b655185a11640cce3a0a0bc5d8291b8ad2.

I came here after looking at pwrite in will-it-scale and noticing that
pre-faulting eats CPU (over 5% on my Sapphire Rapids) due to SMAP trips.

It used to be that pre-faulting was avoided specifically for that
reason, but that change got temporarily reverted due to bugs in ext4.
To quote Linus (see 00a3d660cbac05af34cca149cb80fb611e916935):

> The commit itself does not appear to be buggy per se, but it is exposing
> a bug in ext4 (and Ted thinks ext3 too, but we solved that by getting
> rid of it). It's too late in the release cycle to really worry about
> this, even if Dave Hansen has a patch that may actually fix the
> underlying ext4 problem. We can (and should) revisit this for the next
> release.

Given that your patch landed, I take it this is expected to be fixed
now? Sounds like nobody bothered to revert the revert. Not the end of
the world, but it is a few % left on the table for (hopefully) no
reason.

Of course testing will be needed, but that's what -next is for.

thanks,