The patch titled Subject: filemap: move prefaulting out of hot write path has been added to the -mm mm-hotfixes-unstable branch. Its filename is filemap-move-prefaulting-out-of-hot-write-path.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/filemap-move-prefaulting-out-of-hot-write-path.patch This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> Subject: filemap: move prefaulting out of hot write path Date: Fri, 28 Feb 2025 12:37:22 -0800 There is a generic anti-pattern that shows up in the VFS and several filesystems where the hot write paths touch userspace twice when they could get away with doing it once. Dave Chinner suggested that they should all be fixed up[1]. I agree[2]. But, the series to do that fixup spans a bunch of filesystems and a lot of people. This patch fixes common code that absolutely everyone uses. It has measurable performance benefits[3]. I think this patch can go in and not be held up by the others. I will post them separately to their separate maintainers for consideration. But, honestly, I'm not going to lose any sleep if the maintainers don't pick those up. 1. https://lore.kernel.org/all/Z5f-x278Z3wTIugL@xxxxxxxxxxxxxxxxxxx/ 2. https://lore.kernel.org/all/20250129181749.C229F6F3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ 3. https://lore.kernel.org/all/202502121529.d62a409e-lkp@xxxxxxxxx/ This patch: There is a bit of a sordid history here. I originally wrote 998ef75ddb57 ("fs: do not prefault sys_write() user buffer pages") to fix a performance issue that showed up on early SMAP hardware. But that was reverted with 00a3d660cbac because it exposed an underlying filesystem bug. This is a reimplementation of the original commit along with some simplification and comment improvements. The basic problem is that the generic write path has two userspace accesses: one to prefault the write source buffer and then another to perform the actual write. On x86, this means an extra STAC/CLAC pair. These are relatively expensive instructions because they function as barriers. Keep the prefaulting behavior but move it into the slow path that gets run when the write did not make any progress. This avoids livelocks that can happen when the write's source and destination target the same folio. Contrary to the existing comments, the fault-in does not prevent deadlocks. That's accomplished by using an "atomic" usercopy that disables page faults. The end result is that the generic write fast path now touches userspace once instead of twice. 0day has shown some improvements on a couple of microbenchmarks: https://lore.kernel.org/all/202502121529.d62a409e-lkp@xxxxxxxxx/ Link: https://lkml.kernel.org/r/20250228203722.CAEB63AC@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> Link: https://lore.kernel.org/all/yxyuijjfd6yknryji2q64j3keq2ygw6ca6fs5jwyolklzvo45s@4u63qqqyosy2/ Cc: Ted Ts'o <tytso@xxxxxxx> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> Cc: Mateusz Guzik <mjguzik@xxxxxxxxx> Cc: Dave Chinner <david@xxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/filemap.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) --- a/mm/filemap.c~filemap-move-prefaulting-out-of-hot-write-path +++ a/mm/filemap.c @@ -4170,17 +4170,6 @@ retry: bytes = min(chunk - offset, bytes); balance_dirty_pages_ratelimited(mapping); - /* - * Bring in the user page that we will copy from _first_. - * Otherwise there's a nasty deadlock on copying from the - * same page as we're writing to, without it being marked - * up-to-date. - */ - if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) { - status = -EFAULT; - break; - } - if (fatal_signal_pending(current)) { status = -EINTR; break; @@ -4198,6 +4187,12 @@ retry: if (mapping_writably_mapped(mapping)) flush_dcache_folio(folio); + /* + * Faults here on mmap()s can recurse into arbitrary + * filesystem code. Lots of locks are held that can + * deadlock. Use an atomic copy to avoid deadlocking + * in page fault handling. + */ copied = copy_folio_from_iter_atomic(folio, offset, bytes, i); flush_dcache_folio(folio); @@ -4223,6 +4218,16 @@ retry: bytes = copied; goto retry; } + + /* + * 'folio' is now unlocked and faults on it can be + * handled. Ensure forward progress by trying to + * fault it in now. + */ + if (fault_in_iov_iter_readable(i, bytes) == bytes) { + status = -EFAULT; + break; + } } else { pos += status; written += status; _ Patches currently in -mm which might be from dave.hansen@xxxxxxxxxxxxxxx are filemap-move-prefaulting-out-of-hot-write-path.patch