+ mm-add-as_writeback_indeterminate-mapping-flag.patch added to mm-unstable branch




The patch titled
     Subject: mm: add AS_WRITEBACK_INDETERMINATE mapping flag
has been added to the -mm mm-unstable branch.  Its filename is
     mm-add-as_writeback_indeterminate-mapping-flag.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-add-as_writeback_indeterminate-mapping-flag.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days.

------------------------------------------------------
From: Joanne Koong <joannelkoong@xxxxxxxxx>
Subject: mm: add AS_WRITEBACK_INDETERMINATE mapping flag
Date: Fri, 22 Nov 2024 15:23:55 -0800

Patch series "fuse: remove temp page copies in writeback", v6.

The purpose of this patchset is to help make writeback-cache write
performance in FUSE filesystems as fast as possible.

In the current FUSE writeback design (see commit 3be5a52b30aa ("fuse:
support writable mmap")), a temp page is allocated for every dirty page
to be written back, the contents of the dirty page are copied over to the
temp page, and the temp page gets handed to the server to write back. 
This is done so that writeback may be immediately cleared on the dirty
page, and this in turn is done for two reasons:

a) in order to mitigate the following deadlock scenario that may arise
   if reclaim waits for writeback of the dirty page to complete (more
   details can be found in this thread [1]):

   * single-threaded FUSE server is in the middle of handling a request
     that needs a memory allocation
   * memory allocation triggers direct reclaim
   * direct reclaim waits on a folio under writeback
   * the FUSE server can't write back the folio since it's stuck in
     direct reclaim

b) in order to unblock internal (eg sync, page compaction) waits on
   writeback without needing the server to complete writing back to disk,
   which may take an indeterminate amount of time.
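
The temp-page scheme described above can be modeled in a few lines of userspace C. This is only a sketch: struct model_page, MODEL_PAGE_SIZE and model_temp_page_writeback() are hypothetical names, not FUSE or mm code, and a 4 KiB page size is assumed. It shows why waiters on the original page are released right away (writeback is cleared before the server does anything) and where the extra per-page allocation and copy come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a page cache page; not a kernel type. */
struct model_page {
	char data[MODEL_PAGE_SIZE];
	bool dirty;
	bool writeback;
};

/*
 * Model of the pre-series approach: copy the dirty page into a freshly
 * allocated temp page, then clear writeback on the original immediately.
 * Only the temp copy goes to the server, so anyone waiting on
 * orig->writeback is released without waiting for the server.
 * Returns the temp copy (caller frees it), or NULL if allocation fails.
 */
static char *model_temp_page_writeback(struct model_page *orig)
{
	assert(orig != NULL);

	orig->writeback = true; /* writeback starts */
	orig->dirty = false;

	char *tmp = malloc(MODEL_PAGE_SIZE); /* extra allocation per page */
	if (!tmp) {
		/* back out: page stays dirty and is not under writeback */
		orig->dirty = true;
		orig->writeback = false;
		return NULL;
	}
	memcpy(tmp, orig->data, MODEL_PAGE_SIZE); /* extra copy per page */

	orig->writeback = false; /* cleared before the server writes */
	return tmp;              /* this is what the server receives */
}
```

The allocation and memcpy() in this model correspond to the cost the series sets out to remove.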

Allocating and copying dirty pages to temp pages is the biggest
performance bottleneck for FUSE writeback.  This patchset aims to get rid
of the temp page altogether (which will also allow us to get rid of the
internal FUSE rb tree that is needed to keep track of writeback status on
the temp pages).  Benchmarks show approximately a 20% improvement in
throughput for 4k block-size writes and a 45% improvement for 1M
block-size writes.

With the temp page removed, writeback state is now cleared on the dirty
page only after the server has written it back to disk.  This may take an
indeterminate amount of time.  There is also the possibility of malicious,
or well-intentioned but buggy, servers for which writeback may, in the
worst case, never complete.  This means that any folio_wait_writeback()
on a dirty page belonging to a FUSE filesystem needs to be carefully
audited.
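
Without the temp page, the folio itself stays under writeback until the server answers. The sketch below is a userspace model with hypothetical names (struct model_folio, model_start_writeback(), model_server_completed(), model_would_block()); it is not the FUSE implementation from this series. It only illustrates that the writeback bit now tracks the server, so any wait on it inherits the server's latency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct model_folio {
	bool dirty;
	bool writeback;
};

/* Writeback starts: no temp copy, the folio itself goes to the server. */
static void model_start_writeback(struct model_folio *f)
{
	assert(f->dirty && !f->writeback);
	f->dirty = false;
	f->writeback = true;
}

/*
 * Hypothetical completion hook, called only once the server reports that
 * the data reached storage.  Until then f->writeback stays set, so a
 * waiter in the style of folio_wait_writeback() would block for as long
 * as the server takes, possibly forever if the server never replies.
 */
static void model_server_completed(struct model_folio *f)
{
	assert(f->writeback);
	f->writeback = false;
}

/* Non-blocking check a caller could make instead of sleeping. */
static bool model_would_block(const struct model_folio *f)
{
	return f->writeback;
}
```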

In particular, these are the cases that need to be accounted for:
* potentially deadlocking in reclaim, as mentioned above
* potentially stalling sync(2)
* potentially stalling page migration / compaction

This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
filesystems may set on their inode mappings to indicate that writeback
operations may take an indeterminate amount of time to complete.  FUSE
will set this flag on its mappings.  This patchset adds checks to the
critical parts of reclaim, sync, and page migration logic where writeback
may be waited on.

Please note the following:
* For sync(2), waiting on writeback will be skipped for FUSE, but this has no
  effect on existing behavior. Dirty FUSE pages are already not guaranteed to
  be written to disk by the time sync(2) returns (eg writeback is cleared on
  the dirty page but the server may not have written out the temp page to disk
  yet). If the caller wishes to ensure the data has actually been synced to
  disk, they should use fsync(2)/fdatasync(2) instead.
* AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
  waited on when in writeback. There are some cases where the wait is
  desirable. For example, for the sync_file_range() syscall, it is fine to
  wait on the writeback since the caller passes in an fd for the operation.

[1] https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@xxxxxxxxxxxxxxxxx/


This patch (of 5):

Add a new mapping flag AS_WRITEBACK_INDETERMINATE which filesystems may
set to indicate that writing back to disk may take an indeterminate amount
of time to complete.  Extra caution should be taken when waiting on
writeback for folios belonging to mappings where this flag is set.

Link: https://lkml.kernel.org/r/20241122232359.429647-1-joannelkoong@xxxxxxxxx
Link: https://lkml.kernel.org/r/20241122232359.429647-2-joannelkoong@xxxxxxxxx
Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
Acked-by: Miklos Szeredi <mszeredi@xxxxxxxxxx>
Cc: Bernd Schubert <bernd.schubert@xxxxxxxxxxx>
Cc: Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx>
Cc: Josef Bacik <josef@xxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/pagemap.h |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/include/linux/pagemap.h~mm-add-as_writeback_indeterminate-mapping-flag
+++ a/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
 				   folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
+	AS_WRITEBACK_INDETERMINATE = 9, /* Use caution when waiting on writeback */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -335,6 +336,16 @@ static inline bool mapping_inaccessible(
 	return test_bit(AS_INACCESSIBLE, &mapping->flags);
 }
 
+static inline void mapping_set_writeback_indeterminate(struct address_space *mapping)
+{
+	set_bit(AS_WRITEBACK_INDETERMINATE, &mapping->flags);
+}
+
+static inline bool mapping_writeback_indeterminate(struct address_space *mapping)
+{
+	return test_bit(AS_WRITEBACK_INDETERMINATE, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;
_
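
The two helpers added by the patch are thin wrappers around a bit in mapping->flags. The userspace sketch below mirrors them so their behavior can be checked outside the kernel. Assumptions: struct address_space is reduced to its flags word, and model_set_bit()/model_test_bit() are hypothetical non-atomic stand-ins for the kernel's atomic set_bit()/test_bit(). The static assertion checks the point made by the enum comment, namely that bit 9 stays clear of the FOLIO_ORDER bits starting at 16.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Values copied from the hunk above; other enum members omitted. */
enum model_mapping_flags {
	AS_STABLE_WRITES = 7,
	AS_INACCESSIBLE = 8,
	AS_WRITEBACK_INDETERMINATE = 9,
	AS_FOLIO_ORDER_BITS = 5,
	AS_FOLIO_ORDER_MIN = 16,
};

/* The new bit must not overlap the FOLIO_ORDER bits (16-25). */
_Static_assert(AS_WRITEBACK_INDETERMINATE < AS_FOLIO_ORDER_MIN,
	       "flag bit overlaps FOLIO_ORDER bits");

/* Stripped-down stand-in for struct address_space (flags only). */
struct address_space {
	unsigned long flags;
};

/* Non-atomic userspace stand-ins for set_bit()/test_bit(). */
static void model_set_bit(int nr, unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(*addr) * CHAR_BIT));
	*addr |= 1UL << nr;
}

static bool model_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Same shape as the two helpers added by the patch. */
static inline void mapping_set_writeback_indeterminate(struct address_space *mapping)
{
	model_set_bit(AS_WRITEBACK_INDETERMINATE, &mapping->flags);
}

static inline bool mapping_writeback_indeterminate(struct address_space *mapping)
{
	return model_test_bit(AS_WRITEBACK_INDETERMINATE, &mapping->flags);
}
```

A filesystem would call the setter once when it sets up an inode's mapping, as the cover letter says FUSE will do; code paths that may wait on writeback then test the flag with the getter.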

Patches currently in -mm which might be from joannelkoong@xxxxxxxxx are

mm-add-as_writeback_indeterminate-mapping-flag.patch
mm-skip-reclaiming-folios-in-legacy-memcg-writeback-indeterminate-contexts.patch
fs-writeback-in-wait_sb_inodes-skip-wait-for-as_writeback_indeterminate-mappings.patch
mm-migrate-skip-migrating-folios-under-writeback-with-as_writeback_indeterminate-mappings.patch
fuse-remove-tmp-folio-for-writebacks-and-internal-rb-tree.patch




