Re: [PATCH v6 0/5] fuse: remove temp page copies in writeback

On Fri, Nov 22, 2024 at 3:24 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> The purpose of this patchset is to help make writeback-cache write
> performance in FUSE filesystems as fast as possible.
>
> In the current FUSE writeback design (see commit 3be5a52b30aa
> ("fuse: support writable mmap")), a temp page is allocated for every dirty
> page to be written back, the contents of the dirty page are copied over to the
> temp page, and the temp page gets handed to the server to write back. This is
> done so that writeback may be immediately cleared on the dirty page, and this
> in turn is done for two reasons:
> a) in order to mitigate the following deadlock scenario that may arise if
> reclaim waits on writeback on the dirty page to complete (more details can be
> found in this thread [1]):
> * single-threaded FUSE server is in the middle of handling a request
>   that needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback
> * the FUSE server can't write back the folio since it's stuck in
>   direct reclaim
> b) in order to unblock internal (e.g. sync, page compaction) waits on writeback
> without needing the server to complete writing back to disk, which may take
> an indeterminate amount of time.
>
> Allocating and copying dirty pages to temp pages is the biggest performance
> bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> altogether (which will also allow us to get rid of the internal FUSE rb tree
> that is needed to keep track of writeback status on the temp pages).
> Benchmarks show approximately a 20% improvement in throughput for 4k
> block-size writes and a 45% improvement for 1M block-size writes.
>
> With the temp page removed, writeback state is now only cleared on the dirty
> page after the server has written it back to disk. This may take an
> indeterminate amount of time. There is also the possibility of malicious or
> well-intentioned but buggy servers where writeback may, in the worst case,
> never complete. This means that any folio_wait_writeback() on a dirty page
> belonging to a FUSE filesystem needs to be carefully audited.
>
> In particular, these are the cases that need to be accounted for:
> * potentially deadlocking in reclaim, as mentioned above
> * potentially stalling sync(2)
> * potentially stalling page migration / compaction
>
> This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which a
> filesystem may set on its inode mappings to indicate that writeback
> operations may take an indeterminate amount of time to complete. FUSE will set
> this flag on its mappings. This patchset adds checks to the critical parts of
> reclaim, sync, and page migration logic where writeback may be waited on.
>
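
To make the idea of the checks concrete, here is a minimal userspace sketch
of the decision a reclaim, sync, or migration path could make. It is a model
only: every name prefixed with model_ is hypothetical, the bit number is
arbitrary, and none of this is the code from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
enum model_mapping_flags {
	MODEL_AS_WRITEBACK_INDETERMINATE = 0,	/* bit number, chosen arbitrarily */
};

struct model_mapping {
	unsigned long flags;
};

struct model_folio {
	struct model_mapping *mapping;	/* NULL models a folio without a mapping */
	bool under_writeback;
};

/* A filesystem (FUSE in the cover letter) would mark its mappings like this. */
static void model_set_writeback_indeterminate(struct model_mapping *m)
{
	assert(m != NULL);
	m->flags |= 1UL << MODEL_AS_WRITEBACK_INDETERMINATE;
}

static bool model_writeback_indeterminate(const struct model_mapping *m)
{
	return m != NULL &&
	       (m->flags & (1UL << MODEL_AS_WRITEBACK_INDETERMINATE)) != 0;
}

/*
 * Returns true if a caller that must not stall (reclaim, sync(2),
 * migration) may block on this folio's writeback.
 */
static bool model_may_wait_on_writeback(const struct model_folio *f)
{
	assert(f != NULL);
	if (!f->under_writeback)
		return false;	/* nothing to wait for */
	return !model_writeback_indeterminate(f->mapping);
}
```

In this model a caller would skip the folio (or the wait) whenever
model_may_wait_on_writeback() returns false. Paths where waiting is
acceptable, such as sync_file_range(), would simply not consult the flag.
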
> Please note the following:
> * For sync(2), waiting on writeback will be skipped for FUSE, but this has no
>   effect on existing behavior. Dirty FUSE pages are already not guaranteed to
>   be written to disk by the time sync(2) returns (e.g. writeback is cleared on
>   the dirty page but the server may not have written out the temp page to disk
>   yet). If the caller wishes to ensure the data has actually been synced to
>   disk, they should use fsync(2)/fdatasync(2) instead.
> * AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
>   waited on when in writeback. There are some cases where the wait is
>   desirable. For example, for the sync_file_range() syscall, it is fine to
>   wait on the writeback since the caller passes in a fd for the operation.
>
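
For callers that do need the data on stable storage, the fsync(2)/fdatasync(2)
advice above could look roughly like the sketch below. The helper name
write_and_fdatasync() is made up for illustration, and error handling is kept
minimal; only standard POSIX calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path, then ask the kernel to
 * flush the file's data before returning. Returns 0 on success, -1 on error.
 */
static int write_and_fdatasync(const char *path, const void *buf, size_t len)
{
	assert(path != NULL);

	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	const char *p = buf;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the write */
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Per-file flush: unlike sync(2), this targets one fd and reports errors. */
	if (fdatasync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

A call such as write_and_fdatasync("out.dat", data, size) returns only after
fdatasync() has completed or failed for that file, so the caller gets an
explicit error instead of relying on sync(2).
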
> [1]
> https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@xxxxxxxxxxxxxxxxx/
>
> Changelog
> ---------
> v5:
> https://lore.kernel.org/linux-fsdevel/20241115224459.427610-1-joannelkoong@xxxxxxxxx/
> Changes from v5 -> v6:
> * Add Shakeel and Jingbo's reviewed-bys
> * Move folio_end_writeback() to fuse_writepage_finish() (Jingbo)
> * Embed fuse_writepage_finish_stat() logic inline (Jingbo)
> * Remove node_stat NR_WRITEBACK inc/sub (Jingbo)
>
> v4:
> https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@xxxxxxxxx/
> Changes from v4 -> v5:
> * AS_WRITEBACK_MAY_BLOCK -> AS_WRITEBACK_INDETERMINATE (Shakeel)
> * Drop memory hotplug patch (David and Shakeel)
> * Remove some more unnecessary writeback waits in fuse code (Jingbo)
> * Make commit message for reclaim patch more concise - drop part about
>   deadlock and just focus on how it may stall waits
>
> v3:
> https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@xxxxxxxxx/
> Changes from v3 -> v4:
> * Use filemap_fdatawait_range() instead of filemap_range_has_writeback() in
>   readahead
>
> v2:
> https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@xxxxxxxxx/
> Changes from v2 -> v3:
> * Account for sync and page migration cases as well (Miklos)
> * Change AS_NO_WRITEBACK_RECLAIM to the more generic AS_WRITEBACK_MAY_BLOCK
> * For fuse inodes, set mapping_writeback_may_block only if fc->writeback_cache
>   is enabled
>
> v1:
> https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@xxxxxxxxx/T/#t
> Changes from v1 -> v2:
> * Have flag in "enum mapping_flags" instead of creating asop_flags (Shakeel)
> * Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)
>
> Joanne Koong (5):
>   mm: add AS_WRITEBACK_INDETERMINATE mapping flag
>   mm: skip reclaiming folios in legacy memcg writeback indeterminate
>     contexts
>   fs/writeback: in wait_sb_inodes(), skip wait for
>     AS_WRITEBACK_INDETERMINATE mappings
>   mm/migrate: skip migrating folios under writeback with
>     AS_WRITEBACK_INDETERMINATE mappings
>   fuse: remove tmp folio for writebacks and internal rb tree
>
>  fs/fs-writeback.c       |   3 +
>  fs/fuse/file.c          | 360 ++++------------------------------------
>  fs/fuse/fuse_i.h        |   3 -
>  include/linux/pagemap.h |  11 ++
>  mm/migrate.c            |   5 +-
>  mm/vmscan.c             |  10 +-
>  6 files changed, 53 insertions(+), 339 deletions(-)
>

Miklos, may I get your thoughts on this patchset?


Thanks,
Joanne

> --
> 2.43.5
>




