
The patch titled
     Subject: fs/dax: ensure all pages are idle prior to filesystem unmount
has been added to the -mm mm-unstable branch.  Its filename is
     fs-dax-ensure-all-pages-are-idle-prior-to-filesystem-unmount.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/fs-dax-ensure-all-pages-are-idle-prior-to-filesystem-unmount.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Alistair Popple <apopple@xxxxxxxxxx>
Subject: fs/dax: ensure all pages are idle prior to filesystem unmount
Date: Wed, 5 Feb 2025 09:48:04 +1100

File systems call dax_break_layout() prior to reallocating file system
blocks to ensure the page is not undergoing any DMA or other accesses.
Generally this is needed when a file is truncated to ensure that if a
block is reallocated nothing is writing to it.  However, file systems
currently don't call this when an FS DAX inode is evicted.

This can cause problems when the file system is unmounted, as a page can
continue to be undergoing DMA or other remote access after unmount.  This
means that if the file system is remounted, any truncate or other operation
which requires the underlying file system block to be freed will not wait
for the remote access to complete.  Therefore, a busy block may be
reallocated to a new file, leading to corruption.

Link: https://lkml.kernel.org/r/6f23832debd919787c57fc5ef19561a45c034bce.1738709036.git-series.apopple@xxxxxxxxxx
Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
Tested-by: Alison Schofield <alison.schofield@xxxxxxxxx>
Cc: Asahi Lina <lina@xxxxxxxxxxxxx>
Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Chunyan Zhang <zhang.lyra@xxxxxxxxx>
Cc: "Darrick J. Wong" <djwong@xxxxxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx>
Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>
Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: linmiaohe <linmiaohe@xxxxxxxxxx>
Cc: Logan Gunthorpe <logang@xxxxxxxxxxxx>
Cc: Mattew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Ted Ts'o <tytso@xxxxxxx>
Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Cc: WANG Xuerui <kernel@xxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Alexander Gordeev <agordeev@xxxxxxxxxxxxx>
Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
Cc: Dan Wiliams <dan.j.williams@xxxxxxxxx>
Cc: Heiko Carstens <hca@xxxxxxxxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxxxx>
Cc: Sven Schnelle <svens@xxxxxxxxxxxxx>
Cc: Vasily Gorbik <gor@xxxxxxxxxxxxx>
Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/dax.c            |   27 +++++++++++++++++++++++++++
 fs/ext4/inode.c     |    2 ++
 fs/xfs/xfs_super.c  |   12 ++++++++++++
 include/linux/dax.h |    5 +++++
 4 files changed, 46 insertions(+)

--- a/fs/dax.c~fs-dax-ensure-all-pages-are-idle-prior-to-filesystem-unmount
+++ a/fs/dax.c
@@ -883,6 +883,13 @@ static int wait_page_idle(struct page *p
 				TASK_INTERRUPTIBLE, 0, 0, cb(inode));
 }
 
+static void wait_page_idle_uninterruptible(struct page *page,
+					struct inode *inode)
+{
+	___wait_var_event(page, dax_page_is_idle(page),
+			TASK_UNINTERRUPTIBLE, 0, 0, schedule());
+}
+
 /*
  * Unmaps the inode and waits for any DMA to complete prior to deleting the
  * DAX mapping entries for the range.
@@ -918,6 +925,26 @@ int dax_break_layout(struct inode *inode
 }
 EXPORT_SYMBOL_GPL(dax_break_layout);
 
+void dax_break_layout_final(struct inode *inode)
+{
+	struct page *page;
+
+	if (!dax_mapping(inode->i_mapping))
+		return;
+
+	do {
+		page = dax_layout_busy_page_range(inode->i_mapping, 0,
+						LLONG_MAX);
+		if (!page)
+			break;
+
+		wait_page_idle_uninterruptible(page, inode);
+	} while (true);
+
+	dax_delete_mapping_range(inode->i_mapping, 0, LLONG_MAX);
+}
+EXPORT_SYMBOL_GPL(dax_break_layout_final);
+
 /*
  * Invalidate DAX entry if it is clean.
  */
--- a/fs/ext4/inode.c~fs-dax-ensure-all-pages-are-idle-prior-to-filesystem-unmount
+++ a/fs/ext4/inode.c
@@ -181,6 +181,8 @@ void ext4_evict_inode(struct inode *inod
 
 	trace_ext4_evict_inode(inode);
 
+	dax_break_layout_final(inode);
+
 	if (EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)
 		ext4_evict_ea_inode(inode);
 	if (inode->i_nlink) {
--- a/fs/xfs/xfs_super.c~fs-dax-ensure-all-pages-are-idle-prior-to-filesystem-unmount
+++ a/fs/xfs/xfs_super.c
@@ -751,6 +751,17 @@ xfs_fs_drop_inode(
 	return generic_drop_inode(inode);
 }
 
+STATIC void
+xfs_fs_evict_inode(
+	struct inode		*inode)
+{
+	if (IS_DAX(inode))
+		dax_break_layout_final(inode);
+
+	truncate_inode_pages_final(&inode->i_data);
+	clear_inode(inode);
+}
+
 static void
 xfs_mount_free(
 	struct xfs_mount	*mp)
@@ -1215,6 +1226,7 @@ static const struct super_operations xfs
 	.destroy_inode		= xfs_fs_destroy_inode,
 	.dirty_inode		= xfs_fs_dirty_inode,
 	.drop_inode		= xfs_fs_drop_inode,
+	.evict_inode		= xfs_fs_evict_inode,
 	.put_super		= xfs_fs_put_super,
 	.sync_fs		= xfs_fs_sync_fs,
 	.freeze_fs		= xfs_fs_freeze,
--- a/include/linux/dax.h~fs-dax-ensure-all-pages-are-idle-prior-to-filesystem-unmount
+++ a/include/linux/dax.h
@@ -232,6 +232,10 @@ static inline int __must_check dax_break
 {
 	return 0;
 }
+
+static inline void dax_break_layout_final(struct inode *inode)
+{
+}
 #endif
 
 bool dax_alive(struct dax_device *dax_dev);
@@ -266,6 +270,7 @@ static inline int __must_check dax_break
 {
 	return dax_break_layout(inode, 0, LLONG_MAX, cb);
 }
+void dax_break_layout_final(struct inode *inode);
 int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 				  struct inode *dest, loff_t destoff,
 				  loff_t len, bool *is_same,
_

Patches currently in -mm which might be from apopple@xxxxxxxxxx are

fuse-fix-dax-truncate-punch_hole-fault-path.patch
fs-dax-return-unmapped-busy-pages-from-dax_layout_busy_page_range.patch
fs-dax-dont-skip-locked-entries-when-scanning-entries.patch
fs-dax-refactor-wait-for-dax-idle-page.patch
fs-dax-create-a-common-implementation-to-break-dax-layouts.patch
fs-dax-always-remove-dax-page-cache-entries-when-breaking-layouts.patch
fs-dax-ensure-all-pages-are-idle-prior-to-filesystem-unmount.patch
fs-dax-remove-page_mapping_dax_shared-mapping-flag.patch
mm-gup-remove-redundant-check-for-pci-p2pdma-page.patch
mm-mm_init-move-p2pdma-page-refcount-initialisation-to-p2pdma.patch
mm-allow-compound-zone-device-pages.patch
mm-memory-enhance-insert_page_into_pte_locked-to-create-writable-mappings.patch
mm-memory-add-vmf_insert_page_mkwrite.patch
rmap-add-support-for-pud-sized-mappings-to-rmap.patch
huge_memory-add-vmf_insert_folio_pud.patch
huge_memory-add-vmf_insert_folio_pmd.patch
mm-gup-dont-allow-foll_longterm-pinning-of-fs-dax-pages.patch
fs-dax-properly-refcount-fs-dax-pages.patch
device-dax-properly-refcount-device-dax-pages-when-mapping.patch
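
Editorial illustration, not part of the patch: the drain-then-delete ordering that dax_break_layout_final() relies on can be modelled in plain userspace C. The sketch below is only a toy under stated assumptions. Every name in it (model_page, model_inode, model_busy_page, model_wait_step, model_break_layout_final) is hypothetical, a pin counter stands in for outstanding DMA references, and a single decrement stands in for one wakeup from an uninterruptible sleep. It does not use or imitate any real kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
#define MODEL_NPAGES 4

struct model_page {
	int pins;	/* outstanding DMA-style references (assumption) */
	bool mapped;	/* still present in the inode's mapping */
};

struct model_inode {
	struct model_page pages[MODEL_NPAGES];
	unsigned long waits;	/* how many wait steps were needed */
};

/* Return the first mapped page that is still pinned, or NULL if all are idle. */
static struct model_page *model_busy_page(struct model_inode *inode)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		struct model_page *p = &inode->pages[i];

		if (p->mapped && p->pins > 0)
			return p;
	}
	return NULL;
}

/*
 * Stand-in for one sleep/wakeup cycle: in this toy model each call
 * represents one remote access finishing and dropping its pin.
 */
static void model_wait_step(struct model_page *page)
{
	page->pins--;
}

/*
 * Drain every busy page first, and only then drop the mapping entries,
 * so that no block can be reused while something may still write to it.
 */
static void model_break_layout_final(struct model_inode *inode)
{
	struct model_page *page;

	while ((page = model_busy_page(inode)) != NULL) {
		while (page->pins > 0) {
			model_wait_step(page);
			inode->waits++;
		}
	}

	for (size_t i = 0; i < MODEL_NPAGES; i++)
		inode->pages[i].mapped = false;
}
```

In this model the eviction path would call model_break_layout_final() once per inode; the point of the ordering is that the mapping is torn down only after the busy-page scan comes back empty.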