Andrew,

This is essentially the same as what's currently in mm-unstable aside from
the two updates listed below. The main thing to note is that it incorporates
Balbir's fixup, which is currently in mm-unstable as c98612955016
("mm-allow-compound-zone-device-pages-fix-fix").

 - Alistair

On Fri, Feb 28, 2025 at 02:30:55PM +1100, Alistair Popple wrote:
> Main updates since v8:
>
>  - Fixed reading of bad pgmap in migrate_vma_collect_pmd() as reported/fixed
>    by Balbir.
>
>  - Fixed bad warnings generated in free_zone_device_folio() when pgmap->ops
>    isn't defined, even if it's not required to be. As reported by Gerald.
>
> Main updates since v7:
>
>  - Rebased on current akpm/mm-unstable in order to fix conflicts with
>    https://lore.kernel.org/linux-mm/20241216155408.8102-1-willy@xxxxxxxxxxxxx/
>    as requested by Andrew.
>
>  - Collected Ack'ed/Reviewed by
>
>  - Cleaned up an unnecessary and confusing assignment to pgtable.
>
>  - Other minor reworks suggested by David Hildenbrand
>
> Main updates since v6:
>
>  - Clean ups and fixes based on feedback from David and Dan.
>
>  - Rebased from next-20241216 to v6.14-rc1. No conflicts.
>
>  - Dropped the PTE bit removals and clean-ups - will post this as a
>    separate series to be merged after this one as Dan wanted it split
>    up more and this series is already too big.
>
> Main updates since v5:
>
>  - Reworked patch 1 based on Dan's feedback.
>
>  - Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
>    is not defined.
>
>  - Minor comment formatting and documentation fixes.
>
>  - Remove PTE_DEVMAP definitions from Loongarch which were added since
>    this series was initially written.
>
> Main updates since v4:
>
>  - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This
>    means smaps/pagemap may contain DAX pages.
>
>  - Fixed rmap accounting of PUD mapped pages.
>
>  - Minor code clean-ups.
>
> Main updates since v3:
>
>  - Rebased onto next-20241216. The rebase wasn't too difficult, but in
>    the interests of getting this out sooner for Andrew to look at as
>    requested by him I have yet to extensively build/run test this
>    version of the series.
>
>  - Fixed a bunch of build breakages reported by John Hubbard and the
>    kernel test robot due to various combinations of CONFIG options.
>
>  - Split the rmap changes into a separate patch as suggested by David H.
>
>  - Reworded the description for the P2PDMA change.
>
> Main updates since v2:
>
>  - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX
>    and have them pass the vmf struct.
>
>  - Separate out the device DAX changes.
>
>  - Restore the page share mapping counting and associated warnings.
>
>  - Rework truncate to require file-systems to have previously called
>    dax_break_layout() to remove the address space mapping for a
>    page. This found several bugs which are fixed by the first half of
>    the series. The motivation for this was initially to allow the FS
>    DAX page-cache mappings to hold a reference on the page.
>
>    However that turned out to be a dead-end (see the comments on patch
>    21), but it found several bugs and I think overall it is an
>    improvement so I have left it here.
>
> Device and FS DAX pages have always maintained their own page
> reference counts without following the normal rules for page reference
> counting. In particular pages are considered free when the refcount
> hits one rather than zero and refcounts are not added when mapping the
> page.
>
> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
> mechanism for allowing GUP to hold references on the page (see
> get_dev_pagemap). However there doesn't seem to be any reason why FS
> DAX pages need their own reference counting scheme.
>
> By treating the refcounts on these pages the same way as normal pages
> we can remove a lot of special checks. In particular pXd_trans_huge()
> becomes the same as pXd_leaf(), although I haven't made that change
> here.
> It also frees up a valuable SW defined PTE bit on architectures
> that have devmap PTE bits defined.
>
> It also almost certainly allows further clean-up of the devmap managed
> functions, but I have left that as a future improvement. It also
> enables support for compound ZONE_DEVICE pages which is one of my
> primary motivators for doing this work.
>
> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
> Tested-by: Alison Schofield <alison.schofield@xxxxxxxxx>
>
> ---
>
> Cc: lina@xxxxxxxxxxxxx
> Cc: zhang.lyra@xxxxxxxxx
> Cc: gerald.schaefer@xxxxxxxxxxxxx
> Cc: dan.j.williams@xxxxxxxxx
> Cc: vishal.l.verma@xxxxxxxxx
> Cc: dave.jiang@xxxxxxxxx
> Cc: logang@xxxxxxxxxxxx
> Cc: bhelgaas@xxxxxxxxxx
> Cc: jack@xxxxxxx
> Cc: jgg@xxxxxxxx
> Cc: catalin.marinas@xxxxxxx
> Cc: will@xxxxxxxxxx
> Cc: mpe@xxxxxxxxxxxxxx
> Cc: npiggin@xxxxxxxxx
> Cc: dave.hansen@xxxxxxxxxxxxxxx
> Cc: ira.weiny@xxxxxxxxx
> Cc: willy@xxxxxxxxxxxxx
> Cc: djwong@xxxxxxxxxx
> Cc: tytso@xxxxxxx
> Cc: linmiaohe@xxxxxxxxxx
> Cc: david@xxxxxxxxxx
> Cc: peterx@xxxxxxxxxx
> Cc: linux-doc@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
> Cc: nvdimm@xxxxxxxxxxxxxxx
> Cc: linux-cxl@xxxxxxxxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-mm@xxxxxxxxx
> Cc: linux-ext4@xxxxxxxxxxxxxxx
> Cc: linux-xfs@xxxxxxxxxxxxxxx
> Cc: jhubbard@xxxxxxxxxx
> Cc: hch@xxxxxx
> Cc: david@xxxxxxxxxxxxx
> Cc: chenhuacai@xxxxxxxxxx
> Cc: kernel@xxxxxxxxxx
> Cc: loongarch@xxxxxxxxxxxxxxx
>
> Alistair Popple (19):
>   fuse: Fix dax truncate/punch_hole fault path
>   fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
>   fs/dax: Don't skip locked entries when scanning entries
>   fs/dax: Refactor wait for dax idle page
>   fs/dax: Create a common implementation to break DAX layouts
>   fs/dax: Always remove DAX page-cache entries when breaking layouts
>   fs/dax: Ensure all pages are idle prior to filesystem unmount
>   fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
>   mm/gup: Remove redundant check for PCI P2PDMA page
>   mm/mm_init: Move p2pdma page refcount initialisation to p2pdma
>   mm: Allow compound zone device pages
>   mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
>   mm/memory: Add vmf_insert_page_mkwrite()
>   mm/rmap: Add support for PUD sized mappings to rmap
>   mm/huge_memory: Add vmf_insert_folio_pud()
>   mm/huge_memory: Add vmf_insert_folio_pmd()
>   mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
>   fs/dax: Properly refcount fs dax pages
>   device/dax: Properly refcount device dax pages when mapping
>
> Dan Williams (1):
>   dcssblk: Mark DAX broken, remove FS_DAX_LIMITED support
>
>  Documentation/filesystems/dax.rst      |   1 +-
>  drivers/dax/device.c                   |  15 +-
>  drivers/gpu/drm/nouveau/nouveau_dmem.c |   3 +-
>  drivers/nvdimm/pmem.c                  |   4 +-
>  drivers/pci/p2pdma.c                   |  19 +-
>  drivers/s390/block/Kconfig             |  12 +-
>  drivers/s390/block/dcssblk.c           |  27 +-
>  fs/dax.c                               | 365 +++++++++++++++++++-------
>  fs/ext4/inode.c                        |  18 +-
>  fs/fuse/dax.c                          |  30 +--
>  fs/fuse/dir.c                          |   2 +-
>  fs/fuse/file.c                         |   4 +-
>  fs/fuse/virtio_fs.c                    |   3 +-
>  fs/xfs/xfs_inode.c                     |  31 +--
>  fs/xfs/xfs_inode.h                     |   2 +-
>  fs/xfs/xfs_super.c                     |  12 +-
>  include/linux/dax.h                    |  28 ++-
>  include/linux/huge_mm.h                |   4 +-
>  include/linux/memremap.h               |  17 +-
>  include/linux/migrate.h                |   4 +-
>  include/linux/mm.h                     |  36 +---
>  include/linux/mm_types.h               |  16 +-
>  include/linux/mmzone.h                 |  12 +-
>  include/linux/page-flags.h             |   6 +-
>  include/linux/rmap.h                   |  15 +-
>  lib/test_hmm.c                         |   3 +-
>  mm/gup.c                               |  14 +-
>  mm/hmm.c                               |   2 +-
>  mm/huge_memory.c                       | 170 ++++++++++--
>  mm/internal.h                          |   2 +-
>  mm/memory-failure.c                    |   6 +-
>  mm/memory.c                            |  69 ++++-
>  mm/memremap.c                          |  60 ++--
>  mm/migrate_device.c                    |  18 +-
>  mm/mlock.c                             |   2 +-
>  mm/mm_init.c                           |  23 +-
>  mm/rmap.c                              |  67 ++++-
>  mm/swap.c                              |   2 +-
>  mm/truncate.c                          |  16 +-
>  39 files changed, 810 insertions(+), 330 deletions(-)
>
> base-commit: b2a64caeafad6e37df1c68f878bfdd06ff14f4ec
> -- 
> git-series 0.9.1