Re: [PATCH v9 00/20] fs/dax: Fix ZONE_DEVICE page reference counts

Andrew,

This is essentially the same as what's currently in mm-unstable aside from
the two updates listed below. The main thing to note is that it incorporates
Balbir's fixup, which is currently in mm-unstable as c98612955016
("mm-allow-compound-zone-device-pages-fix-fix").

 - Alistair

On Fri, Feb 28, 2025 at 02:30:55PM +1100, Alistair Popple wrote:
> Main updates since v8:
> 
>  - Fixed reading of bad pgmap in migrate_vma_collect_pmd() as reported/fixed
>    by Balbir.
> 
>  - Fixed spurious warnings generated in free_zone_device_folio() when
>    pgmap->ops isn't defined in cases where it isn't required to be, as
>    reported by Gerald.
> 
> Main updates since v7:
> 
>  - Rebased on current akpm/mm-unstable in order to fix conflicts with
>    https://lore.kernel.org/linux-mm/20241216155408.8102-1-willy@xxxxxxxxxxxxx/
>    as requested by Andrew.
> 
>  - Collected Acked-by/Reviewed-by tags.
> 
>  - Cleaned up an unnecessary and confusing assignment to pgtable.
> 
>  - Other minor reworks suggested by David Hildenbrand
> 
> Main updates since v6:
> 
>  - Clean ups and fixes based on feedback from David and Dan.
> 
>  - Rebased from next-20241216 to v6.14-rc1. No conflicts.
> 
>  - Dropped the PTE bit removals and clean-ups - will post this as a
>    separate series to be merged after this one as Dan wanted it split
>    up more and this series is already too big.
> 
> Main updates since v5:
> 
>  - Reworked patch 1 based on Dan's feedback.
> 
>  - Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
>    is not defined.
> 
>  - Minor comment formatting and documentation fixes.
> 
>  - Remove PTE_DEVMAP definitions from Loongarch which were added since
>    this series was initially written.
> 
> Main updates since v4:
> 
>  - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This
>    means smaps/pagemap may contain DAX pages.
> 
>  - Fixed rmap accounting of PUD mapped pages.
> 
>  - Minor code clean-ups.
> 
> Main updates since v3:
> 
>  - Rebased onto next-20241216. The rebase wasn't too difficult, but in
>    the interests of getting this out sooner for Andrew to look at, as
>    he requested, I have yet to extensively build/run test this
>    version of the series.
> 
>  - Fixed a bunch of build breakages reported by John Hubbard and the
>    kernel test robot due to various combinations of CONFIG options.
> 
>  - Split the rmap changes into a separate patch as suggested by David H.
> 
>  - Reworded the description for the P2PDMA change.
> 
> Main updates since v2:
> 
>  - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX
>    and have them pass the vmf struct.
> 
>  - Separate out the device DAX changes.
> 
>  - Restore the page share mapping counting and associated warnings.
> 
>  - Rework truncate to require file-systems to have previously called
>    dax_break_layout() to remove the address space mapping for a
>    page. This found several bugs which are fixed by the first half of
>    the series. The motivation for this was initially to allow the FS
>    DAX page-cache mappings to hold a reference on the page.
> 
>    However that turned out to be a dead-end (see the comments on patch
>    21), but it found several bugs and I think overall it is an
>    improvement so I have left it here.
> 
> Device and FS DAX pages have always maintained their own page
> reference counts without following the normal rules for page reference
> counting. In particular pages are considered free when the refcount
> hits one rather than zero and refcounts are not added when mapping the
> page.
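
The two conventions described in the paragraph above can be contrasted with a small userspace model. This is only an illustrative sketch: struct toy_page and the legacy_*/normal_* helpers are hypothetical names invented here, not kernel APIs, and the model ignores atomics, locking and folios entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page. This is NOT the kernel's struct page. */
struct toy_page {
	int refcount;	/* references held on the page */
	int mapcount;	/* page-table mappings, tracked for illustration */
};

/* Legacy DAX convention: an idle page rests at refcount 1. */
static void legacy_init(struct toy_page *p)
{
	p->refcount = 1;
	p->mapcount = 0;
}

/* Mapping does not take a reference under the legacy convention. */
static void legacy_map(struct toy_page *p)
{
	p->mapcount++;
}

static void legacy_get(struct toy_page *p)
{
	p->refcount++;
}

/* Returns true when the page becomes idle, i.e. the count drops to 1. */
static bool legacy_put(struct toy_page *p)
{
	assert(p->refcount > 1);
	return --p->refcount == 1;
}

/* Normal convention: the allocator hands out the page with one reference. */
static void normal_alloc(struct toy_page *p)
{
	p->refcount = 1;
	p->mapcount = 0;
}

/* Returns true when the last reference is dropped, i.e. the count hits 0. */
static bool normal_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	return --p->refcount == 0;
}

/* Each mapping holds its own reference under the normal convention. */
static void normal_map(struct toy_page *p)
{
	p->refcount++;
	p->mapcount++;
}

/* Dropping a mapping also drops the reference that mapping held. */
static bool normal_unmap(struct toy_page *p)
{
	assert(p->mapcount > 0);
	p->mapcount--;
	return normal_put(p);
}
```

In the legacy model a mapped page is indistinguishable from an idle one by refcount alone, which is why extra state is needed to track it. In the normal model the mapping's own reference keeps the page alive until it is unmapped.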
> 
> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
> mechanism for allowing GUP to hold references on the page (see
> get_dev_pagemap). However there doesn't seem to be any reason why FS
> DAX pages need their own reference counting scheme.
> 
> By treating the refcounts on these pages the same way as normal pages
> we can remove a lot of special checks. In particular pXd_trans_huge()
> becomes the same as pXd_leaf(), although I haven't made that change
> here. It also frees up a valuable SW-defined PTE bit on architectures
> that have devmap PTE bits defined.
> 
> It also almost certainly allows further clean-up of the devmap managed
> functions, but I have left that as a future improvement. It also
> enables support for compound ZONE_DEVICE pages which is one of my
> primary motivators for doing this work.
> 
> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
> Tested-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> 
> ---
> 
> Cc: lina@xxxxxxxxxxxxx
> Cc: zhang.lyra@xxxxxxxxx
> Cc: gerald.schaefer@xxxxxxxxxxxxx
> Cc: dan.j.williams@xxxxxxxxx
> Cc: vishal.l.verma@xxxxxxxxx
> Cc: dave.jiang@xxxxxxxxx
> Cc: logang@xxxxxxxxxxxx
> Cc: bhelgaas@xxxxxxxxxx
> Cc: jack@xxxxxxx
> Cc: jgg@xxxxxxxx
> Cc: catalin.marinas@xxxxxxx
> Cc: will@xxxxxxxxxx
> Cc: mpe@xxxxxxxxxxxxxx
> Cc: npiggin@xxxxxxxxx
> Cc: dave.hansen@xxxxxxxxxxxxxxx
> Cc: ira.weiny@xxxxxxxxx
> Cc: willy@xxxxxxxxxxxxx
> Cc: djwong@xxxxxxxxxx
> Cc: tytso@xxxxxxx
> Cc: linmiaohe@xxxxxxxxxx
> Cc: david@xxxxxxxxxx
> Cc: peterx@xxxxxxxxxx
> Cc: linux-doc@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
> Cc: nvdimm@xxxxxxxxxxxxxxx
> Cc: linux-cxl@xxxxxxxxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: linux-mm@xxxxxxxxx
> Cc: linux-ext4@xxxxxxxxxxxxxxx
> Cc: linux-xfs@xxxxxxxxxxxxxxx
> Cc: jhubbard@xxxxxxxxxx
> Cc: hch@xxxxxx
> Cc: david@xxxxxxxxxxxxx
> Cc: chenhuacai@xxxxxxxxxx
> Cc: kernel@xxxxxxxxxx
> Cc: loongarch@xxxxxxxxxxxxxxx
> 
> Alistair Popple (19):
>   fuse: Fix dax truncate/punch_hole fault path
>   fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
>   fs/dax: Don't skip locked entries when scanning entries
>   fs/dax: Refactor wait for dax idle page
>   fs/dax: Create a common implementation to break DAX layouts
>   fs/dax: Always remove DAX page-cache entries when breaking layouts
>   fs/dax: Ensure all pages are idle prior to filesystem unmount
>   fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
>   mm/gup: Remove redundant check for PCI P2PDMA page
>   mm/mm_init: Move p2pdma page refcount initialisation to p2pdma
>   mm: Allow compound zone device pages
>   mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
>   mm/memory: Add vmf_insert_page_mkwrite()
>   mm/rmap: Add support for PUD sized mappings to rmap
>   mm/huge_memory: Add vmf_insert_folio_pud()
>   mm/huge_memory: Add vmf_insert_folio_pmd()
>   mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
>   fs/dax: Properly refcount fs dax pages
>   device/dax: Properly refcount device dax pages when mapping
> 
> Dan Williams (1):
>   dcssblk: Mark DAX broken, remove FS_DAX_LIMITED support
> 
>  Documentation/filesystems/dax.rst      |   1 +-
>  drivers/dax/device.c                   |  15 +-
>  drivers/gpu/drm/nouveau/nouveau_dmem.c |   3 +-
>  drivers/nvdimm/pmem.c                  |   4 +-
>  drivers/pci/p2pdma.c                   |  19 +-
>  drivers/s390/block/Kconfig             |  12 +-
>  drivers/s390/block/dcssblk.c           |  27 +-
>  fs/dax.c                               | 365 +++++++++++++++++++-------
>  fs/ext4/inode.c                        |  18 +-
>  fs/fuse/dax.c                          |  30 +--
>  fs/fuse/dir.c                          |   2 +-
>  fs/fuse/file.c                         |   4 +-
>  fs/fuse/virtio_fs.c                    |   3 +-
>  fs/xfs/xfs_inode.c                     |  31 +--
>  fs/xfs/xfs_inode.h                     |   2 +-
>  fs/xfs/xfs_super.c                     |  12 +-
>  include/linux/dax.h                    |  28 ++-
>  include/linux/huge_mm.h                |   4 +-
>  include/linux/memremap.h               |  17 +-
>  include/linux/migrate.h                |   4 +-
>  include/linux/mm.h                     |  36 +---
>  include/linux/mm_types.h               |  16 +-
>  include/linux/mmzone.h                 |  12 +-
>  include/linux/page-flags.h             |   6 +-
>  include/linux/rmap.h                   |  15 +-
>  lib/test_hmm.c                         |   3 +-
>  mm/gup.c                               |  14 +-
>  mm/hmm.c                               |   2 +-
>  mm/huge_memory.c                       | 170 ++++++++++--
>  mm/internal.h                          |   2 +-
>  mm/memory-failure.c                    |   6 +-
>  mm/memory.c                            |  69 ++++-
>  mm/memremap.c                          |  60 ++--
>  mm/migrate_device.c                    |  18 +-
>  mm/mlock.c                             |   2 +-
>  mm/mm_init.c                           |  23 +-
>  mm/rmap.c                              |  67 ++++-
>  mm/swap.c                              |   2 +-
>  mm/truncate.c                          |  16 +-
>  39 files changed, 810 insertions(+), 330 deletions(-)
> 
> base-commit: b2a64caeafad6e37df1c68f878bfdd06ff14f4ec
> -- 
> git-series 0.9.1



