Muchun Song wrote: > On Mon, Jul 04, 2022 at 11:38:16AM +0100, Matthew Wilcox wrote: > > On Mon, Jul 04, 2022 at 03:40:54PM +0800, Muchun Song wrote: > > > FSDAX page refcounts are 1-based, rather than 0-based: if refcount is > > > 1, then the page is freed. The FSDAX pages can be pinned through GUP, > > > then they will be unpinned via unpin_user_page() using a folio variant > > > to put the page, however, folio variants did not consider this special > > > case, the result will be to miss a wakeup event (like the user of > > > __fuse_dax_break_layouts()). > > > > Argh, no. The 1-based refcounts are a blight on the entire kernel. > > They need to go away, not be pushed into folios as well. I think > > I would be happy if this could go away. Continue to agree that this blight needs to end. One of the pre-requisites to getting back to normal accounting of FSDAX page pin counts was to first drop the usage of get_dev_pagemap() in the GUP path: https://lore.kernel.org/linux-mm/161604048257.1463742.1374527716381197629.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ That work stalled on notifying mappers of surprise removal events of FSDAX pfns. However, Ruan has been spending some time on that problem recently: https://lore.kernel.org/all/20220410171623.3788004-1-ruansy.fnst@xxxxxxxxxxx/ So, once I dig out from a bit of CXL backlog and review that effort the next step that I see will be convert the FSDAX path to take typical references vmf_insert() time. Unless I am missing a shorter path to get this fixed up? > > we're close to having that fixed, but until then, this should do > > the trick? > > > > The following fix looks good to me since it lowers the overhead as > much as possible This change looks good to me as well. > > Thanks. > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index cc98ab012a9b..4cef5e0f78b6 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1129,18 +1129,18 @@ static inline bool is_zone_movable_page(const struct page *page) > > #if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_FS_DAX) > > DECLARE_STATIC_KEY_FALSE(devmap_managed_key); > > > > -bool __put_devmap_managed_page(struct page *page); > > -static inline bool put_devmap_managed_page(struct page *page) > > +bool __put_devmap_managed_page(struct page *page, int refs); > > +static inline bool put_devmap_managed_page(struct page *page, int refs) > > { > > if (!static_branch_unlikely(&devmap_managed_key)) > > return false; > > if (!is_zone_device_page(page)) > > return false; > > - return __put_devmap_managed_page(page); > > + return __put_devmap_managed_page(page, refs); > > } > > > > #else /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ > > -static inline bool put_devmap_managed_page(struct page *page) > > +static inline bool put_devmap_managed_page(struct page *page, int refs) > > { > > return false; > > } > > @@ -1246,7 +1246,7 @@ static inline void put_page(struct page *page) > > * For some devmap managed pages we need to catch refcount transition > > * from 2 to 1: > > */ > > - if (put_devmap_managed_page(&folio->page)) > > + if (put_devmap_managed_page(&folio->page, 1)) > > return; > > folio_put(folio); > > } > > diff --git a/mm/gup.c b/mm/gup.c > > index d1132b39aa8f..28df02121c78 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -88,7 +88,8 @@ static inline struct folio *try_get_folio(struct page *page, int refs) > > * belongs to this folio. > > */ > > if (unlikely(page_folio(page) != folio)) { > > - folio_put_refs(folio, refs); > > + if (!put_devmap_managed_page(&folio->page, refs)) > > + folio_put_refs(folio, refs); > > goto retry; > > } > > > > @@ -177,6 +178,8 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) > > refs *= GUP_PIN_COUNTING_BIAS; > > } > > > > + if (put_devmap_managed_page(&folio->page, refs)) > > + return; > > folio_put_refs(folio, refs); > > } > > > > diff --git a/mm/memremap.c b/mm/memremap.c > > index b870a659eee6..b25e40e3a11e 100644 > > --- a/mm/memremap.c > > +++ b/mm/memremap.c > > @@ -499,7 +499,7 @@ void free_zone_device_page(struct page *page) > > } > > > > #ifdef CONFIG_FS_DAX > > -bool __put_devmap_managed_page(struct page *page) > > +bool __put_devmap_managed_page(struct page *page, int refs) > > { > > if (page->pgmap->type != MEMORY_DEVICE_FS_DAX) > > return false; > > @@ -509,7 +509,7 @@ bool __put_devmap_managed_page(struct page *page) > > * refcount is 1, then the page is free and the refcount is > > * stable because nobody holds a reference on the page. > > */ > > - if (page_ref_dec_return(page) == 1) > > + if (page_ref_sub_return(page, refs) == 1) > > wake_up_var(&page->_refcount); > > return true; > > } > > diff --git a/mm/swap.c b/mm/swap.c > > index c6194cfa2af6..94e42a9bab92 100644 > > --- a/mm/swap.c > > +++ b/mm/swap.c > > @@ -960,7 +960,7 @@ void release_pages(struct page **pages, int nr) > > unlock_page_lruvec_irqrestore(lruvec, flags); > > lruvec = NULL; > > } > > - if (put_devmap_managed_page(&folio->page)) > > + if (put_devmap_managed_page(&folio->page, 1)) > > continue; > > if (folio_put_testzero(folio)) > > free_zone_device_page(&folio->page); > >