On Tue, Jun 4, 2019 at 12:48 PM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>
> On 6/4/19 9:48 AM, ira.weiny@xxxxxxxxx wrote:
> > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > release_pages() is an optimized version of a loop around put_page().
> > Unfortunately for devmap pages the logic is not entirely correct in
> > release_pages(). This is because device pages can be more than type
> > MEMORY_DEVICE_PUBLIC. There are in fact 4 types, private, public, FS
> > DAX, and PCI P2PDMA. Some of these have specific needs to "put" the
> > page while others do not.
> >
> > This logic to handle any special needs is contained in
> > put_devmap_managed_page(). Therefore all devmap pages should be
> > processed by this function where we can contain the correct logic for a
> > page put.
> >
> > Handle all device type pages within release_pages() by calling
> > put_devmap_managed_page() on all devmap pages. If
> > put_devmap_managed_page() returns true the page has been put and we
> > continue with the next page. A false return of
> > put_devmap_managed_page() means the page did not require special
> > processing and should fall to "normal" processing.
> >
> > This was found via code inspection while determining if release_pages()
> > and the new put_user_pages() could be interchangeable.[1]
> >
> > [1] https://lore.kernel.org/lkml/20190523172852.GA27175@xxxxxxxxxxxxxxxxxxxxxxxxx/
> >
> > Cc: Jérôme Glisse <jglisse@xxxxxxxxxx>
> > Cc: Michal Hocko <mhocko@xxxxxxxx>
> > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx>
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > ---
> > Changes from V2:
> >         Update changelog for more clarity as requested by Michal
> >         Update comment WRT "failing" of put_devmap_managed_page()
> >
> > Changes from V1:
> >         Add comment clarifying that put_devmap_managed_page() can still
> >         fail.
> >         Add Reviewed-by tags.
> >
> >  mm/swap.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/swap.c b/mm/swap.c
> > index 7ede3eddc12a..6d153ce4cb8c 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -740,15 +740,20 @@ void release_pages(struct page **pages, int nr)
> >                 if (is_huge_zero_page(page))
> >                         continue;
> >
> > -               /* Device public page can not be huge page */
> > -               if (is_device_public_page(page)) {
> > +               if (is_zone_device_page(page)) {
> >                         if (locked_pgdat) {
> >                                 spin_unlock_irqrestore(&locked_pgdat->lru_lock,
> >                                                        flags);
> >                                 locked_pgdat = NULL;
> >                         }
> > -                       put_devmap_managed_page(page);
> > -                       continue;
> > +                       /*
> > +                        * Not all zone-device-pages require special
> > +                        * processing. Those pages return 'false' from
> > +                        * put_devmap_managed_page() expecting a call to
> > +                        * put_page_testzero()
> > +                        */
>
> Just a documentation tweak: how about:
>
>         /*
>          * ZONE_DEVICE pages that return 'false' from
>          * put_devmap_managed_page() do not require special
>          * processing, and instead, expect a call to
>          * put_page_testzero().
>          */

Looks better to me, but maybe just go ahead and list those expectations
explicitly. Something like:

        /*
         * put_devmap_managed_page() only handles
         * ZONE_DEVICE (struct dev_pagemap managed)
         * pages when the hosting dev_pagemap has the
         * ->free() or ->fault() callback handlers
         * implemented as indicated by
         * dev_pagemap.type. Otherwise the expectation
         * is to fall back to a plain decrement /
         * put_page_testzero().
         */
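
[For readers skimming the archive: the expectation both proposed comments
describe follows from the shape of the helper itself. The sketch below is a
paraphrase of the put_devmap_managed_page() inline helper from
include/linux/mm.h in kernels of this era (under CONFIG_DEV_PAGEMAP_OPS),
not an exact copy; the static key name and the exact set of handled
dev_pagemap types may differ between releases.]

static inline bool put_devmap_managed_page(struct page *page)
{
        /* Fast path: no ZONE_DEVICE pagemap users registered at all. */
        if (!static_branch_unlikely(&devmap_managed_key))
                return false;
        if (!is_zone_device_page(page))
                return false;

        /*
         * Only dev_pagemap types that hook the final page put (device
         * private, device public, fs-dax) are handled here; anything
         * else, e.g. PCI P2PDMA, returns false so the caller falls back
         * to a plain put_page_testzero().
         */
        switch (page->pgmap->type) {
        case MEMORY_DEVICE_PRIVATE:
        case MEMORY_DEVICE_PUBLIC:
        case MEMORY_DEVICE_FS_DAX:
                __put_devmap_managed_page(page);
                return true;
        default:
                return false;
        }
}

[This is why, per the changelog above, release_pages() can call the helper
on every ZONE_DEVICE page and 'continue' only when it returns true; a
false return simply means the page takes the normal put_page_testzero()
path further down in the loop.]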