Dan Williams <dan.j.williams@xxxxxxxxx> writes:

> Alistair Popple wrote:
>>
>> Jason Gunthorpe <jgg@xxxxxxxxxx> writes:
>>
>> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> >> refcount") device private pages have no longer had an extra reference
>> >> count when the page is in use. However before handing them back to the
>> >> owning device driver we add an extra reference count such that free
>> >> pages have a reference count of one.
>> >>
>> >> This makes it difficult to tell if a page is free or not because both
>> >> free and in use pages will have a non-zero refcount. Instead we should
>> >> return pages to the drivers page allocator with a zero reference count.
>> >> Kernel code can then safely use kernel functions such as
>> >> get_page_unless_zero().
>> >>
>> >> Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
>> >> ---
>> >>  arch/powerpc/kvm/book3s_hv_uvmem.c       | 1 +
>> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>> >>  lib/test_hmm.c                           | 1 +
>> >>  mm/memremap.c                            | 5 -----
>> >>  mm/page_alloc.c                          | 6 ++++++
>> >>  6 files changed, 10 insertions(+), 5 deletions(-)
>> >
>> > I think this is a great idea, but I'm surprised no dax stuff is
>> > touched here?
>>
>> free_zone_device_page() shouldn't be called for pgmap->type ==
>> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
>> there. Except that the folio code looks like it might have introduced a
>> bug. AFAICT put_page() always calls
>> put_devmap_managed_page(&folio->page) but folio_put() does not (although
>> folios_put() does!). So it seems folio_put() won't end up calling
>> __put_devmap_managed_page_refs() as I think it should.
>>
>> I think you're right about the change to __init_zone_device_page() - I
>> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
>> look at Dan's patch series more closely as I suspect it might be better
>> to rebase this patch on top of that.
>
> Apologies for the delay I was travelling the past few days. Yes, I think
> this patch slots in nicely to avoid the introduction of an init_mode
> [1]:
>
> https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx/
>
> Mind if I steal it into my series?

No problem, although I notice Andrew has already merged it into
mm-unstable. If you end up rebasing your series on top of mine I think
all that's needed is a patch somewhere in your series to drop the
various `if (pgmap->type == MEMORY_DEVICE_*)` I added to (hopefully)
avoid breaking DAX. Assuming DAX takes a pagemap reference on struct
page allocation something like below.

---

diff --git a/mm/memremap.c b/mm/memremap.c
index 421bec3a29ee..da1a0e0abb8b 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -507,15 +507,7 @@ void free_zone_device_page(struct page *page)
 	page->mapping = NULL;
 	page->pgmap->ops->page_free(page);
 
-	if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
-	    page->pgmap->type != MEMORY_DEVICE_COHERENT)
-		/*
-		 * Reset the page count to 1 to prepare for handing out the page
-		 * again.
-		 */
-		set_page_count(page, 1);
-	else
-		put_dev_pagemap(page->pgmap);
+	put_dev_pagemap(page->pgmap);
 }
 
 void zone_device_page_init(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 014dbdf54d62..3e5ff06700ca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6816,9 +6816,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	 * ZONE_DEVICE pages are released directly to the driver page allocator
 	 * which will set the page count to 1 when allocating the page.
 	 */
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-	    pgmap->type == MEMORY_DEVICE_COHERENT)
-		set_page_count(page, 0);
+	set_page_count(page, 0);
 }
 
 /*
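
To illustrate why a zero refcount for free pages matters, here is a
rough userspace model, not kernel code. The struct model_page type and
the model_* helpers are hypothetical stand-ins that only mimic the
"take a reference unless the count is zero" semantics the commit
message relies on. They are not the real struct page or the kernel
implementations of get_page_unless_zero(), put_page() or
set_page_count().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; it models only the refcount. */
struct model_page {
	atomic_int refcount;
};

/* Models set_page_count(): overwrite the reference count. */
static void model_set_page_count(struct model_page *page, int count)
{
	atomic_store(&page->refcount, count);
}

/*
 * Models the semantics of get_page_unless_zero(): take a reference only
 * if the page is in use (refcount != 0). Returns true on success.
 */
static bool model_get_page_unless_zero(struct model_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded and the loop re-checks for zero. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Models dropping a reference. Returns true when the count reaches zero,
 * i.e. the point where the page would go back to the driver allocator.
 */
static bool model_put_page(struct model_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

In this model a free page sits at zero, so model_get_page_unless_zero()
fails on it and succeeds only once the driver has set the count to one
on allocation. If free pages were left at one instead, the same call
would succeed on a free page, which is the ambiguity the commit message
describes.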