Jason Gunthorpe wrote: > On Wed, Sep 21, 2022 at 05:14:34PM -0700, Dan Williams wrote: > > > > Indeed, you could reasonably put such a liveness test at the moment > > > every driver takes a 0 refcount struct page and turns it into a 1 > > > refcount struct page. > > > > I could do it with a flag, but the reason to have pgmap->ref managed at > > the page->_refcount 0 -> 1 and 1 -> 0 transitions is so at the end of > > time memunmap_pages() can look at the one counter rather than scanning > > and rescanning all the pages to see when they go to final idle. > > That makes some sense too, but the logical way to do that is to put some > counter along the page_free() path, and establish a 'make a page not > free' path that does the other side. > > ie it should not be in DAX code, it should be all in common pgmap > code. The pgmap should never be freed while any page->refcount != 0 > and that should be an intrinsic property of pgmap, not relying on > external parties. I just do not know where to put such intrinsics since there is nothing today that requires going through the pgmap object to discover the pfn and 'allocate' the page. I think you may be asking to unify dax_direct_access() with pgmap management where all dax_direct_access() users are required to take a page reference if the pfn it returns is going to be used outside of dax_read_lock(). In other words make dax_direct_access() the 'allocation' event that pins the pgmap? I might be speaking a foreign language if you're not familiar with the relationship of 'struct dax_device' to 'struct dev_pagemap' instances. This is not the first time I have considered making them one in the same. > Though I suspect if we were to look at performance it is probably > better to scan the memory on the unlikely case of pgmap removal than > to put more code in hot paths to keep track of refcounts.. It doesn't > need rescanning, just one sweep where it waits on every non-zero page > to become zero. True, on the way down nothing should be elevating page references, just waiting for the last one to drain. I am just not sure that pgmap removal is that unlikely going forward with things like the dax_kmem driver and CXL Dynamic Capacity Devices where tearing down DAX devices happens. Perhaps something to revisit if the pgmap percpu_ref ever shows up in profiles.