Jason Gunthorpe wrote: > On Wed, Sep 21, 2022 at 02:38:56PM -0700, Dan Williams wrote: > > Dan Williams wrote: > > > Jason Gunthorpe wrote: > > > > On Thu, Sep 15, 2022 at 08:36:07PM -0700, Dan Williams wrote: > > > > > The percpu_ref in 'struct dev_pagemap' is used to coordinate active > > > > > mappings of device-memory with the device-removal / unbind path. It > > > > > enables the semantic that initiating device-removal (or > > > > > device-driver-unbind) blocks new mapping and DMA attempts, and waits for > > > > > mapping revocation or inflight DMA to complete. > > > > > > > > This seems strange to me > > > > > > > > The pagemap should be ref'd as long as the filesystem is mounted over > > > > the dax. The ref should be incrd when the filesystem is mounted and > > > > decrd when it is unmounted. > > > > > > > > When the filesystem unmounts it should zap all the mappings (actually > > > > I don't think you can even unmount a filesystem while mappings are > > > > open) and wait for all page references to go to zero, then put the > > > > final pagemap back. > > > > > > > > The rule is nothing can touch page->pgmap while page->refcount == 0, > > > > and if page->refcount != 0 then page->pgmap must be valid, without any > > > > refcounting on the page map itself. > > > > > > > > So, why do we need pgmap refcounting all over the place? It seems like > > > > it only existed before because of the abuse of the page->refcount? > > > > > > Recall that this percpu_ref is mirroring the same function as > > > blk_queue_enter() whereby every new request is checking to make sure the > > > device is still alive, or whether it has started exiting. > > > > > > So pgmap 'live' reference taking in fs/dax.c allows the core to start > > > failing fault requests once device teardown has started. It is a 'block > > > new, and drain old' semantic. > > It is weird this email never arrived for me.. > > I think that is all fine, but it would be much more logically > expressed as a simple 'is pgmap alive' call before doing a new mapping > than mucking with the refcount logic. Such a test could simply > READ_ONCE a bool value in the pgmap struct. > > Indeed, you could reasonably put such a liveness test at the moment > every driver takes a 0 refcount struct page and turns it into a 1 > refcount struct page. I could do it with a flag, but the reason to have pgmap->ref managed at the page->_refcount 0 -> 1 and 1 -> 0 transitions is so at the end of time memunmap_pages() can look at the one counter rather than scanning and rescanning all the pages to see when they go to final idle.