Jason Gunthorpe wrote: > On Thu, Sep 15, 2022 at 08:36:07PM -0700, Dan Williams wrote: > > The percpu_ref in 'struct dev_pagemap' is used to coordinate active > > mappings of device-memory with the device-removal / unbind path. It > > enables the semantic that initiating device-removal (or > > device-driver-unbind) blocks new mapping and DMA attempts, and waits for > > mapping revocation or inflight DMA to complete. > > This seems strange to me > > The pagemap should be ref'd as long as the filesystem is mounted over > the dax. The ref should be incrd when the filesystem is mounted and > decrd when it is unmounted. > > When the filesystem unmounts it should zap all the mappings (actually > I don't think you can even unmount a filesystem while mappings are > open) and wait for all page references to go to zero, then put the > final pagemap back. > > The rule is nothing can touch page->pgmap while page->refcount == 0, > and if page->refcount != 0 then page->pgmap must be valid, without any > refcounting on the page map itself. > > So, why do we need pgmap refcounting all over the place? It seems like > it only existed before because of the abuse of the page->refcount? Recall that this percpu_ref is mirroring the same function as blk_queue_enter() whereby every new request is checking to make sure the device is still alive, or whether it has started exiting. So pgmap 'live' reference taking in fs/dax.c allows the core to start failing fault requests once device teardown has started. It is a 'block new, and drain old' semantic.