Dan Williams wrote: > Jason Gunthorpe wrote: > > On Thu, Sep 15, 2022 at 08:36:07PM -0700, Dan Williams wrote: > > > The percpu_ref in 'struct dev_pagemap' is used to coordinate active > > > mappings of device-memory with the device-removal / unbind path. It > > > enables the semantic that initiating device-removal (or > > > device-driver-unbind) blocks new mapping and DMA attempts, and waits for > > > mapping revocation or inflight DMA to complete. > > > > This seems strange to me > > > > The pagemap should be ref'd as long as the filesystem is mounted over > > the dax. The ref should be incrd when the filesystem is mounted and > > decrd when it is unmounted. > > > > When the filesystem unmounts it should zap all the mappings (actually > > I don't think you can even unmount a filesystem while mappings are > > open) and wait for all page references to go to zero, then put the > > final pagemap back. > > > > The rule is nothing can touch page->pgmap while page->refcount == 0, > > and if page->refcount != 0 then page->pgmap must be valid, without any > > refcounting on the page map itself. > > > > So, why do we need pgmap refcounting all over the place? It seems like > > it only existed before because of the abuse of the page->refcount? > > Recall that this percpu_ref is mirroring the same function as > blk_queue_enter() whereby every new request is checking to make sure the > device is still alive, or whether it has started exiting. > > So pgmap 'live' reference taking in fs/dax.c allows the core to start > failing fault requests once device teardown has started. It is a 'block > new, and drain old' semantic. However this line of questioning has me realizing that I have the put_dev_pagemap() in the wrong place. It needs to go in free_zone_device_page(), so that gup extends the lifetime of the device.