Jason Gunthorpe wrote:
> On Wed, Sep 07, 2022 at 11:43:52AM -0700, Dan Williams wrote:
>
> > It is still the case that while waiting for the page to go idle it is
> > associated with its given file / inode. It is possible that
> > memory-failure, or some other event that requires looking up the page's
> > association, fires in that time span.
>
> Can't the page->mapping can remain set to the address space even if it is
> not installed into any PTEs? Zap should only remove the PTEs, not
> clear the page->mapping.
>
> Or, said another way, page->mapping should only change while the page
> refcount is 0 and thus the filesystem is completely in control of when
> it changes, and can do so under its own locks
>
> If the refcount is 0 then memory failure should not happen - it would
> require someone accessed the page without referencing it. The only
> thing that could do that is the kernel, and if the kernel is
> referencing a 0 refcount page (eg it got converted to meta-data or
> something), it is probably not linked to an address space anymore
> anyhow?

First, thank you for helping me think through this. I am going to need
this thread in 6 months when I revisit this code.

I agree with the observation that page->mapping should only change while
the reference count is zero, but my problem is catching the 1 -> 0
transition in its natural location in free_zone_device_page(). That,
plus the fact that the entry needs to be maintained until the page is
actually disconnected from the file, means to me that break layouts
holds off truncate until it can observe the 0 refcount condition while
holding filesystem locks, and then the final truncate deletes the
mapping entry for a page that is already at 0.

I.e. break layouts waits until _refcount reaches 0, but entry removal
still needs one more dax_delete_mapping_entry() event to transition to
the "_refcount == 0 plus no address_space entry" condition. That
effectively simulates _mapcount with address_space tracking until DAX
pages can become vm_normal_page().
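
To make that two-step sequence concrete, here is a rough userspace toy
model of it. This is not kernel code: every name in it (toy_page,
toy_mapping, toy_put_page(), toy_break_layouts(),
toy_delete_mapping_entry(), toy_lookup_mapping()) is invented for
illustration, it ignores locking entirely, and it only assumes what is
described above, namely that dropping the last reference leaves
page->mapping alone and that a separate delete step clears the
association.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space holding one DAX entry. */
struct toy_mapping {
	bool has_entry;			/* is the mapping entry still present? */
};

/* Hypothetical stand-in for a DAX page. */
struct toy_page {
	int refcount;			/* models _refcount */
	struct toy_mapping *mapping;	/* models page->mapping */
};

enum toy_state {
	TOY_BUSY,	/* refcount > 0, entry present */
	TOY_IDLE,	/* refcount == 0, entry still present */
	TOY_FREE,	/* refcount == 0, no address_space entry */
};

static enum toy_state toy_page_state(const struct toy_page *p)
{
	if (p->refcount > 0)
		return TOY_BUSY;
	return p->mapping ? TOY_IDLE : TOY_FREE;
}

/* Drop one reference. Deliberately never touches p->mapping. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Models break layouts: reports whether the page has gone idle.
 * Real code would sleep and retry; the toy just returns the answer.
 */
static bool toy_break_layouts(const struct toy_page *p)
{
	return p->refcount == 0;
}

/*
 * Models the final truncate step: only succeeds on an idle page that
 * still has its entry, and moves it from TOY_IDLE to TOY_FREE.
 */
static bool toy_delete_mapping_entry(struct toy_page *p)
{
	if (p->refcount != 0 || !p->mapping)
		return false;
	p->mapping->has_entry = false;
	p->mapping = NULL;
	return true;
}

/* Models a memory-failure style lookup of the page's association. */
static struct toy_mapping *toy_lookup_mapping(const struct toy_page *p)
{
	return p->mapping;
}
```

The point of the TOY_IDLE state is that a lookup between "refcount hit
zero" and "entry deleted" still resolves to the file, which is the
window the quoted text above is worried about.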