Re: [PATCH 00/13] Fix the DAX-gup mistake

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jason Gunthorpe wrote:
> On Wed, Sep 07, 2022 at 01:45:35PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
> > > On Wed, Sep 07, 2022 at 11:43:52AM -0700, Dan Williams wrote:
> > > 
> > > > It is still the case that while waiting for the page to go idle it is
> > > > associated with its given file / inode. It is possible that
> > > > memory-failure, or some other event that requires looking up the page's
> > > > association, fires in that time span.
> > > 
> > > Can't the page->mapping can remain set to the address space even if it is
> > > not installed into any PTEs? Zap should only remove the PTEs, not
> > > clear the page->mapping.
> > > 
> > > Or, said another way, page->mapping should only change while the page
> > > refcount is 0 and thus the filesystem is completely in control of when
> > > it changes, and can do so under its own locks
> > > 
> > > If the refcount is 0 then memory failure should not happen - it would
> > > require someone accessed the page without referencing it. The only
> > > thing that could do that is the kernel, and if the kernel is
> > > referencing a 0 refcount page (eg it got converted to meta-data or
> > > something), it is probably not linked to an address space anymore
> > > anyhow?
> > 
> > First, thank you for helping me think through this, I am going to need
> > this thread in 6 months when I revisit this code.
> > 
> > I agree with the observation that page->mapping should only change while
> > the reference count is zero, but my problem is catching the 1 -> 0 in
> > its natural location in free_zone_device_page(). That and the fact that
> > the entry needs to be maintained until the page is actually disconnected
> > from the file to me means that break layouts holds off truncate until it
> > can observe the 0 refcount condition while holding filesystem locks, and
> > then the final truncate deletes the mapping entry which is already at 0.
> 
> Okay, that makes sense to me.. but what is "entry need to be
> maintained" mean?

I am talking about keeping an entry in the address_space until that page
is truncated out of the file, and I think the "zapped but not truncated"
state needs a new Xarray-value flag (DAX_ZAP) to track it. So the
life-cycle of a DAX page that is truncated becomes:

0/ devm_memremap_pages() to instantiate the page with _refcount==0

1/ dax_insert_entry() and vmf_insert_mixed() add an entry to the
address_space and install a pte for the page. _refcount==1.

2/ gup elevates _refcount to 2

3/ truncate or punch hole attempts to free the DAX page

4/ break layouts zaps the ptes, drops the reference from 1/, and waits
   for _refcount to drop to zero while holding fs locks.

5/ at the 1 -> 0 transition the address_space entry is tagged with a new
   flag DAX_ZAP to track that this page is unreferenced, but still
   associated with the mapping until the final truncate. I.e. the DAX_ZAP
   flag lets the fsdax core track when it has already dropped a page
   reference, but still has use for things like memory-failure to
   opportunistically use page->mapping on a 0-reference page.

I think this could be done without the DAX_ZAP flag, but I want to have
some safety to catch programming errors where the truncate path finds
entries already at a zero reference count without having first been
zapped.

> > I.e. break layouts waits until _refcount reaches 0, but entry removal
> > still needs one more dax_delete_mapping_entry() event to transitition to
> > the _refcount == 0 plus no address_space entry condition. Effectively
> > simulating _mapcount with address_space tracking until DAX pages can
> > become vm_normal_page().
> 
> This I don't follow.. Who will do the one more
> dax_delete_mapping_entry()?

The core of dax_delete_mapping_entry() is __dax_invalidate_entry(). I am
thinking something like __dax_invalidate_entry(mapping, index, ZAP) is
called for break layouts and __dax_invalidate_entry(mapping, index,
TRUNC) is called for finally disconnecting that page from its mapping.

> 
> I'm not sure what it has to do with normal_page?
> 

This thread is mainly about DAX slowly reinventing _mapcount that gets
managed in all the right places for a normal_page. Longer term I think
we either need to get back to the page-less DAX experiment and find a
way to unwind some of this page usage creep, or cut over to finally
making DAX-pages be normal pages and delete most of the special case
handling in fs/dax.c. I am open to discussing the former, but I think
the latter is more likely.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux