Re: [PATCH v2 10/18] fsdax: Manage pgmap references at entry insertion and deletion

Dave Chinner wrote:
> On Thu, Sep 22, 2022 at 02:54:42PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
> > > On Wed, Sep 21, 2022 at 07:17:40PM -0700, Dan Williams wrote:
> > > > Jason Gunthorpe wrote:
> > > > > On Wed, Sep 21, 2022 at 05:14:34PM -0700, Dan Williams wrote:
> > > > > 
> > > > > > > Indeed, you could reasonably put such a liveness test at the moment
> > > > > > > every driver takes a 0 refcount struct page and turns it into a 1
> > > > > > > refcount struct page.
> > > > > > 
> > > > > > I could do it with a flag, but the reason to have pgmap->ref managed at
> > > > > > the page->_refcount 0 -> 1 and 1 -> 0 transitions is so at the end of
> > > > > > time memunmap_pages() can look at the one counter rather than scanning
> > > > > > and rescanning all the pages to see when they go to final idle.
> > > > > 
> > > > > That makes some sense too, but the logical way to do that is to put some
> > > > > counter along the page_free() path, and establish a 'make a page not
> > > > > free' path that does the other side.
> > > > > 
> > > > > ie it should not be in DAX code, it should be all in common pgmap
> > > > > code. The pgmap should never be freed while any page->refcount != 0
> > > > > and that should be an intrinsic property of pgmap, not relying on
> > > > > external parties.
> > > > 
> > > > I just do not know where to put such intrinsics since there is nothing
> > > > today that requires going through the pgmap object to discover the pfn
> > > > and 'allocate' the page.
> > > 
> > > I think that is just a new API that wraps setting refcount = 1,
> > > taking the percpu refcount, and maybe building appropriate compound
> > > pages too.
> > > 
> > > Eg maybe something like:
> > > 
> > >   struct folio *pgmap_alloc_folios(pgmap, start, length)
> > > 
> > > And you get back maximally sized allocated folios with refcount = 1
> > > that span the requested range.
> > > 
> > > > In other words make dax_direct_access() the 'allocation' event that pins
> > > > the pgmap? I might be speaking a foreign language if you're not familiar
> > > > with the relationship of 'struct dax_device' to 'struct dev_pagemap'
> > > > instances. This is not the first time I have considered making them one
> > > > and the same.
> > > 
> > > I don't know enough about dax, so yes very foreign :)
> > > 
> > > I'm thinking broadly about how to make pgmap usable to all the other
> > > drivers in a safe and robust way that makes some kind of logical sense.
> > 
> > I think the API should be pgmap_folio_get() because, at least for DAX,
> > the memory is already allocated. The 'allocator' for fsdax is the
> > filesystem block allocator, and pgmap_folio_get() grants access to a
> 
> No, the "allocator" for fsdax is the inode iomap interface, not the
> filesystem block allocator. The filesystem block allocator is only
> involved in iomapping if we have to allocate a new mapping for a
> given file offset.
> 
> A better name for this is "arbiter", not allocator.  To get an
> active mapping of the DAX pages backing a file, we need to ask the
> inode iomap subsystem to *map a file offset* and it will return
> kaddr and/or pfns for the backing store the file offset maps to.
> 
> IOWs, for FSDAX, access to the backing store (i.e. the physical pages) is
> arbitrated by the *inode*, not the filesystem allocator or the dax
> device. Hence if a subsystem needs to pin the backing store for some
> use, it must first ensure that it holds an inode reference (direct
> or indirect) for that range of the backing store that spans the
> life of the pin. When the pin is done, it can tear down the mappings
> it was using and then the inode reference can be released.
> 
> This ensures that any racing unlink of the inode will not result in
> the backing store being freed from under the application that has a
> pin. It will prevent the inode from being reclaimed and so
> potentially accessing stale or freed in-memory structures. And it
> will prevent the filesystem from being unmounted while the
> application using FSDAX access is still actively using that
> functionality even if it's already closed all its fds....

Sounds so simple when you put it that way. I'll give it a shot and stop
the gymnastics of trying to get in front of truncate_inode_pages_final()
with a 'dax break layouts'. Instead, just hold it off until final unpin.
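
To make the ordering above concrete, here is a minimal userspace sketch of
"take the inode reference, pin, unpin, then drop the inode reference". It
is a toy model only: struct toy_inode and every toy_*() helper are
hypothetical names invented for illustration, not kernel interfaces, and
the real locking and iomap steps are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; none of these types or helpers exist in the kernel. */
struct toy_inode {
	int refs;       /* references held on the inode */
	int pins;       /* active pins on the backing store */
	bool unlinked;  /* an unlink has raced with the user */
	bool freed;     /* backing store has been released */
};

/* The backing store is released only once unlinked and unreferenced. */
static void toy_maybe_free(struct toy_inode *ino)
{
	if (ino->unlinked && ino->refs == 0) {
		assert(ino->pins == 0); /* a pin implies an inode reference */
		ino->freed = true;
	}
}

static void toy_iget(struct toy_inode *ino)
{
	ino->refs++;
}

static void toy_iput(struct toy_inode *ino)
{
	assert(ino->refs > 0);
	ino->refs--;
	toy_maybe_free(ino);
}

/* Pinning is refused unless the caller already holds the inode. */
static bool toy_pin(struct toy_inode *ino)
{
	if (ino->refs == 0 || ino->freed)
		return false;
	ino->pins++;
	return true;
}

static void toy_unpin(struct toy_inode *ino)
{
	assert(ino->pins > 0);
	ino->pins--;
}

static void toy_unlink(struct toy_inode *ino)
{
	ino->unlinked = true;
	toy_maybe_free(ino);
}

/* Returns true if the store outlived the pin and was freed afterwards. */
static bool toy_demo_pin_survives_unlink(void)
{
	struct toy_inode ino = { 0 };
	bool alive_while_pinned;

	toy_iget(&ino);                 /* 1. inode reference first */
	if (!toy_pin(&ino))             /* 2. then pin the backing store */
		return false;
	toy_unlink(&ino);               /* racing unlink must not free it */
	alive_while_pinned = !ino.freed;
	toy_unpin(&ino);                /* 3. final unpin */
	toy_iput(&ino);                 /* 4. only now drop the inode */
	return alive_while_pinned && ino.freed;
}
```

In this model the racing unlink is harmless because the free is deferred to
the last toy_iput(), which by construction comes after the final unpin.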

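For the earlier point in the thread about managing pgmap->ref at the
page->_refcount 0 -> 1 and 1 -> 0 transitions, a similar toy sketch may
help. It assumes a plain long counter standing in for the percpu refcount,
and struct toy_pgmap, struct toy_page and the toy_*() helpers are
hypothetical names, not the dev_pagemap API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; a plain counter stands in for the percpu refcount. */
struct toy_pgmap {
	long busy_pages; /* pages whose refcount is currently non-zero */
};

struct toy_page {
	int refcount;
	struct toy_pgmap *pgmap;
};

/* Only the 0 -> 1 transition touches the pgmap counter. */
static void toy_page_get(struct toy_page *page)
{
	if (page->refcount++ == 0)
		page->pgmap->busy_pages++;
}

/* Only the 1 -> 0 transition drops the pgmap counter. */
static void toy_page_put(struct toy_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		page->pgmap->busy_pages--;
}

/* Teardown checks one counter instead of rescanning every page. */
static bool toy_pgmap_idle(const struct toy_pgmap *pgmap)
{
	return pgmap->busy_pages == 0;
}

/* Returns true if the counter tracked busy pages, then went idle. */
static bool toy_demo_pgmap_counter(void)
{
	struct toy_pgmap map = { 0 };
	struct toy_page pages[4];
	bool was_busy;
	size_t i;

	for (i = 0; i < 4; i++) {
		pages[i].refcount = 0;
		pages[i].pgmap = &map;
	}

	toy_page_get(&pages[0]);
	toy_page_get(&pages[0]); /* second reference, same page */
	toy_page_get(&pages[2]);
	was_busy = !toy_pgmap_idle(&map) && map.busy_pages == 2;

	toy_page_put(&pages[0]);
	toy_page_put(&pages[0]);
	toy_page_put(&pages[2]);
	return was_busy && toy_pgmap_idle(&map);
}
```

The counter tracks pages rather than references, so extra gets on a page
that is already busy never touch the shared counter.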

