Re: [PATCH 00/13] Fix the DAX-gup mistake

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jason Gunthorpe wrote:
> On Tue, Sep 06, 2022 at 05:54:54PM -0700, Dan Williams wrote:
> > Dan Williams wrote:
> > > Jason Gunthorpe wrote:
> > > > On Tue, Sep 06, 2022 at 11:37:36AM -0700, Dan Williams wrote:
> > > > > Jason Gunthorpe wrote:
> > > > > > On Tue, Sep 06, 2022 at 10:23:41AM -0700, Dan Williams wrote:
> > > > > > 
> > > > > > > > Can we continue to have the weird page->refcount behavior and still
> > > > > > > > change the other things?
> > > > > > > 
> > > > > > > No at a minimum the pgmap vs page->refcount problem needs to be solved
> > > > > > > first.
> > > > > > 
> > > > > > So who will do the put page after the PTE/PMD's are cleared out? In
> > > > > > the normal case the tlb flusher does it integrated into zap..
> > > > > 
> > > > > AFAICS the zap manages the _mapcount not _refcount. Are you talking
> > > > > about page_remove_rmap() or some other reference count drop?
> > > > 
> > > > No, page refcount.
> > > > 
> > > > __tlb_remove_page() eventually causes a put_page() via
> > > > tlb_batch_pages_flush() calling free_pages_and_swap_cache()
> > > > 
> > > > Eg:
> > > > 
> > > >  *  MMU_GATHER_NO_GATHER
> > > >  *
> > > >  *  If the option is set the mmu_gather will not track individual pages for
> > > >  *  delayed page free anymore. A platform that enables the option needs to
> > > >  *  provide its own implementation of the __tlb_remove_page_size() function to
> > > >  *  free pages.
> > > 
> > > Ok, yes, that is a vm_normal_page() mechanism which I was going to defer
> > > since it is incremental to the _refcount handling fix and maintain that
> > > DAX pages are still !vm_normal_page() in this set.
> > > 
> > > > > > Can we safely have the put page in the fsdax side after the zap?
> > > > > 
> > > > > The _refcount is managed from the lifetime insert_page() to
> > > > > truncate_inode_pages(), where for DAX those are managed from
> > > > > dax_insert_dentry() to dax_delete_mapping_entry().
> > > > 
> > > > As long as we all understand the page doesn't become re-allocatable
> > > > until the refcount reaches 0 and the free op is called it may be OK!
> > > 
> > > Yes, but this does mean that page_maybe_dma_pinned() is not sufficient for
> > > when the filesystem can safely reuse the page, it really needs to wait
> > > for the reference count to drop to 0 similar to how it waits for the
> > > page-idle condition today.
> > 
> > This gets tricky with break_layouts(). For example xfs_break_layouts()
> > wants to ensure that the page is gup idle while holding the mmap lock.
> > If the page is not gup idle it needs to drop locks and retry. It is
> > possible the path to drop a page reference also needs to acquire
> > filesystem locs. Consider odd cases like DMA from one offset to another
> > in the same file. So waiting with filesystem locks held is off the
> > table, which also means that deferring the wait until
> > dax_delete_mapping_entry() time is also off the table.
> > 
> > That means that even after the conversion to make DAX page references
> > 0-based it will still be the case that filesystem code will be waiting
> > for the 2 -> 1 transition to indicate "mapped DAX page has no more
> > external references".
> 
> Why?
> 
> If you are doing the break layouts wouldn't you first zap the ptes,
> which will bring the reference to 0 if there are not other users.

The internals of break layouts does zap the ptes, but it does not remove
the mapping entries. So, I was limiting my thinking to that constraint,
but now that I push on it, the need to keep the entry around until the
final truncate_setsize() event seems soft. It should be ok to upgrade
break layouts to delete mapping entries, wait for _refcount to drop to
zero, and then re-evaluate that nothing installed a new entry after
acquiring the filesystem locks again.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux