Re: [PATCH v4 17/18] mm, fs, dax: dax_flush_dma, handle dma vs block-map-change collisions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 08-03-18 09:02:30, Dan Williams wrote:
> On Mon, Jan 8, 2018 at 5:50 AM, Jan Kara <jack@xxxxxxx> wrote:
> > On Sun 07-01-18 13:58:42, Dan Williams wrote:
> >> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@xxxxxxx> wrote:
> >> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
> >> >
> >> >> +     /*
> >> >> +      * Flush dax_dma_lock() sections to ensure all possible page
> >> >> +      * references have been taken, or will block on the fs
> >> >> +      * 'mmap_lock'.
> >> >> +      */
> >> >> +     synchronize_rcu();
> >> >
> >> > Frankly, I don't like synchronize_rcu() in a relatively hot path like this.
> >> > Cannot we just abuse get_dev_pagemap() to fail if truncation is in progress
> >> > for the pfn? We could indicate that by some bit in struct page or something
> >> > like that.
> >>
> >> We would need a lockless way to take a reference conditionally if the
> >> page is not subject to truncation.
> >>
> >> I recall the raid5 code did something similar where it split a
> >> reference count into 2 fields. I.e. take page->_refcount and use the
> >> upper bits as a truncation count. Something like:
> >>
> >> do {
> >>     old = atomic_read(&page->_refcount);
> >>     if (old & trunc_mask) /* upper bits of _refcount */
> >>         return false;
> >>     new = cnt + 1;
> >> } while (atomic_cmpxchg(&page->_refcount, old, new) != old);
> >> return true; /* we incremented the _refcount while the truncation
> >> count was zero */
> >>
> >> ...the only concern is teaching the put_page() path to consider that
> >> 'trunc_mask' when determining that the page is idle.
> >>
> >> Other ideas?
> >
> > What I rather thought about was an update to GUP paths (like
> > follow_page_pte()):
> >
> >         if (flags & FOLL_GET) {
> >                 get_page(page);
> >                 if (pte_devmap(pte)) {
> >                         /*
> >                          * Pairs with the barrier in the truncate path.
> >                          * Could be possibly _after_atomic version of the
> >                          * barrier.
> >                          */
> >                         smp_mb();
> >                         if (PageTruncateInProgress(page)) {
> >                                 put_page(page);
> >                                 ..bail...
> >                         }
> >                 }
> >         }
> >
> > and in the truncate path:
> >
> >         down_write(inode->i_mmap_sem);
> >         walk all pages in the mapping and mark them PageTruncateInProgress().
> >         unmap_mapping_range(...);
> >         /*
> >          * Pairs with the barrier in GUP path. In fact not necessary since
> >          * unmap_mapping_range() provides us with the barrier already.
> >          */
> >         smp_mb();
> >         /*
> >          * By now we are either guaranteed to see grabbed page reference or
> >          * GUP is guaranteed to see PageTruncateInProgress().
> >          */
> >         while ((page = dax_find_referenced_page(mapping))) {
> >                 ...
> >         }
> >
> > The barriers need some verification, I've opted for the conservative option
> > but I guess you get the idea.
> 
> [ Reviving this thread for the next rev of this patch set for 4.17
> consideration ]
> 
> I don't think this barrier scheme can work in the presence of
> get_user_pages_fast(). The get_user_pages_fast() path can race
> unmap_mapping_range() to take out an elevated reference count on a
> page.

Why the scheme cannot work? Sure you'd need to patch also gup_pte_range()
and a similar thing for PMDs to recheck PageTruncateInProgress() after
grabbing the page reference. But in principle I don't see anything
fundamentally different between gup_fast() and plain gup().

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux