On Thu 08-03-18 09:02:30, Dan Williams wrote:
> On Mon, Jan 8, 2018 at 5:50 AM, Jan Kara <jack@xxxxxxx> wrote:
> > On Sun 07-01-18 13:58:42, Dan Williams wrote:
> >> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@xxxxxxx> wrote:
> >> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
> >> >
> >> >> +	/*
> >> >> +	 * Flush dax_dma_lock() sections to ensure all possible page
> >> >> +	 * references have been taken, or will block on the fs
> >> >> +	 * 'mmap_lock'.
> >> >> +	 */
> >> >> +	synchronize_rcu();
> >> >
> >> > Frankly, I don't like synchronize_rcu() in a relatively hot path like this.
> >> > Cannot we just abuse get_dev_pagemap() to fail if truncation is in progress
> >> > for the pfn? We could indicate that by some bit in struct page or something
> >> > like that.
> >>
> >> We would need a lockless way to take a reference conditionally if the
> >> page is not subject to truncation.
> >>
> >> I recall the raid5 code did something similar where it split a
> >> reference count into 2 fields. I.e. take page->_refcount and use the
> >> upper bits as a truncation count. Something like:
> >>
> >>	do {
> >>		old = atomic_read(&page->_refcount);
> >>		if (old & trunc_mask) /* upper bits of _refcount */
> >>			return false;
> >>		new = old + 1;
> >>	} while (atomic_cmpxchg(&page->_refcount, old, new) != old);
> >>	return true; /* we incremented the _refcount while the truncation
> >>			count was zero */
> >>
> >> ...the only concern is teaching the put_page() path to consider that
> >> 'trunc_mask' when determining that the page is idle.
> >>
> >> Other ideas?
> >
> > What I rather thought about was an update to GUP paths (like
> > follow_page_pte()):
> >
> >	if (flags & FOLL_GET) {
> >		get_page(page);
> >		if (pte_devmap(pte)) {
> >			/*
> >			 * Pairs with the barrier in the truncate path.
> >			 * Could be possibly _after_atomic version of the
> >			 * barrier.
> >			 */
> >			smp_mb();
> >			if (PageTruncateInProgress(page)) {
> >				put_page(page);
> >				..bail...
> >			}
> >		}
> >	}
> >
> > and in the truncate path:
> >
> >	down_write(inode->i_mmap_sem);
> >	walk all pages in the mapping and mark them PageTruncateInProgress().
> >	unmap_mapping_range(...);
> >	/*
> >	 * Pairs with the barrier in GUP path. In fact not necessary since
> >	 * unmap_mapping_range() provides us with the barrier already.
> >	 */
> >	smp_mb();
> >	/*
> >	 * By now we are either guaranteed to see grabbed page reference or
> >	 * GUP is guaranteed to see PageTruncateInProgress().
> >	 */
> >	while ((page = dax_find_referenced_page(mapping))) {
> >		...
> >	}
> >
> > The barriers need some verification; I've opted for the conservative option,
> > but I guess you get the idea.
>
> [ Reviving this thread for the next rev of this patch set for 4.17
> consideration ]
>
> I don't think this barrier scheme can work in the presence of
> get_user_pages_fast(). The get_user_pages_fast() path can race
> unmap_mapping_range() to take out an elevated reference count on a
> page.

Why can't the scheme work? Sure, you'd also need to patch gup_pte_range()
(and the corresponding PMD path) to recheck PageTruncateInProgress() after
grabbing the page reference, but in principle I don't see anything
fundamentally different between gup_fast() and plain gup().

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
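
[ For reference, a minimal userspace sketch of the split-_refcount idea
Dan describes above: the low bits of one atomic word carry the reference
count and the upper bits mark a truncation in progress, so a reference is
only taken while no truncation bit is set. This is an illustration only,
using C11 atomics rather than the kernel's atomic_t on page->_refcount;
the names (struct fake_page, REF_MASK, TRUNC_BIT,
get_page_unless_truncated()) are made up for the example and are not from
the patch set. ]

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	#define REF_MASK	0x00ffffffu	/* low bits: reference count */
	#define TRUNC_BIT	0x01000000u	/* upper bits: truncation marker */

	struct fake_page {
		atomic_uint refcount;
	};

	/* Take a reference only if no truncation bit is set. */
	static bool get_page_unless_truncated(struct fake_page *page)
	{
		unsigned int old, new;

		old = atomic_load(&page->refcount);
		do {
			if (old & ~REF_MASK)	/* truncation marked -> bail */
				return false;
			new = old + 1;
			/* on failure, 'old' is reloaded and we recheck */
		} while (!atomic_compare_exchange_weak(&page->refcount, &old, new));

		return true;	/* ref taken while truncation count was zero */
	}

	/* Truncate side: set the marker, then wait for the ref bits to drain. */
	static void mark_truncate_in_progress(struct fake_page *page)
	{
		atomic_fetch_or(&page->refcount, TRUNC_BIT);
	}

	static bool page_idle(struct fake_page *page)
	{
		return (atomic_load(&page->refcount) & REF_MASK) == 0;
	}

	int main(void)
	{
		struct fake_page page = { .refcount = 0 };

		printf("got ref: %d\n", get_page_unless_truncated(&page)); /* 1 */
		mark_truncate_in_progress(&page);
		printf("got ref: %d\n", get_page_unless_truncated(&page)); /* 0 */
		printf("idle: %d\n", page_idle(&page)); /* 0: one ref still held */
		return 0;
	}

[ As in the original sketch, the open question is the put_page() side:
the "idle" test has to mask off the truncation bits when deciding the
page has no remaining references. ]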