On Fri, Mar 9, 2018 at 4:56 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Thu 08-03-18 09:02:30, Dan Williams wrote:
>> On Mon, Jan 8, 2018 at 5:50 AM, Jan Kara <jack@xxxxxxx> wrote:
>> > On Sun 07-01-18 13:58:42, Dan Williams wrote:
>> >> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@xxxxxxx> wrote:
>> >> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
>> >> >
>> >> >> +	/*
>> >> >> +	 * Flush dax_dma_lock() sections to ensure all possible page
>> >> >> +	 * references have been taken, or will block on the fs
>> >> >> +	 * 'mmap_lock'.
>> >> >> +	 */
>> >> >> +	synchronize_rcu();
>> >> >
>> >> > Frankly, I don't like synchronize_rcu() in a relatively hot path like
>> >> > this. Cannot we just abuse get_dev_pagemap() to fail if truncation is
>> >> > in progress for the pfn? We could indicate that by some bit in struct
>> >> > page or something like that.
>> >>
>> >> We would need a lockless way to take a reference conditionally if the
>> >> page is not subject to truncation.
>> >>
>> >> I recall the raid5 code did something similar where it split a
>> >> reference count into 2 fields. I.e. take page->_refcount and use the
>> >> upper bits as a truncation count. Something like:
>> >>
>> >> 	do {
>> >> 		old = atomic_read(&page->_refcount);
>> >> 		if (old & trunc_mask) /* upper bits of _refcount */
>> >> 			return false;
>> >> 		new = old + 1;
>> >> 	} while (atomic_cmpxchg(&page->_refcount, old, new) != old);
>> >> 	return true; /* we incremented the _refcount while the truncation
>> >> 		      * count was zero */
>> >>
>> >> ...the only concern is teaching the put_page() path to consider that
>> >> 'trunc_mask' when determining that the page is idle.
>> >>
>> >> Other ideas?
>> >
>> > What I rather thought about was an update to GUP paths (like
>> > follow_page_pte()):
>> >
>> > 	if (flags & FOLL_GET) {
>> > 		get_page(page);
>> > 		if (pte_devmap(pte)) {
>> > 			/*
>> > 			 * Pairs with the barrier in the truncate path.
>> > 			 * Could be possibly _after_atomic version of the
>> > 			 * barrier.
>> > 			 */
>> > 			smp_mb();
>> > 			if (PageTruncateInProgress(page)) {
>> > 				put_page(page);
>> > 				..bail...
>> > 			}
>> > 		}
>> > 	}
>> >
>> > and in the truncate path:
>> >
>> > 	down_write(inode->i_mmap_sem);
>> > 	walk all pages in the mapping and mark them PageTruncateInProgress().
>> > 	unmap_mapping_range(...);
>> > 	/*
>> > 	 * Pairs with the barrier in GUP path. In fact not necessary since
>> > 	 * unmap_mapping_range() provides us with the barrier already.
>> > 	 */
>> > 	smp_mb();
>> > 	/*
>> > 	 * By now we are either guaranteed to see grabbed page reference or
>> > 	 * GUP is guaranteed to see PageTruncateInProgress().
>> > 	 */
>> > 	while ((page = dax_find_referenced_page(mapping))) {
>> > 		...
>> > 	}
>> >
>> > The barriers need some verification, I've opted for the conservative
>> > option but I guess you get the idea.
>>
>> [ Reviving this thread for the next rev of this patch set for 4.17
>> consideration ]
>>
>> I don't think this barrier scheme can work in the presence of
>> get_user_pages_fast(). The get_user_pages_fast() path can race
>> unmap_mapping_range() to take out an elevated reference count on a
>> page.
>
> Why the scheme cannot work? Sure you'd need to patch also gup_pte_range()
> and a similar thing for PMDs to recheck PageTruncateInProgress() after
> grabbing the page reference. But in principle I don't see anything
> fundamentally different between gup_fast() and plain gup().

Ah, yes, I didn't grok the abort on PageTruncateInProgress() until I
read this again (and again). I'll try that.
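
For reference, here is a minimal, self-contained userspace sketch (C11
atomics) of the split-refcount idea discussed above: the low bits of one
atomic word count references, the upper bits count in-progress
truncations, and a reference is only taken while the truncation bits are
zero. The bit layout and all names (fake_page, TRUNC_SHIFT,
get_page_unless_truncating(), ...) are illustrative, not the actual
page->_refcount implementation:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define TRUNC_SHIFT	24u
#define TRUNC_MASK	(~0u << TRUNC_SHIFT)	/* upper bits: truncation count */
#define REF_MASK	(~TRUNC_MASK)		/* lower bits: reference count */

struct fake_page {
	atomic_uint refcount;
};

/* Take a reference only if no truncation is in progress. */
static bool get_page_unless_truncating(struct fake_page *p)
{
	unsigned int old = atomic_load(&p->refcount);

	do {
		if (old & TRUNC_MASK)
			return false;	/* truncation in progress: bail */
		/* on failure the CAS reloads 'old' and we re-check the mask */
	} while (!atomic_compare_exchange_weak(&p->refcount, &old, old + 1));

	return true;	/* refcount bumped while the truncation count was zero */
}

/* Truncation side: bump the truncation count, then wait for refs to drain. */
static void begin_truncate(struct fake_page *p)
{
	atomic_fetch_add(&p->refcount, 1u << TRUNC_SHIFT);
}

static bool page_idle(struct fake_page *p)
{
	/* only the reference bits decide idleness, not the truncation bits */
	return (atomic_load(&p->refcount) & REF_MASK) == 0;
}

int main(void)
{
	struct fake_page page = { .refcount = 0 };

	printf("get before truncate: %d\n", get_page_unless_truncating(&page)); /* 1 */
	begin_truncate(&page);
	printf("get during truncate: %d\n", get_page_unless_truncating(&page)); /* 0 */
	printf("idle: %d\n", page_idle(&page)); /* 0: the first reference is still held */
	return 0;
}

Note that the release side of such a scheme also has to mask out the
truncation bits when deciding that the page is idle, which is exactly the
put_page() concern raised above.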
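
Similarly, a userspace sketch of the barrier pairing in the
PageTruncateInProgress() scheme: the GUP side publishes its reference
before re-checking the flag, the truncate side publishes the flag before
scanning for references, so at least one side is guaranteed to observe
the other's store. Again, all names are illustrative and the seq_cst
fences stand in for smp_mb():

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int page_ref;		/* stands in for page->_refcount */
static atomic_bool trunc_in_progress;	/* stands in for PageTruncateInProgress() */

/* GUP side: take the reference first, then re-check the flag. */
static bool gup_get_ref(void)
{
	atomic_fetch_add_explicit(&page_ref, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* pairs with the fence below */
	if (atomic_load_explicit(&trunc_in_progress, memory_order_relaxed)) {
		/* truncation already started: drop the reference and bail */
		atomic_fetch_sub_explicit(&page_ref, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Truncate side: set the flag first, then look for lingering references. */
static bool truncate_sees_no_refs(void)
{
	atomic_store_explicit(&trunc_in_progress, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* pairs with the fence above */
	return atomic_load_explicit(&page_ref, memory_order_relaxed) == 0;
}

int main(void)
{
	/* single-threaded smoke test: once truncate has flagged the page,
	 * the GUP side must refuse to take a reference. */
	truncate_sees_no_refs();
	return gup_get_ref() ? 1 : 0;
}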