On Mon, Jan 8, 2018 at 5:50 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Sun 07-01-18 13:58:42, Dan Williams wrote:
>> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@xxxxxxx> wrote:
>> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
>> >
>> >> +	/*
>> >> +	 * Flush dax_dma_lock() sections to ensure all possible page
>> >> +	 * references have been taken, or will block on the fs
>> >> +	 * 'mmap_lock'.
>> >> +	 */
>> >> +	synchronize_rcu();
>> >
>> > Frankly, I don't like synchronize_rcu() in a relatively hot path like this.
>> > Cannot we just abuse get_dev_pagemap() to fail if truncation is in progress
>> > for the pfn? We could indicate that by some bit in struct page or something
>> > like that.
>>
>> We would need a lockless way to take a reference conditionally if the
>> page is not subject to truncation.
>>
>> I recall the raid5 code did something similar where it split a
>> reference count into 2 fields. I.e. take page->_refcount and use the
>> upper bits as a truncation count. Something like:
>>
>> 	do {
>> 		old = atomic_read(&page->_refcount);
>> 		if (old & trunc_mask) /* upper bits of _refcount */
>> 			return false;
>> 		new = old + 1;
>> 	} while (atomic_cmpxchg(&page->_refcount, old, new) != old);
>> 	return true; /* we incremented the _refcount while the truncation
>> 		      * count was zero */
>>
>> ...the only concern is teaching the put_page() path to consider that
>> 'trunc_mask' when determining that the page is idle.
>>
>> Other ideas?
>
> What I rather thought about was an update to GUP paths (like
> follow_page_pte()):
>
> 	if (flags & FOLL_GET) {
> 		get_page(page);
> 		if (pte_devmap(pte)) {
> 			/*
> 			 * Pairs with the barrier in the truncate path.
> 			 * Could be possibly _after_atomic version of the
> 			 * barrier.
> 			 */
> 			smp_mb();
> 			if (PageTruncateInProgress(page)) {
> 				put_page(page);
> 				..bail...
> 			}
> 		}
> 	}
>
> and in the truncate path:
>
> 	down_write(inode->i_mmap_sem);
> 	walk all pages in the mapping and mark them PageTruncateInProgress().
> 	unmap_mapping_range(...);
> 	/*
> 	 * Pairs with the barrier in the GUP path. In fact not necessary since
> 	 * unmap_mapping_range() provides us with the barrier already.
> 	 */
> 	smp_mb();
> 	/*
> 	 * By now we are either guaranteed to see the grabbed page reference
> 	 * or GUP is guaranteed to see PageTruncateInProgress().
> 	 */
> 	while ((page = dax_find_referenced_page(mapping))) {
> 		...
> 	}
>
> The barriers need some verification; I've opted for the conservative option
> but I guess you get the idea.

[ Reviving this thread for the next rev of this patch set for 4.17
consideration ]

I don't think this barrier scheme can work in the presence of
get_user_pages_fast(). The get_user_pages_fast() path can race
unmap_mapping_range() and still take an elevated reference count on a
page.
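
To make the split-_refcount protocol above a bit more concrete, here is a
minimal userspace model of the idea; it is not kernel code and not anything
from the patch set. It uses C11 atomics in place of atomic_t, a made-up
struct fake_page, an arbitrary 16/16 split between the reference count and
the truncation count, and illustrative helper names (page_try_get(),
page_put(), truncate_begin()). The put side only shows the point raised
above: "idle" has to mask off the truncation bits.

        /*
         * Userspace sketch of keeping a truncation count in the upper
         * bits of the page reference count.  Field split and names are
         * assumptions for illustration only.
         */
        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdio.h>

        #define TRUNC_SHIFT 16
        #define TRUNC_MASK  (~0u << TRUNC_SHIFT) /* upper bits: truncation count */
        #define REF_MASK    (~TRUNC_MASK)        /* lower bits: reference count */

        struct fake_page {
                _Atomic unsigned int refcount;
        };

        /* Take a reference only while no truncation is in progress. */
        static bool page_try_get(struct fake_page *page)
        {
                unsigned int old = atomic_load(&page->refcount);
                unsigned int new;

                do {
                        if (old & TRUNC_MASK)   /* truncation started: fail */
                                return false;
                        new = old + 1;
                } while (!atomic_compare_exchange_weak(&page->refcount, &old, new));

                return true; /* got the reference while the truncation count was zero */
        }

        /* Drop a reference; report whether the page is now idle. */
        static bool page_put(struct fake_page *page)
        {
                unsigned int new = atomic_fetch_sub(&page->refcount, 1) - 1;

                /* "Idle" must ignore the truncation bits, per the concern above. */
                return (new & REF_MASK) == 0;
        }

        /* Truncation side: bump the truncation count so new gets start failing. */
        static void truncate_begin(struct fake_page *page)
        {
                atomic_fetch_add(&page->refcount, 1u << TRUNC_SHIFT);
        }

        int main(void)
        {
                struct fake_page page = { .refcount = 0 };

                printf("get before truncate: %d\n", page_try_get(&page)); /* 1 */
                truncate_begin(&page);
                printf("get during truncate: %d\n", page_try_get(&page)); /* 0 */
                printf("idle after put:      %d\n", page_put(&page));     /* 1 */
                return 0;
        }

Whether something like this can be made to coexist with the generic
put_page()/page_ref_count() paths is exactly the open question in the
quoted mail; the sketch only shows the cmpxchg protocol itself.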