On Sun 07-01-18 13:58:42, Dan Williams wrote:
> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@xxxxxxx> wrote:
> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
> >
> >> +	/*
> >> +	 * Flush dax_dma_lock() sections to ensure all possible page
> >> +	 * references have been taken, or will block on the fs
> >> +	 * 'mmap_lock'.
> >> +	 */
> >> +	synchronize_rcu();
> >
> > Frankly, I don't like synchronize_rcu() in a relatively hot path like
> > this. Cannot we just abuse get_dev_pagemap() to fail if truncation is
> > in progress for the pfn? We could indicate that by some bit in struct
> > page or something like that.
> 
> We would need a lockless way to take a reference conditionally if the
> page is not subject to truncation.
> 
> I recall the raid5 code did something similar where it split a
> reference count into 2 fields. I.e. take page->_refcount and use the
> upper bits as a truncation count. Something like:
> 
> 	do {
> 		old = atomic_read(&page->_refcount);
> 		if (old & trunc_mask) /* upper bits of _refcount */
> 			return false;
> 		new = old + 1;
> 	} while (atomic_cmpxchg(&page->_refcount, old, new) != old);
> 	return true; /* we incremented the _refcount while the
> 			truncation count was zero */
> 
> ...the only concern is teaching the put_page() path to consider that
> 'trunc_mask' when determining that the page is idle.
> 
> Other ideas?

What I rather thought about was an update to GUP paths (like
follow_page_pte()):

	if (flags & FOLL_GET) {
		get_page(page);
		if (pte_devmap(pte)) {
			/*
			 * Pairs with the barrier in the truncate path.
			 * Could be possibly _after_atomic version of the
			 * barrier.
			 */
			smp_mb();
			if (PageTruncateInProgress(page)) {
				put_page(page);
				..bail...
			}
		}
	}

and in the truncate path:

	down_write(inode->i_mmap_sem);
	walk all pages in the mapping and mark them PageTruncateInProgress().
	unmap_mapping_range(...);
	/*
	 * Pairs with the barrier in GUP path. In fact not necessary since
	 * unmap_mapping_range() provides us with the barrier already.
	 */
	smp_mb();
	/*
	 * By now we are either guaranteed to see grabbed page reference or
	 * GUP is guaranteed to see PageTruncateInProgress().
	 */
	while ((page = dax_find_referenced_page(mapping))) {
		...
	}

The barriers need some verification, I've opted for the conservative option
but I guess you get the idea.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
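[For readers following along: Dan's quoted "split the refcount" idea can be modeled outside the kernel. The sketch below is a userspace C11 analogue, not kernel code; the field split, `TRUNC_SHIFT`, and all helper names are illustrative assumptions, not taken from raid5 or any DAX patch. It shows the one property the idea depends on: a reference can only be taken while the truncation count in the upper bits is zero, in a single atomic compare-and-swap.]

```c
/*
 * Userspace sketch of the quoted idea: pack a truncation count into
 * the upper bits of one atomic counter and take a reference only
 * while that count is zero.  All names here are hypothetical.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define TRUNC_SHIFT	24
#define TRUNC_MASK	(~((1u << TRUNC_SHIFT) - 1))	/* upper bits */
#define REF_MASK	((1u << TRUNC_SHIFT) - 1)	/* lower bits */

struct fake_page {
	/* low bits: references, high bits: truncation count */
	atomic_uint refcount;
};

/* Lockless conditional get: fails once truncation has begun. */
static bool page_ref_unless_truncated(struct fake_page *page)
{
	unsigned int old = atomic_load(&page->refcount);
	unsigned int new;

	do {
		if (old & TRUNC_MASK)	/* truncation in progress */
			return false;
		new = old + 1;
		/* on failure, 'old' is reloaded and we re-check the mask */
	} while (!atomic_compare_exchange_weak(&page->refcount, &old, new));
	return true;
}

/* Truncate side: bump the truncation count so new gets fail. */
static void page_begin_truncate(struct fake_page *page)
{
	atomic_fetch_add(&page->refcount, 1u << TRUNC_SHIFT);
}

/* The put_page() concern Dan raises: idleness must mask off the
 * truncation count and look only at the low reference bits. */
static bool page_idle(struct fake_page *page)
{
	return (atomic_load(&page->refcount) & REF_MASK) == 0;
}
```

[The `page_idle()` helper is exactly the "teaching the put_page() path" concern from the quoted mail: once the counter is split, every idleness check has to mask off the truncation bits.]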
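[Jan's barrier-pairing argument can likewise be modeled in userspace C11. This is a sketch under stated assumptions: `truncate_in_progress` stands in for the hypothetical PageTruncateInProgress() flag, and `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb(). The point it demonstrates is the Dekker-style guarantee from the mail: with a full barrier on each side, either GUP sees the flag and bails, or truncate sees the grabbed reference; both sides cannot miss each other.]

```c
/*
 * Userspace model of the smp_mb() pairing described above.
 * GUP: take reference, full barrier, check flag.
 * Truncate: set flag, full barrier, check references.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct fake_page {
	atomic_int refcount;
	atomic_bool truncate_in_progress;	/* PageTruncateInProgress() */
};

/* GUP side: grab a reference, then re-check the truncate flag. */
static bool gup_get_page(struct fake_page *page)
{
	atomic_fetch_add_explicit(&page->refcount, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	if (atomic_load_explicit(&page->truncate_in_progress,
				 memory_order_relaxed)) {
		/* put_page() and bail, as in the quoted pseudocode */
		atomic_fetch_sub_explicit(&page->refcount, 1,
					  memory_order_relaxed);
		return false;
	}
	return true;
}

/* Truncate side: raise the flag, then look for remaining references. */
static bool truncate_sees_reference(struct fake_page *page)
{
	atomic_store_explicit(&page->truncate_in_progress, true,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	return atomic_load_explicit(&page->refcount,
				    memory_order_relaxed) > 0;
}
```

[Whether the GUP-side fence can be the cheaper `_after_atomic` variant, as the mail wonders, depends on the ordering the preceding get_page() already provides; the model above takes the conservative full-barrier option the mail opts for.]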