On Sat, Dec 19, 2015 at 10:37:46AM -0800, Dan Williams wrote: > On Fri, Dec 18, 2015 at 9:22 PM, Ross Zwisler > <ross.zwisler@xxxxxxxxxxxxxxx> wrote: > > To properly handle fsync/msync in an efficient way DAX needs to track dirty > > pages so it is able to flush them durably to media on demand. > > > > The tracking of dirty pages is done via the radix tree in struct > > address_space. This radix tree is already used by the page writeback > > infrastructure for tracking dirty pages associated with an open file, and > > it already has support for exceptional (non struct page*) entries. We > > build upon these features to add exceptional entries to the radix tree for > > DAX dirty PMD or PTE pages at fault time. > > > > Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> > [..] > > +static void dax_writeback_one(struct address_space *mapping, pgoff_t index, > > + void *entry) > > +{ > > + struct radix_tree_root *page_tree = &mapping->page_tree; > > + int type = RADIX_DAX_TYPE(entry); > > + struct radix_tree_node *node; > > + void **slot; > > + > > + if (type != RADIX_DAX_PTE && type != RADIX_DAX_PMD) { > > + WARN_ON_ONCE(1); > > + return; > > + } > > + > > + spin_lock_irq(&mapping->tree_lock); > > + /* > > + * Regular page slots are stabilized by the page lock even > > + * without the tree itself locked. These unlocked entries > > + * need verification under the tree lock. > > + */ > > + if (!__radix_tree_lookup(page_tree, index, &node, &slot)) > > + goto unlock; > > + if (*slot != entry) > > + goto unlock; > > + > > + /* another fsync thread may have already written back this entry */ > > + if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE)) > > + goto unlock; > > + > > + radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE); > > + > > + if (type == RADIX_DAX_PMD) > > + wb_cache_pmem(RADIX_DAX_ADDR(entry), PMD_SIZE); > > + else > > + wb_cache_pmem(RADIX_DAX_ADDR(entry), PAGE_SIZE); > > Hi Ross, I should have realized this sooner, but what guarantees that > the address returned by RADIX_DAX_ADDR(entry) is still valid at this > point? I think we need to store the sector in the radix tree and then > perform a new dax_map_atomic() operation to either lookup a valid > address or fail the sync request. Otherwise, if the device is gone > we'll crash, or write into some other random vmalloc address space. Ah, good point, thank you. v4 of this series is based on a version of DAX where we aren't properly dealing with PMEM device removal. I've got an updated version that merges with your dax_map_atomic() changes, and I'll add this change into v5 which I will send out today. Thank you for the suggestion. One clarification, with the code as it is in v4 we are only doing clflush/clflushopt/clwb instructions on the kaddr we've stored in the radix tree, so I don't think that there is actually a risk of us doing a "write into some other random vmalloc address space"? I think at worse we will end up clflushing an address that either isn't mapped or has been remapped by someone else. Or are you worried that the clflush would trigger a cache writeback to a memory address where writes have side effects, thus triggering the side effect? I definitely think it needs to be fixed, I'm just trying to make sure I understood your comment. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html