On Wed 16-03-16 08:34:28, NeilBrown wrote: > On Fri, Mar 11 2016, NeilBrown wrote: > > > On Fri, Mar 11 2016, Jan Kara wrote: > > > >> Currently DAX page fault locking is racy. > >> > >> CPU0 (write fault) CPU1 (read fault) > >> > >> __dax_fault() __dax_fault() > >> get_block(inode, block, &bh, 0) -> not mapped > >> get_block(inode, block, &bh, 0) > >> -> not mapped > >> if (!buffer_mapped(&bh)) > >> if (vmf->flags & FAULT_FLAG_WRITE) > >> get_block(inode, block, &bh, 1) -> allocates blocks > >> if (page) -> no > >> if (!buffer_mapped(&bh)) > >> if (vmf->flags & FAULT_FLAG_WRITE) { > >> } else { > >> dax_load_hole(); > >> } > >> dax_insert_mapping() > >> > >> And we are in a situation where we fail in dax_radix_entry() with -EIO. > >> > >> Another problem with the current DAX page fault locking is that there is > >> no race-free way to clear dirty tag in the radix tree. We can always > >> end up with clean radix tree and dirty data in CPU cache. > >> > >> We fix the first problem by introducing locking of exceptional radix > >> tree entries in DAX mappings acting very similarly to page lock and thus > >> synchronizing properly faults against the same mapping index. The same > >> lock can later be used to avoid races when clearing radix tree dirty > >> tag. > > > > Hi, > > I think the exception locking bits look good - I cannot comment on the > > rest. > > I looks like it was a good idea to bring the locking into dax.c instead > > of trying to make it generic. > > > > Actually ... I'm still bothered by the exclusive waiting. If an entry > is locked and there are two threads in dax_pfn_mkwrite() then one would > be woken up when the entry is unlocked and it will just set the TAG_DIRTY > flag and then continue without ever waking the next waiter on the > wait queue. > > I *think* that any thread which gets an exclusive wakeup is responsible > for performing another wakeup. In this case it must either lock the > slot, or call __wakeup. > That means: > grab_mapping_entry needs to call wakeup: > if radix_tree_preload() fails > if radix_tree_insert fails other than with -EEXIST > if a valid page was found Why would we need to call wake up when a valid page was found? In that case there should not be any process waiting for the radix tree entry lock. Otherwise I agree with you. Thanks for pointing this out, you've likely saved me quite some debugging ;). > dax_delete_mapping_entry needs to call wakeup > if the fail case, though as that isn't expect (WARN_ON_ONCE) > it should be a problem not to wakeup here > dax_pfn_mkwrite needs to call wakeup unconditionally Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html