Hi Ross, all your comments look fine and I've incorporated them. Thanks for review! Honza On Fri 24-06-16 15:44:45, Ross Zwisler wrote: > On Tue, Jun 21, 2016 at 05:45:13PM +0200, Jan Kara wrote: > > Currently, flushing of caches for DAX mappings was ignoring entry lock. > > So far this was ok (modulo a bug that a difference in entry lock could > > cause cache flushing to be mistakenly skipped) but in the following > > patches we will write-protect PTEs on cache flushing and clear dirty > > tags. For that we will need more exclusion. So do cache flushing under > > an entry lock. This allows us to remove one lock-unlock pair of > > mapping->tree_lock as a bonus. > > > > Signed-off-by: Jan Kara <jack@xxxxxxx> > > --- > > fs/dax.c | 62 +++++++++++++++++++++++++++++++++++++++----------------------- > > 1 file changed, 39 insertions(+), 23 deletions(-) > > > > diff --git a/fs/dax.c b/fs/dax.c > > index 761495bf5eb9..5209f8cd0bee 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -669,35 +669,54 @@ static int dax_writeback_one(struct block_device *bdev, > > struct address_space *mapping, pgoff_t index, void *entry) > > { > > struct radix_tree_root *page_tree = &mapping->page_tree; > > - int type = RADIX_DAX_TYPE(entry); > > - struct radix_tree_node *node; > > + int type; > > struct blk_dax_ctl dax; > > - void **slot; > > int ret = 0; > > + void *entry2, **slot; > > Nit: Let's retain the "reverse X-mas tree" ordering of our variable > definitions. > > > - spin_lock_irq(&mapping->tree_lock); > > /* > > - * Regular page slots are stabilized by the page lock even > > - * without the tree itself locked. These unlocked entries > > - * need verification under the tree lock. > > + * A page got tagged dirty in DAX mapping? Something is seriously > > + * wrong. > > */ > > - if (!__radix_tree_lookup(page_tree, index, &node, &slot)) > > - goto unlock; > > - if (*slot != entry) > > - goto unlock; > > - > > - /* another fsync thread may have already written back this entry */ > > - if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE)) > > - goto unlock; > > + if (WARN_ON(!radix_tree_exceptional_entry(entry))) > > + return -EIO; > > > > + spin_lock_irq(&mapping->tree_lock); > > + entry2 = get_unlocked_mapping_entry(mapping, index, &slot); > > + /* Entry got punched out / reallocated? */ > > + if (!entry2 || !radix_tree_exceptional_entry(entry2)) > > + goto put_unlock; > > + /* > > + * Entry got reallocated elsewhere? No need to writeback. We have to > > + * compare sectors as we must not bail out due to difference in lockbit > > + * or entry type. > > + */ > > + if (RADIX_DAX_SECTOR(entry2) != RADIX_DAX_SECTOR(entry)) > > + goto put_unlock; > > + type = RADIX_DAX_TYPE(entry2); > > if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) { > > ret = -EIO; > > - goto unlock; > > + goto put_unlock; > > } > > + entry = entry2; > > I don't think you need to set 'entry' here - you reset it in 4 lines via > lock_slot(), and don't use it in between. > > > + > > + /* Another fsync thread may have already written back this entry */ > > + if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE)) > > + goto put_unlock; > > + /* Lock the entry to serialize with page faults */ > > + entry = lock_slot(mapping, slot); > > As of this patch nobody unlocks the slot. :) A quick test of "write, fsync, > fsync" confirms that it deadlocks. > > You introduce the proper calls to unlock the slot via > put_locked_mapping_entry() in patch 3/3 - those probably need to be in this > patch instead. > > > + /* > > + * We can clear the tag now but we have to be careful so that concurrent > > + * dax_writeback_one() calls for the same index cannot finish before we > > + * actually flush the caches. This is achieved as the calls will look > > + * at the entry only under tree_lock and once they do that they will > > + * see the entry locked and wait for it to unlock. > > + */ > > + radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE); > > + spin_unlock_irq(&mapping->tree_lock); > > > > dax.sector = RADIX_DAX_SECTOR(entry); > > dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE); > > - spin_unlock_irq(&mapping->tree_lock); > > > > /* > > * We cannot hold tree_lock while calling dax_map_atomic() because it > > @@ -713,15 +732,12 @@ static int dax_writeback_one(struct block_device *bdev, > > } > > > > wb_cache_pmem(dax.addr, dax.size); > > - > > - spin_lock_irq(&mapping->tree_lock); > > - radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE); > > - spin_unlock_irq(&mapping->tree_lock); > > - unmap: > > +unmap: > > dax_unmap_atomic(bdev, &dax); > > return ret; > > > > - unlock: > > +put_unlock: > > + put_unlocked_mapping_entry(mapping, index, entry2); > > spin_unlock_irq(&mapping->tree_lock); > > return ret; > > } > > -- > > 2.6.6 > > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html