Re: [PATCH 1/3] dax: Make cache flushing protected by entry lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 21, 2016 at 05:45:13PM +0200, Jan Kara wrote:
> Currently, flushing of caches for DAX mappings was ignoring entry lock.
> So far this was ok (modulo a bug that a difference in entry lock could
> cause cache flushing to be mistakenly skipped) but in the following
> patches we will write-protect PTEs on cache flushing and clear dirty
> tags. For that we will need more exclusion. So do cache flushing under
> an entry lock. This allows us to remove one lock-unlock pair of
> mapping->tree_lock as a bonus.
> 
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/dax.c | 62 +++++++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 39 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 761495bf5eb9..5209f8cd0bee 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -669,35 +669,54 @@ static int dax_writeback_one(struct block_device *bdev,
>  		struct address_space *mapping, pgoff_t index, void *entry)
>  {
>  	struct radix_tree_root *page_tree = &mapping->page_tree;
> -	int type = RADIX_DAX_TYPE(entry);
> -	struct radix_tree_node *node;
> +	int type;
>  	struct blk_dax_ctl dax;
> -	void **slot;
>  	int ret = 0;
> +	void *entry2, **slot;

Nit: Let's retain the "reverse X-mas tree" ordering of our variable
definitions.

> -	spin_lock_irq(&mapping->tree_lock);
>  	/*
> -	 * Regular page slots are stabilized by the page lock even
> -	 * without the tree itself locked.  These unlocked entries
> -	 * need verification under the tree lock.
> +	 * A page got tagged dirty in DAX mapping? Something is seriously
> +	 * wrong.
>  	 */
> -	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> -		goto unlock;
> -	if (*slot != entry)
> -		goto unlock;
> -
> -	/* another fsync thread may have already written back this entry */
> -	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> -		goto unlock;
> +	if (WARN_ON(!radix_tree_exceptional_entry(entry)))
> +		return -EIO;
>  
> +	spin_lock_irq(&mapping->tree_lock);
> +	entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
> +	/* Entry got punched out / reallocated? */
> +	if (!entry2 || !radix_tree_exceptional_entry(entry2))
> +		goto put_unlock;
> +	/*
> +	 * Entry got reallocated elsewhere? No need to writeback. We have to
> +	 * compare sectors as we must not bail out due to difference in lockbit
> +	 * or entry type.
> +	 */
> +	if (RADIX_DAX_SECTOR(entry2) != RADIX_DAX_SECTOR(entry))
> +		goto put_unlock;
> +	type = RADIX_DAX_TYPE(entry2);
>  	if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
>  		ret = -EIO;
> -		goto unlock;
> +		goto put_unlock;
>  	}
> +	entry = entry2;

I don't think you need to set 'entry' here - you reset it in 4 lines via
lock_slot(), and don't use it in between.

> +
> +	/* Another fsync thread may have already written back this entry */
> +	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> +		goto put_unlock;
> +	/* Lock the entry to serialize with page faults */
> +	entry = lock_slot(mapping, slot);

As of this patch nobody unlocks the slot.  :)  A quick test of "write, fsync,
fsync" confirms that it deadlocks.

You introduce the proper calls to unlock the slot via
put_locked_mapping_entry() in patch 3/3 - those probably need to be in this
patch instead.

> +	/*
> +	 * We can clear the tag now but we have to be careful so that concurrent
> +	 * dax_writeback_one() calls for the same index cannot finish before we
> +	 * actually flush the caches. This is achieved as the calls will look
> +	 * at the entry only under tree_lock and once they do that they will
> +	 * see the entry locked and wait for it to unlock.
> +	 */
> +	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> +	spin_unlock_irq(&mapping->tree_lock);
>  
>  	dax.sector = RADIX_DAX_SECTOR(entry);
>  	dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE);
> -	spin_unlock_irq(&mapping->tree_lock);
>  
>  	/*
>  	 * We cannot hold tree_lock while calling dax_map_atomic() because it
> @@ -713,15 +732,12 @@ static int dax_writeback_one(struct block_device *bdev,
>  	}
>  
>  	wb_cache_pmem(dax.addr, dax.size);
> -
> -	spin_lock_irq(&mapping->tree_lock);
> -	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> -	spin_unlock_irq(&mapping->tree_lock);
> - unmap:
> +unmap:
>  	dax_unmap_atomic(bdev, &dax);
>  	return ret;
>  
> - unlock:
> +put_unlock:
> +	put_unlocked_mapping_entry(mapping, index, entry2);
>  	spin_unlock_irq(&mapping->tree_lock);
>  	return ret;
>  }
> -- 
> 2.6.6
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux