Re: [PATCH 05/10] dax: Allow DAX code to replace exceptional entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 21, 2016 at 02:22:50PM +0100, Jan Kara wrote:
> Currently we forbid page_cache_tree_insert() to replace exceptional radix
> tree entries for DAX inodes. However to make DAX faults race free we will
> lock radix tree entries and when hole is created, we need to replace
> such locked radix tree entry with a hole page. So modify
> page_cache_tree_insert() to allow that.

Perhaps this is addressed later in the series, but at first glance this seems
unsafe to me - what happens of we had tasks waiting on the locked entry?  Are
they woken up when we replace the locked entry with a page cache entry?

If they are woken up, will they be alright with the fact that they were
sleeping on a lock for an exceptional entry, and now that lock no longer
exists?  The radix tree entry will now be filled with a zero page hole, and so
they can't acquire the lock they were sleeping for - instead if they wanted to
lock that page they'd need to lock the zero page page lock, right?

> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  include/linux/dax.h |  6 ++++++
>  mm/filemap.c        | 18 +++++++++++-------
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 7c45ac7ea1d1..4b63923e1f8d 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -3,8 +3,14 @@
>  
>  #include <linux/fs.h>
>  #include <linux/mm.h>
> +#include <linux/radix-tree.h>
>  #include <asm/pgtable.h>
>  
> +/*
> + * Since exceptional entries do not use indirect bit, we reuse it as a lock bit
> + */
> +#define DAX_ENTRY_LOCK RADIX_TREE_INDIRECT_PTR
> +
>  ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
>  		  get_block_t, dio_iodone_t, int flags);
>  int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7c00f105845e..fbebedaf719e 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -597,14 +597,18 @@ static int page_cache_tree_insert(struct address_space *mapping,
>  		if (!radix_tree_exceptional_entry(p))
>  			return -EEXIST;
>  
> -		if (WARN_ON(dax_mapping(mapping)))
> -			return -EINVAL;
> -
> -		if (shadowp)
> -			*shadowp = p;
>  		mapping->nrexceptional--;
> -		if (node)
> -			workingset_node_shadows_dec(node);
> +		if (!dax_mapping(mapping)) {
> +			if (shadowp)
> +				*shadowp = p;
> +			if (node)
> +				workingset_node_shadows_dec(node);
> +		} else {
> +			/* DAX can replace empty locked entry with a hole */
> +			WARN_ON_ONCE(p !=
> +				(void *)(RADIX_TREE_EXCEPTIONAL_ENTRY |
> +					 DAX_ENTRY_LOCK));
> +		}
>  	}
>  	radix_tree_replace_slot(slot, page);
>  	mapping->nrpages++;
> -- 
> 2.6.2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux