Re: [PATCH 1/2] mm: Protect operations adding pages to page cache with i_mapping_lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 08, 2021 at 05:39:17PM +0100, Jan Kara wrote:
> Currently, serializing operations such as page fault, read, or readahead
> against hole punching is rather difficult. The basic race scheme is
> like:
> 
> fallocate(FALLOC_FL_PUNCH_HOLE)			read / fault / ..
>   truncate_inode_pages_range()
> 						  <create pages in page
> 						   cache here>
>   <update fs block mapping and free blocks>
> 
> Now the problem is in this way read / page fault / readahead can
> instantiate pages in page cache with potentially stale data (if blocks
> get quickly reused). Avoiding this race is not simple - page locks do
> not work because we want to make sure there are *no* pages in given
> range. inode->i_rwsem does not work because page fault happens under
> mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes
> the performance for mixed read-write workloads suffer.
> 
> So create a new rw_semaphore in the inode - i_mapping_sem - that
> protects adding of pages to page cache for page faults / reads /
> readahead.
> 
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/inode.c         |  3 +++
>  include/linux/fs.h |  2 ++
>  mm/filemap.c       | 45 +++++++++++++++++++++++++++++++++++++++++++--
>  mm/readahead.c     |  2 ++
>  4 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 6442d97d9a4a..8df49d98e1cd 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -174,6 +174,9 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  
>  	init_rwsem(&inode->i_rwsem);
>  	lockdep_set_class(&inode->i_rwsem, &sb->s_type->i_mutex_key);
> +	init_rwsem(&inode->i_mapping_sem);
> +	lockdep_set_class(&inode->i_mapping_sem,
> +			  &sb->s_type->i_mapping_sem_key);
>  
>  	atomic_set(&inode->i_dio_count, 0);
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b20ddd8a6e62..248609bc61a2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -658,6 +658,7 @@ struct inode {
>  	/* Misc */
>  	unsigned long		i_state;
>  	struct rw_semaphore	i_rwsem;
> +	struct rw_semaphore	i_mapping_sem;
>  
>  	unsigned long		dirtied_when;	/* jiffies of first dirtying */
>  	unsigned long		dirtied_time_when;
> @@ -2249,6 +2250,7 @@ struct file_system_type {
>  
>  	struct lock_class_key i_lock_key;
>  	struct lock_class_key i_mutex_key;
> +	struct lock_class_key i_mapping_sem_key;
>  	struct lock_class_key i_mutex_dir_key;
>  };
>  
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 16a3bf693d4a..02f778ff02e0 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2257,16 +2257,28 @@ static int filemap_update_page(struct kiocb *iocb,
>  {
>  	int error;
>  
> +	if (iocb->ki_flags & IOCB_NOWAIT) {
> +		if (!down_read_trylock(&mapping->host->i_mapping_sem))
> +			return -EAGAIN;
> +	} else {
> +		down_read(&mapping->host->i_mapping_sem);
> +	}
> +
>  	if (!trylock_page(page)) {
> -		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
> +		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
> +			up_read(&mapping->host->i_mapping_sem);
>  			return -EAGAIN;
> +		}
>  		if (!(iocb->ki_flags & IOCB_WAITQ)) {
> +			up_read(&mapping->host->i_mapping_sem);
>  			put_and_wait_on_page_locked(page, TASK_KILLABLE);
>  			return AOP_TRUNCATED_PAGE;
>  		}
>  		error = __lock_page_async(page, iocb->ki_waitq);
> -		if (error)
> +		if (error) {
> +			up_read(&mapping->host->i_mapping_sem);
>  			return error;
> +		}
>  	}

What tree is this against? I don't see filemap_update_page() in a
5.11-rc7 tree...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux