Re: [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 09-05-23 12:56:31, Kent Overstreet wrote:
> From: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> 
> This is used by bcachefs to fix a page cache coherency issue with
> O_DIRECT writes.
> 
> Also relevant: mapping->invalidate_lock, see below.
> 
> O_DIRECT writes (and other filesystem operations that modify file data
> while bypassing the page cache) need to shoot down ranges of the page
> cache - and additionally, need locking to prevent those pages from
> pulled back in.
> 
> But O_DIRECT writes invoke the page fault handler (via get_user_pages),
> and the page fault handler will need to take that same lock - this is a
> classic recursive deadlock if userspace has mmaped the file they're DIO
> writing to and uses those pages for the buffer to write from, and it's a
> lock ordering deadlock in general.
> 
> Thus we need a way to signal from the dio code to the page fault handler
> when we already are holding the pagecache add lock on an address space -
> this patch just adds a member to task_struct for this purpose. For now
> only bcachefs is implementing this locking, though it may be moved out
> of bcachefs and made available to other filesystems in the future.

It would be nice to have at least a link to the code that's actually using
the field you are adding.

Also I think we were already through this discussion [1] and we ended up
agreeing that your scheme actually solves only the AA deadlock but a
malicious userspace can easily create AB BA deadlock by running direct IO
to file A using mapped file B as a buffer *and* direct IO to file B using
mapped file A as a buffer.

[1] https://lore.kernel.org/all/20191218124052.GB19387@xxxxxxxxxxxxxx

> ---------------------------------
> 
> The closest current VFS equivalent is mapping->invalidate_lock, which
> comes from XFS. However, it's not used by direct IO.  Instead, direct IO
> paths shoot down the page cache twice - before starting the IO and at
> the end, and they're still technically racy w.r.t. page cache coherency.
> 
> This is a more complete approach: in the future we might consider
> replacing mapping->invalidate_lock with the bcachefs code.

Yes, and this is because we never provided 100% consistent buffered VS
direct IO behavior on the same file exactly because we never found the
complexity worth the usefulness...

								Honza

> 
> Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Darrick J. Wong <djwong@xxxxxxxxxx>
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> ---
>  include/linux/sched.h | 1 +
>  init/init_task.c      | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 63d242164b..f2a56f64f7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -869,6 +869,7 @@ struct task_struct {
>  
>  	struct mm_struct		*mm;
>  	struct mm_struct		*active_mm;
> +	struct address_space		*faults_disabled_mapping;
>  
>  	int				exit_state;
>  	int				exit_code;
> diff --git a/init/init_task.c b/init/init_task.c
> index ff6c4b9bfe..f703116e05 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -85,6 +85,7 @@ struct task_struct init_task
>  	.nr_cpus_allowed= NR_CPUS,
>  	.mm		= NULL,
>  	.active_mm	= &init_mm,
> +	.faults_disabled_mapping = NULL,
>  	.restart_block	= {
>  		.fn = do_no_restart_syscall,
>  	},
> -- 
> 2.40.1
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux