Re: [PATCH RERESEND 10/11] splice: file->pipe: -EINVAL for non-regular files w/o FMODE_NOWAIT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 15, 2023 at 08:47:15AM -0700, Jens Axboe wrote:
> On 12/14/23 11:45 AM, Ahelenia Ziemiańska wrote:
> > We request non-blocking I/O in the generic implementation, but some
> > files ‒ ttys ‒ only check O_NONBLOCK. Refuse them here, lest we
> > risk sleeping with the pipe locked for indeterminate lengths of
> > time.
> A worthy goal here is ensuring that _everybody_ honors IOCB_NOWAIT,
> rather than just rely on O_NONBLOCK. This does involve converting to
> ->read_iter/->write_iter if the driver isn't already using it, but some
> of them already have that, yet don't check IOCB_NOWAIT or treat it the
> same as O_NONBLOCK.
This doesn't really mean much to me, sorry.

> Adding special checks like this is not a good idea, imho.
That's what Linus said I should do so that's what I did
  https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@xxxxxxxxxxxxxx/

I can't fix the tty layer :/

> > This also masks inconsistent wake-ups (usually every second line)
> > when splicing from ttys in icanon mode.
> > 
> > Regular files don't /have/ a distinct O_NONBLOCK mode,
> > because they always behave non-blockingly, and for them FMODE_NOWAIT is
> > used in the purest sense of
> >   /* File is capable of returning -EAGAIN if I/O will block */
> > which is not set by the vast majority of filesystems,
> > and it's not the semantic we want here.
> 
> The main file systems do very much set it, like btrfs, ext4, and xfs. If
> you look at total_file_systems / ones_flagging_it the ratio may be high,
> but in terms of installed userbase, the majority definitely will have
> it. Also see comment on cover letter for addressing this IOCB_NOWAIT
> confusion.
Reassessing
[1] https://lore.kernel.org/linux-fsdevel/5osglsw36dla3mubtpsmdwdid4fsdacplyd6acx2igo4atogdg@yur3idyim3cc/
I see FMODE_NOWAIT in
  blockdevs
  /dev/{null,zero,random,urandom}
  btrfs/ext4/f2fs/ocfs2/xfs
  eventfd
  pipes
  sockets
  tun/tap
which means that vfat/fuse/nfs/tmpfs/ramfs/procfs/sysfs don't.
(zfs also doesn't, but that's not for this list.)

I don't know if that's actually a "majority" in a meaningful sense,
I agree, but I think I primarily committed to this exclusion because
tmpfs/ramfs didn't.

I s'pose ramfs can already be tagged since it already returns
-EAGAIN when I/O would block (never).

tmpfs not being spliceable is also questionable.

But this'd also mean effectively deleting
  afs_file_splice_read
  ceph_splice_read
  coda_file_splice_read
  ecryptfs_splice_read_update_atime
  fuse_dev_splice_read
  nfs_file_splice_read
  orangefs_file_splice_read
  shmem_file_splice_read
  v9fs_file_splice_read
(not to mention the many others (adfs/affs/bfs/bcachefs/cramfs/erofs/fat/hfs*/hostfs/hpfs/jffs2/jfs/minix/nilfs/ntfs/omfs/reiserfs/isofs/sysv/ubifs/udf/ufs/vboxsf/squashfs/romfs)
 which just use the filemap impl verbatim).

There's no point to restricting splice access on a per-filesystem level
(which this'd do), since to mount a malicious network filesystem you
need to be root.

A denial of service attack makes no sense if you're already root.

(Maybe except for fuse, which people typically run suid;
 that I could see potentially making sense to disable..)


I have indeed managed to confuse myself into the NOWAIT hole,
but this is actually about
"not letting unprivileged users escalate into
 hanging system daemons by writing to a pipe"
rather than
"if we ever hold the pipe lock for >2µs we die instantly".

O_NONBLOCK filtered by FMODE_NOWAIT is used as a semantic proxy for
the 10 different types of files anyone can create that we know are safe.

Anyone can open a socket and not write to it, so we must refuse to
splice from a socket with no data in it.
But only root can mount filesystems, so a regular file is always safe.

And, actually defining a slightly-heuristic per-file policy in the syscall
itself is stupid, you've talked me out of this.
This check only actually applies to the generic copy_splice_read()
implementation, since the "real"/non-generic splices
(fiemap_splice_read/per-filesystem ‒
 all the others that this patchset touches)
are already known to be safe
(and aren't reads so FMODE_NOWAIT doesn't factor in at all).

I've dropped this patch and have instead added this to 01/11:
  diff --git a/fs/splice.c b/fs/splice.c
  index f8bfc9cf8cdc..6d369d7d56d5 100644
  --- a/fs/splice.c
  +++ b/fs/splice.c
  @@ -331,0 +332,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
  +	/*
  +	 * This generic implementation is only safe w.r.t. the pipe lock
  +	 * if the file actually respects IOCB_NOWAIT, which ttys don't.
  +	 */
  +	if (!(in->f_mode & FMODE_NOWAIT))
  +		return -EINVAL;

(Indeed, in many ways, Linus' post to which I reply in [1] pretty much
 says this explcitly. Actually he literally says this. I just don't
 realise and instead of adding the snippet to copy_splice_read(),
 which he already diffed and talks about, I copied it to the syscall.)

Now I just need to re-consider the prose in a way
that avoids this deeply embarrassing IOCB_NOWAIT/regular-file nonsense,
and this series ends up being just "fixing splice implementations"
without also special-casing the syscall itself.

Thanks for asking the right questions.
Sorry for longposting.

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux