On Fri, Dec 15, 2023 at 08:47:15AM -0700, Jens Axboe wrote: > On 12/14/23 11:45 AM, Ahelenia Ziemiańska wrote: > > We request non-blocking I/O in the generic implementation, but some > > files ‒ ttys ‒ only check O_NONBLOCK. Refuse them here, lest we > > risk sleeping with the pipe locked for indeterminate lengths of > > time. > A worthy goal here is ensuring that _everybody_ honors IOCB_NOWAIT, > rather than just rely on O_NONBLOCK. This does involve converting to > ->read_iter/->write_iter if the driver isn't already using it, but some > of them already have that, yet don't check IOCB_NOWAIT or treat it the > same as O_NONBLOCK. This doesn't really mean much to me, sorry. > Adding special checks like this is not a good idea, imho. That's what Linus said I should do so that's what I did https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@xxxxxxxxxxxxxx/ I can't fix the tty layer :/ > > This also masks inconsistent wake-ups (usually every second line) > > when splicing from ttys in icanon mode. > > > > Regular files don't /have/ a distinct O_NONBLOCK mode, > > because they always behave non-blockingly, and for them FMODE_NOWAIT is > > used in the purest sense of > > /* File is capable of returning -EAGAIN if I/O will block */ > > which is not set by the vast majority of filesystems, > > and it's not the semantic we want here. > > The main file systems do very much set it, like btrfs, ext4, and xfs. If > you look at total_file_systems / ones_flagging_it the ratio may be high, > but in terms of installed userbase, the majority definitely will have > it. Also see comment on cover letter for addressing this IOCB_NOWAIT > confusion. Reassessing [1] https://lore.kernel.org/linux-fsdevel/5osglsw36dla3mubtpsmdwdid4fsdacplyd6acx2igo4atogdg@yur3idyim3cc/ I see FMODE_NOWAIT in blockdevs /dev/{null,zero,random,urandom} btrfs/ext4/f2fs/ocfs2/xfs eventfd pipes sockets tun/tap which means that vfat/fuse/nfs/tmpfs/ramfs/procfs/sysfs don't. (zfs also doesn't, but that's not for this list.) I don't know if that's actually a "majority" in a meaningful sense, I agree, but I think I primarily committed to this exclusion because tmpfs/ramfs didn't. I s'pose ramfs can already be tagged since it already returns -EAGAIN when I/O would block (never). tmpfs not being spliceable is also questionable. But this'd also mean effectively deleting afs_file_splice_read ceph_splice_read coda_file_splice_read ecryptfs_splice_read_update_atime fuse_dev_splice_read nfs_file_splice_read orangefs_file_splice_read shmem_file_splice_read v9fs_file_splice_read (not to mention the many others (adfs/affs/bfs/bcachefs/cramfs/erofs/fat/hfs*/hostfs/hpfs/jffs2/jfs/minix/nilfs/ntfs/omfs/reiserfs/isofs/sysv/ubifs/udf/ufs/vboxsf/squashfs/romfs) which just use the filemap impl verbatim). There's no point to restricting splice access on a per-filesystem level (which this'd do), since to mount a malicious network filesystem you need to be root. A denial of service attack makes no sense if you're already root. (Maybe except for fuse, which people typically run suid; that I could see potentially making sense to disable..) I have indeed managed to confuse myself into the NOWAIT hole, but this is actually about "not letting unprivileged users escalate into hanging system daemons by writing to a pipe" rather than "if we ever hold the pipe lock for >2µs we die instantly". O_NONBLOCK filtered by FMODE_NOWAIT is used as a semantic proxy for the 10 different types of files anyone can create that we know are safe. Anyone can open a socket and not write to it, so we must refuse to splice from a socket with no data in it. But only root can mount filesystems, so a regular file is always safe. And, actually defining a slightly-heuristic per-file policy in the syscall itself is stupid, you've talked me out of this. This check only actually applies to the generic copy_splice_read() implementation, since the "real"/non-generic splices (fiemap_splice_read/per-filesystem ‒ all the others that this patchset touches) are already known to be safe (and aren't reads so FMODE_NOWAIT doesn't factor in at all). I've dropped this patch and have instead added this to 01/11: diff --git a/fs/splice.c b/fs/splice.c index f8bfc9cf8cdc..6d369d7d56d5 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -331,0 +332,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos, + /* + * This generic implementation is only safe w.r.t. the pipe lock + * if the file actually respects IOCB_NOWAIT, which ttys don't. + */ + if (!(in->f_mode & FMODE_NOWAIT)) + return -EINVAL; (Indeed, in many ways, Linus' post to which I reply in [1] pretty much says this explcitly. Actually he literally says this. I just don't realise and instead of adding the snippet to copy_splice_read(), which he already diffed and talks about, I copied it to the syscall.) Now I just need to re-consider the prose in a way that avoids this deeply embarrassing IOCB_NOWAIT/regular-file nonsense, and this series ends up being just "fixing splice implementations" without also special-casing the syscall itself. Thanks for asking the right questions. Sorry for longposting.
Attachment:
signature.asc
Description: PGP signature