On Sat, Jul 08, 2023 at 01:06:56PM -0700, Linus Torvalds wrote: > On Fri, 7 Jul 2023 at 17:30, Ahelenia Ziemiańska > <nabijaczleweli@xxxxxxxxxxxxxxxxxx> wrote: > > > > Same reproducer, backtrace attached: > > $ scripts/faddr2line vmlinux splice_from_pipe_next+0x6e > > splice_from_pipe_next+0x6e/0x180: > > pipe_buf_confirm at include/linux/pipe_fs_i.h:233 > Bah. I should have looked more closely at your case. > > This is a buffer without an 'ops' pointer. So it looks like was > already released. > > And the reason is that the pipe was readable because there were no > writers, and I had put the > > if (!pipe->writers) > return 0; > > check in splice_from_pipe_next() in the wrong place. It needs to be > *before* the eat_empty_buffer() call. > > Anyway, while I think that fixes your NULL pointer thing, It does. > So while fixing your NULL pointer check should be trivial, I think > that first patch is actually fundamentally broken wrt pipe resizing, > and I see no really sane way to fix it. We could add a new lock just > for that, but I don't think it's worth it. > > > You are, but, well, that's also the case when the pipe is full. > > As it stands, the pipe is /empty/ and yet /no-one can write to it/. > > This is the crux of the issue at hand. > No, I think you are mis-representing things. The pipe isn't empty. > It's full of things that just aren't finalized yet. Being full of no data (as part of some hidden state) doesn't make it any less empty, but meh; neither here not there. > > Or, rather: splice() from a non-seekable (non-mmap()pable?) > Please stop with the non-seekable nonsense. > > Any time I see a patch like this: > > > + if (!(in->f_mode & FMODE_LSEEK)) > > + return -EINVAL; > > I will just go "that person is not competent". Accurate assessment. > This has absolutely nothing to do with seekability. Yes, and as noted, I was using it as a stand-in for "I/O won't block" due to the above (and splice_direct_to_actor() already uses it). Glad to see you've managed to synthesise my drivel into something workable. > But it is possible that we need to just bite the bullet and say > "copy_splice_read() needs to use a non-blocking kiocb for the IO". > > Of course, that then doesn't work, because while doing this is trivial: > > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -364,6 +364,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos, > iov_iter_bvec(&to, ITER_DEST, bv, npages, len); > init_sync_kiocb(&kiocb, in); > kiocb.ki_pos = *ppos; > + kiocb.ki_flags |= IOCB_NOWAIT; > ret = call_read_iter(in, &kiocb, &to); > > if (ret > 0) { > > I suspectr you'll find that it makes no difference, because the tty > layer doesn't actually honor the IOCB_NOWAIT flag for various > historical reasons. Indeed, neither when splicing from a tty, nor from a socket (same setup but socketpair(AF_UNIX, SOCK_STREAM, 0); w.c). > In fact, the kiocb isn't even passed down to the > low-level routine, which only gets the 'struct file *', and instead it > looks at tty_io_nonblock(), which just does that legacy > > file->f_flags & O_NONBLOCK > > test. > > I guess combined with something like > > if (!(in->f_mode & FMODE_NOWAIT)) > return -EINVAL; > > it might all work. Yes, that makes the splice instantly -EINVAL for ttys, but doesn't affect the socketpair() case above, which still blocks forever. This appears to be because vfs_splice_read() does if ((in->f_flags & O_DIRECT) || IS_DAX(in->f_mapping->host)) return copy_splice_read(in, ppos, pipe, len, flags); return in->f_op->splice_read(in, ppos, pipe, len, flags); so the c_s_r() isn't even called for the socket, which uses unix_stream_splice_read(). Thus, diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 123b35ddfd71..384d5a479b4a 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2880,15 +2880,12 @@ static ssize_t unix_stream_splice_read(struct socket *sock, loff_t *ppos, .pipe = pipe, .size = size, .splice_flags = flags, + .flags = MSG_DONTWAIT, }; if (unlikely(*ppos)) return -ESPIPE; - if (sock->file->f_flags & O_NONBLOCK || - flags & SPLICE_F_NONBLOCK) - state.flags = MSG_DONTWAIT; - return unix_stream_read_generic(&state, false); } makes the splice -EAGAIN w/o data and splice whatever's in the socket when there is data. git grep '\.splice_read.*=' | cut -d= -f2 | tr -s ',;[:space:]' '\n' | sort -u gives me 27 distinct splice_read implementations and 1 cocci config. These are simple delegations: nfs_file_splice_read filemap_splice_read afs_file_splice_read filemap_splice_read ceph_splice_read filemap_splice_read ecryptfs_splice_read_update_atime filemap_splice_read ext4_file_splice_read filemap_splice_read f2fs_file_splice_read filemap_splice_read ntfs_file_splice_read filemap_splice_read ocfs2_file_splice_read filemap_splice_read orangefs_file_splice_read filemap_splice_read v9fs_file_splice_read filemap_splice_read xfs_file_splice_read filemap_splice_read zonefs_file_splice_read filemap_splice_read sock_splice_read copy_splice_read or a socket-specific one coda_file_splice_read vfs_splice_read ovl_splice_read vfs_splice_read vfs_splice_read calls copy_splice_read (not used as a .splice_read). And the rest are: 01. copy_splice_read you've fixed 02. filemap_splice_read is, as I understand it, only applicable to files/blockdevs and already has the required semantics? 03. unix_stream_splice_read can be fixed as above 04. fuse_dev_splice_read allocates a buffer and fuse_dev_do_read()s into it, fuse_dev_do_read does if (file->f_flags & O_NONBLOCK) return -EAGAIN; so this is likely a similarly easy fix 05. tracing_buffers_splice_read, when it doesn't read anything does ret = -EAGAIN; if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK)) goto out; and waits for at least one thing to read; can probably just goto out instantly 06. tracing_splice_read_pipe delegates to whatever it's tracing, and errors if that errored, so it's fine(?) 07. shmem_file_splice_read is always nonblocking 08. relay_file_splice_read only checks SPLICE_F_NONBLOCK to turn 0 into -EAGAIN so I think it also just doesn't block 09. smc_splice_read falls back to an embedded socket's splice_read, or uses if (flags & SPLICE_F_NONBLOCK) flags = MSG_DONTWAIT; else flags = 0; SMC_STAT_INC(smc, splice_cnt); rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags); so also probably remove the condition 10. kcm_splice_read: 10a. passes flags (SPLICE_F_...) to skb_recv_datagram(), which does timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); verbatim!? and forwards them to try_recv which also checks for socket-style bits 10b. give it MSG_DONTWAIT, call it a day? 10c. it also passes flags to skb_splice_bits() to actually splice to the pipe, but that discards flags, so no change needed 11. tls_sw_splice_read I don't really understand but it does err = tls_rx_reader_lock(sk, ctx, flags & SPLICE_F_NONBLOCK); and err = tls_rx_rec_wait(sk, NULL, flags & SPLICE_F_NONBLOCK, true); (and skb_splice_bits()) so make them both true? 12. tcp_splice_read uses skb_splice_bits() and timeout governed by timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK); and re-tries on empty read if timeo!=0 (i.e. !(sock->file->f_flags & O_NONBLOCK)); so turning that into true (timeo = 0) and propagating that would make it behave accd'g to the new semantic Would that make sense?
#define _GNU_SOURCE #include <fcntl.h> #include <stdlib.h> #include <sys/socket.h> int main() { int sp[2]; socketpair(AF_UNIX, SOCK_STREAM, 0, sp); for(;;) splice(sp[0], 0, 1, 0, 128 * 1024 * 1024, 0); }
Attachment:
signature.asc
Description: PGP signature