Re: Pending splice(file -> FIFO) excludes all other FIFO operations forever (was: ... always blocks read(FIFO), regardless of O_NONBLOCK on read side?)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jul 08, 2023 at 01:06:56PM -0700, Linus Torvalds wrote:
> On Fri, 7 Jul 2023 at 17:30, Ahelenia Ziemiańska
> <nabijaczleweli@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Same reproducer, backtrace attached:
> > $ scripts/faddr2line vmlinux splice_from_pipe_next+0x6e
> > splice_from_pipe_next+0x6e/0x180:
> > pipe_buf_confirm at include/linux/pipe_fs_i.h:233
> Bah. I should have looked more closely at your case.
> 
> This is a buffer without an 'ops' pointer. So it looks like was
> already released.
> 
> And the reason is that the pipe was readable because there were no
> writers, and I had put the
> 
>         if (!pipe->writers)
>                 return 0;
> 
> check in splice_from_pipe_next() in the wrong place. It needs to be
> *before* the eat_empty_buffer() call.
>
> Anyway, while I think that fixes your NULL pointer thing,
It does.

> So while fixing your NULL pointer check should be trivial, I think
> that first patch is actually fundamentally broken wrt pipe resizing,
> and I see no really sane way to fix it. We could add a new lock just
> for that, but I don't think it's worth it.
> 
> > You are, but, well, that's also the case when the pipe is full.
> > As it stands, the pipe is /empty/ and yet /no-one can write to it/.
> > This is the crux of the issue at hand.
> No, I think you are mis-representing things. The pipe isn't empty.
> It's full of things that just aren't finalized yet.
Being full of no data (as part of some hidden state)
doesn't make it any less empty, but meh; neither here not there.

> > Or, rather: splice() from a non-seekable (non-mmap()pable?)
> Please stop with the non-seekable nonsense.
> 
> Any time I see a patch like this:
> 
> > +               if (!(in->f_mode & FMODE_LSEEK))
> > +                       return -EINVAL;
> 
> I will just go "that person is not competent".
Accurate assessment.

> This has absolutely nothing to do with seekability.
Yes, and as noted, I was using it as a stand-in for "I/O won't
block" due to the above (and splice_direct_to_actor() already uses
it). Glad to see you've managed to synthesise my drivel into something
workable.

> But it is possible that we need to just bite the bullet and say
> "copy_splice_read() needs to use a non-blocking kiocb for the IO".
> 
> Of course, that then doesn't work, because while doing this is trivial:
> 
>   --- a/fs/splice.c
>   +++ b/fs/splice.c
>   @@ -364,6 +364,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
>         iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
>         init_sync_kiocb(&kiocb, in);
>         kiocb.ki_pos = *ppos;
>   +     kiocb.ki_flags |= IOCB_NOWAIT;
>         ret = call_read_iter(in, &kiocb, &to);
> 
>         if (ret > 0) {
> 
> I suspectr you'll find that it makes no difference, because the tty
> layer doesn't actually honor the IOCB_NOWAIT flag for various
> historical reasons.
Indeed, neither when splicing from a tty,
nor from a socket (same setup but socketpair(AF_UNIX, SOCK_STREAM, 0); w.c).

> In fact, the kiocb isn't even passed down to the
> low-level routine, which only gets the 'struct file *', and instead it
> looks at tty_io_nonblock(), which just does that legacy
> 
>         file->f_flags & O_NONBLOCK
> 
> test.
> 
> I guess combined with something like
> 
>         if (!(in->f_mode & FMODE_NOWAIT))
>                 return -EINVAL;
> 
> it might all work.
Yes, that makes the splice instantly -EINVAL for ttys, but doesn't
affect the socketpair() case above, which still blocks forever.

This appears to be because vfs_splice_read() does
	if ((in->f_flags & O_DIRECT) || IS_DAX(in->f_mapping->host))
		return copy_splice_read(in, ppos, pipe, len, flags);
	return in->f_op->splice_read(in, ppos, pipe, len, flags);
so the c_s_r() isn't even called for the socket, which uses
unix_stream_splice_read().

Thus,
  diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
  index 123b35ddfd71..384d5a479b4a 100644
  --- a/net/unix/af_unix.c
  +++ b/net/unix/af_unix.c
  @@ -2880,15 +2880,12 @@ static ssize_t unix_stream_splice_read(struct socket *sock,  loff_t *ppos,
                  .pipe = pipe,
                  .size = size,
                  .splice_flags = flags,
  +               .flags = MSG_DONTWAIT,
          };
  
          if (unlikely(*ppos))
                  return -ESPIPE;
  
  -       if (sock->file->f_flags & O_NONBLOCK ||
  -           flags & SPLICE_F_NONBLOCK)
  -               state.flags = MSG_DONTWAIT;
  -
          return unix_stream_read_generic(&state, false);
   }
makes the splice -EAGAIN w/o data and splice whatever's in the socket
when there is data.

git grep '\.splice_read.*=' | cut -d= -f2 | tr -s ',;[:space:]' '\n' | sort -u
gives me 27 distinct splice_read implementations and 1 cocci config.

These are simple delegations:
  nfs_file_splice_read               filemap_splice_read
  afs_file_splice_read               filemap_splice_read
  ceph_splice_read                   filemap_splice_read
  ecryptfs_splice_read_update_atime  filemap_splice_read
  ext4_file_splice_read              filemap_splice_read
  f2fs_file_splice_read              filemap_splice_read
  ntfs_file_splice_read              filemap_splice_read
  ocfs2_file_splice_read             filemap_splice_read
  orangefs_file_splice_read          filemap_splice_read
  v9fs_file_splice_read              filemap_splice_read
  xfs_file_splice_read               filemap_splice_read
  zonefs_file_splice_read            filemap_splice_read
  sock_splice_read                   copy_splice_read or a socket-specific one
  coda_file_splice_read              vfs_splice_read
  ovl_splice_read                    vfs_splice_read

vfs_splice_read calls copy_splice_read (not used as a .splice_read).

And the rest are:
01. copy_splice_read you've fixed
02. filemap_splice_read is, as I understand it, only applicable to
    files/blockdevs and already has the required semantics?

03. unix_stream_splice_read can be fixed as above

04. fuse_dev_splice_read allocates a buffer and fuse_dev_do_read()s into
    it, fuse_dev_do_read does
                 if (file->f_flags & O_NONBLOCK)
                         return -EAGAIN;
    so this is likely a similarly easy fix
05. tracing_buffers_splice_read, when it doesn't read anything does
                 ret = -EAGAIN;
                 if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
                         goto out;
    and waits for at least one thing to read;
    can probably just goto out instantly
06. tracing_splice_read_pipe delegates to whatever it's tracing, and
    errors if that errored, so it's fine(?)

07. shmem_file_splice_read is always nonblocking
08. relay_file_splice_read only checks SPLICE_F_NONBLOCK to turn 0 into
    -EAGAIN so I think it also just doesn't block

09. smc_splice_read falls back to an embedded socket's splice_read,
    or uses 
                if (flags & SPLICE_F_NONBLOCK)
                        flags = MSG_DONTWAIT;
                else
                        flags = 0;
                SMC_STAT_INC(smc, splice_cnt);
                rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags);
    so also probably remove the condition
10. kcm_splice_read:
10a. passes flags (SPLICE_F_...) to skb_recv_datagram(), which does
        timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
     verbatim!? and forwards them to try_recv which also checks
     for socket-style bits
10b. give it MSG_DONTWAIT, call it a day?
10c. it also passes flags to skb_splice_bits() to actually splice to the
     pipe, but that discards flags, so no change needed

11. tls_sw_splice_read I don't really understand but it does
        err = tls_rx_reader_lock(sk, ctx, flags & SPLICE_F_NONBLOCK);
    and
                err = tls_rx_rec_wait(sk, NULL, flags & SPLICE_F_NONBLOCK,
                                      true);
    (and skb_splice_bits()) so make them both true?

12. tcp_splice_read uses skb_splice_bits() and timeout governed by
        timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
    and re-tries on empty read if timeo!=0 (i.e. !(sock->file->f_flags & O_NONBLOCK));
    so turning that into true (timeo = 0) and propagating that would
	make it behave accd'g to the new semantic

Would that make sense?
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>

int main() {
  int sp[2];
  socketpair(AF_UNIX, SOCK_STREAM, 0, sp);
  for(;;)
    splice(sp[0], 0, 1, 0, 128 * 1024 * 1024, 0);
}

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux