On Sat, May 07, 2022 at 02:52:24PM -0700, Andrew Morton wrote:
> On Mon, 2 May 2022 00:01:46 -0700 Andrei Vagin <avagin@xxxxxxxxx> wrote:
>
> > Andrew, could you take a look at this patch?
> >
> > Here is a small reproducer for the problem:
> >
> > #define _GNU_SOURCE /* See feature_test_macros(7) */
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <errno.h>
> > #include <sys/stat.h>
> > #include <sys/types.h>
> > #include <sys/sendfile.h>
> >
> >
> > #define FILE_SIZE (1UL << 30)
> > int main(int argc, char **argv) {
> >         int p[2], fd;
> >
> >         if (pipe2(p, O_NONBLOCK))
> >                 return 1;
> >
> >         fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
> >         if (fd < 0)
> >                 return 1;
> >         ftruncate(fd, FILE_SIZE);
> >
> >         if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
> >                 fprintf(stderr, "FAIL\n");
> >         }
> >         if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
> >                 fprintf(stderr, "FAIL\n");
> >         }
> >         return 0;
> > }
> >
> > It worked before b964bf53e540, it is stuck after b964bf53e540, and it
> > works again with this fix.
>
> Thanks. How did b964bf53e540 cause this? do_splice_direct()
> accidentally does the right thing even when SPLICE_F_NONBLOCK was not
> passed?

do_splice_direct() calls pipe_write that handles O_NONBLOCK. Here is
a trace log from the reproducer:

 1)               |  __x64_sys_sendfile64() {
 1)               |    do_sendfile() {
 1)               |      __fdget()
 1)               |      rw_verify_area()
 1)               |      __fdget()
 1)               |      rw_verify_area()
 1)               |      do_splice_direct() {
 1)               |        rw_verify_area()
 1)               |        splice_direct_to_actor() {
 1)               |          do_splice_to() {
 1)               |            rw_verify_area()
 1)               |            generic_file_splice_read()
 1) + 74.153 us   |          }
 1)               |          direct_splice_actor() {
 1)               |            iter_file_splice_write() {
 1)               |              __kmalloc()
 1)   0.148 us    |              pipe_lock();
 1)   0.153 us    |              splice_from_pipe_next.part.0();
 1)   0.162 us    |              page_cache_pipe_buf_confirm();
 ... 16 times
 1)   0.159 us    |              page_cache_pipe_buf_confirm();
 1)               |              vfs_iter_write() {
 1)               |                do_iter_write() {
 1)               |                  rw_verify_area()
 1)               |                  do_iter_readv_writev() {
 1)               |                    pipe_write() {
 1)               |                      mutex_lock()
 1)   0.153 us    |                      mutex_unlock();
 1)   1.368 us    |                    }
 1)   1.686 us    |                  }
 1)   5.798 us    |                }
 1)   6.084 us    |              }
 1)   0.174 us    |              kfree();
 1)   0.152 us    |              pipe_unlock();
 1) + 14.461 us   |            }
 1) + 14.783 us   |          }
 1)   0.164 us    |          page_cache_pipe_buf_release();
 ... 16 times
 1)   0.161 us    |          page_cache_pipe_buf_release();
 1)               |          touch_atime()
 1) + 95.854 us   |        }
 1) + 99.784 us   |      }
 1) ! 107.393 us  |    }
 1) ! 107.699 us  |  }

>
> I assume that Al will get to this.  Meanwhile I can toss it
> into linux-next to get some exposure and so it won't be lost.
>
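
For illustration only, the O_NONBLOCK handling in pipe_write() mentioned
above can also be observed from userspace with a plain write(): once a
pipe opened with O_NONBLOCK is full, write() fails with EAGAIN instead of
sleeping. This is a minimal sketch and not part of the patch or the
reproducer. It does not go through the sendfile()/splice path shown in
the trace, and the helper names fill_pipe() and
nonblock_pipe_full_errno() are made up for this example.

```c
#define _GNU_SOURCE		/* for pipe2() */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write zero bytes into wfd until write() fails.
 * Returns the errno of the failing write(); *total receives the number
 * of bytes accepted before the failure.
 */
static int fill_pipe(int wfd, size_t *total)
{
	char buf[4096] = { 0 };
	ssize_t n;

	*total = 0;
	for (;;) {
		n = write(wfd, buf, sizeof(buf));
		if (n < 0)
			return errno;
		*total += (size_t)n;
	}
}

/*
 * Hypothetical helper: create an O_NONBLOCK pipe, fill it, and report
 * the errno seen once it is full. Returns -1 if pipe2() itself fails.
 */
static int nonblock_pipe_full_errno(size_t *total)
{
	int p[2], err;

	if (pipe2(p, O_NONBLOCK))
		return -1;

	err = fill_pipe(p[1], total);

	close(p[0]);
	close(p[1]);
	return err;
}
```

Calling nonblock_pipe_full_errno(&total) from main() should return EAGAIN
with total greater than zero. Without O_NONBLOCK on the pipe, the same
loop would block in write() once the pipe is full.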