On Tue, Jan 21, 2025 at 06:08:41PM -0800, Kir Kolyshkin wrote:
> While checking if the tool I'm co-maintaining [1] works OK when compiled
> with the future release of golang (1.24, tested with go1.24rc2), I found
> out it's not [2], and the issue is caused by Go using sendfile more [3].
>
> I came up with the following simple reproducer:
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/sendfile.h>
> #include <sys/wait.h>
> #include <sys/socket.h>
> #include <sys/un.h>
>
> int main() {
>     int sks[2];
>     int pipefd[2];
>     if (pipe(pipefd) == -1) {
>         perror("pipe");
>         exit(1);
>     }
>
>     pid_t pid = fork();
>     if (pid == -1) {
>         perror("fork");
>         exit(1);
>     }
>
>     if (pid == 0) {
>         // Child process.
>         close(pipefd[1]); // Close write end.
>
>         // Minimal process that just exits after some time.
>         sleep(1);
>
>         _exit(0); // <-- The child hangs here.
>     }
>
>     // Parent process.
>     close(pipefd[0]); // Close read end.
>
>     printf("PID1=%d\n", getpid());
>     printf("PID2=%d\n", pid);
>     printf("ps -f -p $PID1,$PID2\n");
>     printf("sudo tail /proc/{$PID1,$PID2}/{stack,syscall}\n");
>
> #ifdef TEST_USE_STDIN
>     int in_fd = STDIN_FILENO;
> #else
>     socketpair(AF_UNIX, SOCK_STREAM, 0, sks);
>     int in_fd = sks[0];
> #endif
>     // Copy from in_fd to pipe.
>     ssize_t ret = sendfile(pipefd[1], in_fd, 0, 1 << 22);
>     if (ret == -1) {
>         perror("sendfile");
>     }
>
>     // Wait for child
>     int status;
>     waitpid(pid, &status, 0);
>
>     close(pipefd[1]); // Close write end.
>     return 0;
> }
>
> To reproduce, compile and run the above code, and when it hangs (instead
> of exiting), copy its output to a shell in another terminal.
> Here's what I saw:
>
> [kir@kir-tp1 linux]$ PID1=2174401
> PID2=2174402
> ps -f -p $PID1,$PID2
> sudo tail /proc/{$PID1,$PID2}/{stack,syscall}
> UID          PID    PPID  C STIME TTY          TIME CMD
> kir      2174401   63304  0 17:34 pts/1    00:00:00 ./repro
> kir      2174402 2174401  0 17:34 pts/1    00:00:00 [repro]
> ==> /proc/2174401/stack <==
> [<0>] unix_stream_read_generic+0x792/0xc90
> [<0>] unix_stream_splice_read+0x6f/0xb0
> [<0>] splice_file_to_pipe+0x65/0xd0
> [<0>] do_sendfile+0x176/0x440
> [<0>] __x64_sys_sendfile64+0xb3/0xd0
> [<0>] do_syscall_64+0x82/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> ==> /proc/2174401/syscall <==
> 40 0x4 0x3 0x0 0x400000 0x64 0xfffffff9 0x7fff2ab3fc58 0x7f265ed6ca3e
>
> ==> /proc/2174402/stack <==
> [<0>] pipe_release+0x1f/0x100
> [<0>] __fput+0xde/0x2a0
> [<0>] task_work_run+0x59/0x90
> [<0>] do_exit+0x309/0xab0
> [<0>] do_group_exit+0x30/0x80
> [<0>] __x64_sys_exit_group+0x18/0x20
> [<0>] x64_sys_call+0x14b4/0x14c0
> [<0>] do_syscall_64+0x82/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> ==> /proc/2174402/syscall <==
> 231 0x0 0xffffffffffffff88 0xe7 0x0 0x0 0x7f265eea01a0 0x7fff2ab3fc58 0x7f265ed43acd
>
> Presumably, what happens here is the child process is stuck in the
> exit_group syscall, being blocked by parent's splice which holds the
> lock to the pipe (in splice_file_to_pipe).

Splice is notoriously problematic when interacting with pipes because of
how it holds the pipe lock. We've had handwavy discussions about how to
improve this, but nothing ever materialized.

The gist here, it seems to me, is that unix_stream_read_generic() is
waiting for data to read from the write side of the socketpair(). Until
you close that fd or provide data, you'll simply hang forever. And since
splice_file_to_pipe() holds the pipe lock across that read, the child's
pipe_release() on exit blocks on the same lock, which is the exit_group
hang you're seeing.

The same goes for STDIN_FILENO, fwiw: if you never enter any character,
you simply hang forever waiting for input. So imho the program as written
is buggy. But Jens might be able to provide more details.
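To make the "close that fd or provide data" point concrete, here is a minimal sketch (not from the thread; sendfile_after_peer_close() is a hypothetical helper, and the 5-second alarm() watchdog is an added assumption). It queues a few bytes on a socketpair, closes the write side, then calls sendfile() into a pipe twice: the first call should copy the queued bytes, and the second should return 0 because the read side now sees EOF, so neither call sits in the socket read while holding the pipe lock. It assumes a kernel whose sendfile() can splice from a socket into a pipe, as the reporter's kernel evidently does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the thread: queue msg on a unix
 * socketpair, close the write side, then sendfile() from the read side
 * into a pipe. Once the queued bytes are consumed the read side sees
 * EOF, so the socket read inside sendfile() has nothing to wait for.
 *
 * On success returns 0, stores the two sendfile() results in *first
 * and *second, and copies whatever reached the pipe into out
 * (NUL-terminated). Returns -1 if setup fails.
 */
static int sendfile_after_peer_close(const char *msg, char *out, size_t outlen,
                                     ssize_t *first, ssize_t *second)
{
    int sks[2], pipefd[2];
    size_t len = strlen(msg);

    assert(outlen > len); /* room for the payload plus a NUL */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sks) == -1)
        return -1;
    if (pipe(pipefd) == -1) {
        close(sks[0]);
        close(sks[1]);
        return -1;
    }

    /* Watchdog (an assumption of this sketch): if a sendfile() call did
     * block, SIGALRM's default action ends the process instead of
     * leaving it hung. */
    alarm(5);

    /* "Provide data", then "close that fd": the write side goes away. */
    if (write(sks[1], msg, len) != (ssize_t)len) {
        close(sks[0]);
        close(sks[1]);
        close(pipefd[0]);
        close(pipefd[1]);
        alarm(0);
        return -1;
    }
    close(sks[1]);

    /* First call moves the queued bytes into the pipe. */
    *first = sendfile(pipefd[1], sks[0], NULL, 1 << 22);
    /* Second call finds the queue empty and the peer gone: EOF, so 0. */
    *second = sendfile(pipefd[1], sks[0], NULL, 1 << 22);

    alarm(0);

    /* Drain the pipe; with its write end closed, read() cannot block. */
    close(pipefd[1]);
    ssize_t n = read(pipefd[0], out, outlen - 1);
    out[n > 0 ? n : 0] = '\0';

    close(pipefd[0]);
    close(sks[0]);
    return 0;
}
```

Called as sendfile_after_peer_close("hello", buf, sizeof buf, &first, &second), one would expect first == 5, second == 0 and buf to hold "hello" on such a kernel. Dropping the close(sks[1]) line turns the second call back into the indefinite wait from the reproducer.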
>
> To me, the code above looks valid, and the kernel behavior seems to
> be a bug. In particular, if the process is exiting, the pipe it was
> using is now being closed, and splice (or sendfile) should return.
>
> If this is not a kernel bug, then the code above is not correct; in
> this case, please suggest how to fix it.
>
> Regards,
> Kir.
>
> ----
> [1]: https://github.com/opencontainers/runc
> [2]: https://github.com/opencontainers/runc/pull/4598
> [3]: https://go-review.googlesource.com/c/go/+/603295