Re: Bug with splice to a pipe preventing a process exit

On Wed, Jan 22, 2025 at 5:31 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Tue, Jan 21, 2025 at 06:08:41PM -0800, Kir Kolyshkin wrote:
> > While checking if the tool I'm co-maintaining [1] works OK when compiled
> > with the future release of golang (1.24, tested with go1.24rc2), I found
> > out it's not [2], and the issue is caused by Go using sendfile more [3].
> >
> > I came up with the following simple reproducer:
> >
> > #define _GNU_SOURCE
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/sendfile.h>
> > #include <sys/wait.h>
> > #include <sys/socket.h>
> > #include <sys/un.h>
> >
> > int main() {
> >       int sks[2];
> >       int pipefd[2];
> >       if (pipe(pipefd) == -1) {
> >               perror("pipe");
> >               exit(1);
> >       }
> >
> >       pid_t pid = fork();
> >       if (pid == -1) {
> >               perror("fork");
> >               exit(1);
> >       }
> >
> >       if (pid == 0) {
> >               // Child process.
> >               close(pipefd[1]); // Close write end.
> >
> >               // Minimal process that just exits after some time.
> >               sleep(1);
> >
> >               _exit(0); // <-- The child hangs here.
> >       }
> >
> >       // Parent process.
> >       close(pipefd[0]);  // Close read end.
> >
> >       printf("PID1=%d\n", getpid());
> >       printf("PID2=%d\n", pid);
> >       printf("ps -f  -p $PID1,$PID2\n");
> >       printf("sudo tail /proc/{$PID1,$PID2}/{stack,syscall}\n");
> >
> > #ifdef TEST_USE_STDIN
> >       int in_fd = STDIN_FILENO;
> > #else
> >       socketpair(AF_UNIX, SOCK_STREAM, 0, sks);
> >       int in_fd = sks[0];
> > #endif
> >       // Copy from in_fd to pipe.
> >       ssize_t ret = sendfile(pipefd[1], in_fd, 0, 1 << 22);
> >       if (ret == -1) {
> >               perror("sendfile");
> >       }
> >
> >       // Wait for child
> >       int status;
> >       waitpid(pid, &status, 0);
> >
> >       close(pipefd[1]); // Close write end.
> >       return 0;
> > }
> >
> > To reproduce, compile and run the above code, and when it hangs (instead
> > of exiting), copy its output to a shell in another terminal. Here's what
> > I saw:
> >
> > [kir@kir-tp1 linux]$ PID1=2174401
> > PID2=2174402
> > ps -f  -p $PID1,$PID2
> > sudo tail /proc/{$PID1,$PID2}/{stack,syscall}
> > UID          PID    PPID  C STIME TTY          TIME CMD
> > kir      2174401   63304  0 17:34 pts/1    00:00:00 ./repro
> > kir      2174402 2174401  0 17:34 pts/1    00:00:00 [repro]
> > ==> /proc/2174401/stack <==
> > [<0>] unix_stream_read_generic+0x792/0xc90
> > [<0>] unix_stream_splice_read+0x6f/0xb0
> > [<0>] splice_file_to_pipe+0x65/0xd0
> > [<0>] do_sendfile+0x176/0x440
> > [<0>] __x64_sys_sendfile64+0xb3/0xd0
> > [<0>] do_syscall_64+0x82/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > ==> /proc/2174401/syscall <==
> > 40 0x4 0x3 0x0 0x400000 0x64 0xfffffff9 0x7fff2ab3fc58 0x7f265ed6ca3e
> >
> > ==> /proc/2174402/stack <==
> > [<0>] pipe_release+0x1f/0x100
> > [<0>] __fput+0xde/0x2a0
> > [<0>] task_work_run+0x59/0x90
> > [<0>] do_exit+0x309/0xab0
> > [<0>] do_group_exit+0x30/0x80
> > [<0>] __x64_sys_exit_group+0x18/0x20
> > [<0>] x64_sys_call+0x14b4/0x14c0
> > [<0>] do_syscall_64+0x82/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > ==> /proc/2174402/syscall <==
> > 231 0x0 0xffffffffffffff88 0xe7 0x0 0x0 0x7f265eea01a0 0x7fff2ab3fc58 0x7f265ed43acd
> >
> > Presumably, what happens here is the child process is stuck in the
> > exit_group syscall, being blocked by parent's splice which holds the
> > lock to the pipe (in splice_file_to_pipe).
>
> Splice is notoriously problematic when interacting with pipes due to how
> it holds the pipe lock. We've had handwavy discussions how to improve
> this but nothing ever materialized.
>
> The gist here seems to me that unix_stream_read_generic() is waiting on
> data to read from the write-side of the socketpair(). Until you close
> that fd or provide data you'll simply hang forever.

My thinking is that splice should also return once the other end of the
pipe it writes to (i.e. pipefd[0], which the child is supposed to read
from) is closed, as the pipe consumer is gone. The program above does
just that: the child tries to close the end of the pipe it is supposed
to read from (implicitly, upon exit), but that close is blocked by the
very splice.
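For comparison, plain write(2) already behaves the way I'd expect splice
to here: once the last read end of a pipe is closed, the writer fails
right away with EPIPE (or is sent SIGPIPE, unless that is ignored). A
minimal sketch; write_after_reader_closed() is a made-up helper for
illustration:

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical helper: closes the read end of a fresh pipe, then writes to
// the write end. Returns the errno of the failed write (EPIPE on Linux),
// 0 if the write unexpectedly succeeded, or -1 if pipe() itself failed.
int write_after_reader_closed(void)
{
	int pipefd[2];
	if (pipe(pipefd) == -1)
		return -1;

	// Ignore SIGPIPE (process-wide) so the failure shows up as errno.
	signal(SIGPIPE, SIG_IGN);

	close(pipefd[0]); // The consumer goes away.

	int err = 0;
	if (write(pipefd[1], "x", 1) == -1)
		err = errno; // Fails immediately; nothing blocks.

	close(pipefd[1]);
	return err;
}
```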

The parent could also close the fd splice reads from (in_fd), as per
your suggestion, but it only makes sense to do so once the child has
exited. Again, that is not possible: splice makes the child block during
exit, so the parent cannot know the child is done. Surely, we could add
another mechanism for the child to tell the parent it is almost done.
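To be fair, the pipe itself would tell the parent that the consumer is
gone, if only the parent weren't sleeping in splice: on Linux, poll(2)
on the write end of a pipe reports POLLERR once all read ends are closed
(this is Linux pipe behaviour rather than something POSIX guarantees). A
sketch, with pipe_reader_gone() being a hypothetical helper:

```c
#include <poll.h>
#include <unistd.h>

// Hypothetical helper: non-blocking check (timeout 0) whether the read side
// of a pipe is gone. Returns 1 if poll(2) on the write end reports POLLERR
// (Linux: no readers left), 0 if a reader is still attached, -1 on error.
int pipe_reader_gone(int pipe_wr)
{
	struct pollfd p = { .fd = pipe_wr, .events = POLLOUT };
	if (poll(&p, 1, 0) == -1)
		return -1;
	return (p.revents & POLLERR) ? 1 : 0;
}
```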

In real life, the parent is "runc exec" and the child is any binary
executed in a container. Splice is used to forward data from
runc's stdin to that container process. It looks like it's impossible
to use splice in this scenario.
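The fallback I can think of is to not use splice/sendfile at all and
forward the data with a plain read(2)/write(2) loop: the parent then
waits for input in poll() or read() without holding the pipe lock, so
the container process can exit, and the loop notices the reader going
away via POLLERR or EPIPE. A rough sketch under those assumptions, not
what runc or Go actually do; the function name and the 64 KiB buffer
size are arbitrary:

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical splice-free forwarder: copies in_fd to the pipe out_fd with
// plain read(2)/write(2), so no pipe lock is held while waiting for input.
// Returns 0 on EOF from in_fd, 1 if the pipe's reader went away, -1 on error.
int copy_until_eof_or_reader_gone(int in_fd, int out_fd)
{
	char buf[65536];
	struct pollfd p[2] = {
		{ .fd = in_fd, .events = POLLIN },
		{ .fd = out_fd, .events = 0 }, // only POLLERR matters here
	};

	// Report a vanished reader as EPIPE rather than dying (process-wide).
	signal(SIGPIPE, SIG_IGN);

	for (;;) {
		if (poll(p, 2, -1) == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if ((p[0].revents | p[1].revents) & POLLNVAL)
			return -1; // one of the fds is not open
		if (p[1].revents & POLLERR)
			return 1; // Linux: no readers left on the pipe
		if (!(p[0].revents & (POLLIN | POLLHUP | POLLERR)))
			continue;

		ssize_t n = read(in_fd, buf, sizeof(buf));
		if (n == 0)
			return 0; // EOF on input
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}

		// Write out everything that was read, handling short writes.
		for (ssize_t off = 0; off < n;) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
			if (w == -1) {
				if (errno == EINTR)
					continue;
				return errno == EPIPE ? 1 : -1;
			}
			off += w;
		}
	}
}
```

The cost is an extra copy through userspace compared to splice, which
is the price of never waiting for input while holding the pipe lock.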

