On Fri, Jul 07, 2023 at 12:10:36PM -0700, Linus Torvalds wrote:
> On Fri, 7 Jul 2023 at 10:21, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > Forgot to say, fwiw, I've been running this through the LTP splice,
> > pipe, and ipc tests without issues. A hanging reader can be signaled
> > away cleanly with this.
>
> NOTE! NOTE! NOTE! Once more, this "feels right to me", and I'd argue
> that the basic approach is fairly straightforward. The patch is also
> not horrendous. It all makes a fair amount of sense. BUT! I haven't
> tested this, and like the previous patch, I really would want people
> to think about this a lot.
>
> Comments? Jens?

I applied the patch upthread + this diff to
4f6b6c2b2f86b7878a770736bf478d8a263ff0bc; during test setup I got a null
deref (building defconfig minus graphics). It's reproducible; the full BUG
dump is attached. Trace:
[ 149.878931] <TASK>
[ 149.879533] ? __die+0x1e/0x60
[ 149.880309] ? page_fault_oops+0x17c/0x470
[ 149.881313] ? search_module_extables+0x14/0x50
[ 149.882422] ? exc_page_fault+0x67/0x150
[ 149.883397] ? asm_exc_page_fault+0x26/0x30
[ 149.884426] ? __pfx_pipe_to_null+0x10/0x10
[ 149.885451] ? splice_from_pipe_next+0x129/0x150
[ 149.886580] __splice_from_pipe+0x39/0x1c0
[ 149.887594] ? __pfx_pipe_to_null+0x10/0x10
[ 149.888615] ? __pfx_pipe_to_null+0x10/0x10
[ 149.889635] splice_from_pipe+0x5c/0x90
[ 149.890579] do_splice+0x35c/0x840
[ 149.891407] __do_splice+0x1eb/0x210
[ 149.892176] __x64_sys_splice+0xad/0x120
[ 149.893019] do_syscall_64+0x3e/0x90
[ 149.893798] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

$ scripts/faddr2line vmlinux splice_from_pipe_next+0x129
splice_from_pipe_next+0x129/0x150:
pipe_buf_release at include/linux/pipe_fs_i.h:221
(inlined by) eat_empty_buffer at fs/splice.c:594
(inlined by) splice_from_pipe_next at fs/splice.c:640

I narrowed this down to
  echo c | grep c >/dev/null
where grep is
  ii grep 3.8-5 amd64 GNU grep, egrep and fgrep
and strace of the same invocation (on the host) ends with
  newfstatat(1, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0
  newfstatat(AT_FDCWD, "/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, 0) = 0
  newfstatat(0, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
  lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
  read(0, "c\n", 98304) = 2
  splice(0, NULL, 1, NULL, 98304, SPLICE_F_MOVE) = 0
  close(1) = 0
  close(2) = 0
  exit_group(0) = ?
  +++ exited with 0 +++

I can also reproduce it with
  echo | { read -r _; exec ./wr; } > /dev/null
(where ./wr is "while (splice(0, 0, 1, 0, 128 * 1024 * 1024, 0) > 0) {}").
However,
  echo | ./wr > /dev/null
does NOT crash.

Besides that, this doesn't solve the original issue, inasmuch as after
  ./v > fifo &
  head fifo &
  echo zupa > fifo
(where ./v splices from an empty pty to stdout; v.c attached)
echo still sleeps until ./v dies, though it also succumbs to ^C now.

OTOH, on 4f6b6c2b2f86b7878a770736bf478d8a263ff0bc, "timeout 10 ./v > fifo &"
(then lines 2 and 3 as above) does kill ./v -> unblock echo -> head copies
"zupa", i.e. life resumes as normal after the splicer went away. With the
patches, echo zupa is stuck forever (until you signal it)! This is kinda
worse.
[ 149.843966] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 149.845820] #PF: supervisor read access in kernel mode
[ 149.847190] #PF: error_code(0x0000) - not-present page
[ 149.848540] PGD 0 P4D 0
[ 149.849231] Oops: 0000 [#1] PREEMPT SMP PTI
[ 149.850345] CPU: 0 PID: 230 Comm: grep Not tainted 6.4.0-12317-gabf530ed3e36-dirty #3
[ 149.852411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 149.854900] RIP: 0010:splice_from_pipe_next+0x129/0x150
[ 149.856328] Code: ff c6 45 38 00 eb af 5b b8 00 fe ff ff 5d 41 5c 41 5d c3 cc cc cc cc 48 8b 46 10 41 83 c5 01 48 89 df 48 c7 46 10 00 00 00 00 <48> 8b 40 08 e8 ce a5 9a
[ 149.861118] RSP: 0018:ffffb2ed40347d70 EFLAGS: 00010202
[ 149.862488] RAX: 0000000000000000 RBX: ffff8c06c1d9a0c0 RCX: 0000000000000000
[ 149.864357] RDX: 0000000000000005 RSI: ffff8c06c8c98028 RDI: ffff8c06c1d9a0c0
[ 149.866217] RBP: ffffb2ed40347de0 R08: 0000000000000001 R09: ffffffffaa428db0
[ 149.868088] R10: 0000000000018000 R11: 0000000000000000 R12: ffff8c06c2625580
[ 149.869950] R13: 0000000000000002 R14: ffff8c06c1d9a0c0 R15: ffffb2ed40347de0
[ 149.871828] FS: 00007fa5a6b3e740(0000) GS:ffff8c06dd800000(0000) knlGS:0000000000000000
[ 149.873937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 149.875459] CR2: 0000000000000008 CR3: 000000000269a000 CR4: 00000000000006f0
[ 149.877327] Call Trace:
[ 149.878931] <TASK>
[ 149.879533] ? __die+0x1e/0x60
[ 149.880309] ? page_fault_oops+0x17c/0x470
[ 149.881313] ? search_module_extables+0x14/0x50
[ 149.882422] ? exc_page_fault+0x67/0x150
[ 149.883397] ? asm_exc_page_fault+0x26/0x30
[ 149.884426] ? __pfx_pipe_to_null+0x10/0x10
[ 149.885451] ? splice_from_pipe_next+0x129/0x150
[ 149.886580] __splice_from_pipe+0x39/0x1c0
[ 149.887594] ? __pfx_pipe_to_null+0x10/0x10
[ 149.888615] ? __pfx_pipe_to_null+0x10/0x10
[ 149.889635] splice_from_pipe+0x5c/0x90
[ 149.890579] do_splice+0x35c/0x840
[ 149.891407] __do_splice+0x1eb/0x210
[ 149.892176] __x64_sys_splice+0xad/0x120
[ 149.893019] do_syscall_64+0x3e/0x90
[ 149.893798] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 149.894881] RIP: 0033:0x7fa5a6c49dd3
[ 149.895682] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d 11 18 0d 00 00 49 89 ca 74 14 b8 13 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 74
[ 149.899538] RSP: 002b:00007ffc83d77768 EFLAGS: 00000202 ORIG_RAX: 0000000000000113
[ 149.901116] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa5a6c49dd3
[ 149.902602] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[ 149.904048] RBP: 0000564d8aaeb000 R08: 0000000000018000 R09: 0000000000000001
[ 149.905439] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000a
[ 149.906832] R13: 0000564d8aaeb010 R14: 0000564d8aaeb000 R15: 0000000000000000
[ 149.908239] </TASK>
[ 149.908692] Modules linked in:
[ 149.909326] CR2: 0000000000000008
[ 149.910050] ---[ end trace 0000000000000000 ]---
[ 149.910986] RIP: 0010:splice_from_pipe_next+0x129/0x150
[ 149.912063] Code: ff c6 45 38 00 eb af 5b b8 00 fe ff ff 5d 41 5c 41 5d c3 cc cc cc cc 48 8b 46 10 41 83 c5 01 48 89 df 48 c7 46 10 00 00 00 00 <48> 8b 40 08 e8 ce a5 9a
[ 149.915639] RSP: 0018:ffffb2ed40347d70 EFLAGS: 00010202
[ 149.916589] RAX: 0000000000000000 RBX: ffff8c06c1d9a0c0 RCX: 0000000000000000
[ 149.917877] RDX: 0000000000000005 RSI: ffff8c06c8c98028 RDI: ffff8c06c1d9a0c0
[ 149.919172] RBP: ffffb2ed40347de0 R08: 0000000000000001 R09: ffffffffaa428db0
[ 149.920457] R10: 0000000000018000 R11: 0000000000000000 R12: ffff8c06c2625580
[ 149.921737] R13: 0000000000000002 R14: ffff8c06c1d9a0c0 R15: ffffb2ed40347de0
[ 149.923021] FS: 00007fa5a6b3e740(0000) GS:ffff8c06dd800000(0000) knlGS:0000000000000000
[ 149.924481] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 149.925529] CR2: 0000000000000008 CR3: 000000000269a000 CR4: 00000000000006f0
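#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/sendfile.h>

int main()
{
	/* Allocate a pty master and open its (never written) slave read-only. */
	int pt = posix_openpt(O_RDWR);
	grantpt(pt);
	unlockpt(pt);
	int cl = open(ptsname(pt), O_RDONLY);

	/* Splice from the empty pty slave to stdout forever; since nothing
	 * is ever written to the master, each splice() sleeps waiting for
	 * input. The commented-out sendfile() is an alternative variant. */
	for (;;)
		splice(cl, 0, 1, 0, 128 * 1024 * 1024, 0);
		// sendfile(1, 0, 0, 128 * 1024 * 1024);
}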