On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote: > On 10/17/2019 09:03 AM, Richard W.M. Jones wrote: > > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote: > >> Hey Josef and nbd list, > >> > >> I had a question about if there are any socket family restrictions for nbd? > > > > In normal circumstances, in userspace, the NBD protocol would only be > > used over AF_UNIX or AF_INET/AF_INET6. > > > > There's a bit of confusion because netlink is used by nbd-client to > > configure the NBD device, setting things like block size and timeouts > > (instead of ioctl which is deprecated). I think you don't mean this > > use of netlink? > > I didn't. It looks like it is just a bad test. > > For the automated test in this thread the test created a AF_NETLINK > socket and passed it into the NBD_SET_SOCK ioctl. That is what got used > for the NBD_DO_IT ioctl. > > I was not sure if the test creator picked any old socket and it just > happened to pick one nbd never supported, or it was trying to simulate > sockets that did not support the shutdown method. > > I attached the automated test that got run (test.c). I'd say it sounds like a bad test, but I'm not familiar with syzkaller nor how / from where it generates these tests. Did someone report a bug and then syzkaller wrote this test? Rich. > > > >> The bug here is that some socket familys do not support the > >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown > >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in > >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a > >> flush_workqueue call, so for socket familys like AF_NETLINK in this bug > >> we hang like we see below. > >> > >> I can just remove the flush_workqueue call in that code path since it's > >> not needed there, but it leaves the original bug my patch was hitting > >> where we leave the recv_work running which can then result in leaked > >> resources, or possible use after free crashes and you still get the hang > >> if you remove the module. > >> > >> It looks like we have used kernel_sock_shutdown for a while so I thought > >> we might never have supported sockets that did not support the callout. > >> Is that correct? If so then I can just add a check for this in > >> nbd_add_socket and fix that bug too. > > > > Rich. > > > >> On 09/30/2019 05:39 PM, syzbot wrote: > >>> Hello, > >>> > >>> syzbot found the following crash on: > >>> > >>> HEAD commit: bb2aee77 Add linux-next specific files for 20190926 > >>> git tree: linux-next > >>> console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca3600000 > >>> kernel config: https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964 > >>> dashboard link: > >>> https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a > >>> compiler: gcc (GCC) 9.0.0 20181231 (experimental) > >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12abc2a3600000 > >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11712c05600000 > >>> > >>> The bug was bisected to: > >>> > >>> commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 > >>> Author: Mike Christie <mchristi@xxxxxxxxxx> > >>> Date: Sun Aug 4 19:10:06 2019 +0000 > >>> > >>> nbd: fix max number of supported devs > >>> > >>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c5600000 > >>> final crash: https://syzkaller.appspot.com/x/report.txt?x=1126f3c5600000 > >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c5600000 > >>> > >>> IMPORTANT: if you fix the bug, please add the following tag to the commit: > >>> Reported-by: syzbot+24c12fa8d218ed26011a@xxxxxxxxxxxxxxxxxxxxxxxxx > >>> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") > >>> > >>> INFO: task syz-executor390:8778 can't die for more than 143 seconds. > >>> syz-executor390 D27432 8778 8777 0x00004004 > >>> Call Trace: > >>> context_switch kernel/sched/core.c:3384 [inline] > >>> __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > >>> schedule+0xd9/0x260 kernel/sched/core.c:4132 > >>> schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > >>> do_wait_for_common kernel/sched/completion.c:83 [inline] > >>> __wait_for_common kernel/sched/completion.c:104 [inline] > >>> wait_for_common kernel/sched/completion.c:115 [inline] > >>> wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > >>> flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > >>> nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > >>> __nbd_ioctl drivers/block/nbd.c:1347 [inline] > >>> nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > >>> __blkdev_driver_ioctl block/ioctl.c:304 [inline] > >>> blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > >>> block_ioctl+0xee/0x130 fs/block_dev.c:1954 > >>> vfs_ioctl fs/ioctl.c:47 [inline] > >>> file_ioctl fs/ioctl.c:539 [inline] > >>> do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > >>> ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > >>> __do_sys_ioctl fs/ioctl.c:750 [inline] > >>> __se_sys_ioctl fs/ioctl.c:748 [inline] > >>> __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > >>> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe > >>> RIP: 0033:0x4452d9 > >>> Code: Bad RIP value. > >>> RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > >>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > >>> RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > >>> RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > >>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > >>> R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > >>> INFO: task syz-executor390:8778 blocked for more than 143 seconds. > >>> Not tainted 5.3.0-next-20190926 #0 > >>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >>> syz-executor390 D27432 8778 8777 0x00004004 > >>> Call Trace: > >>> context_switch kernel/sched/core.c:3384 [inline] > >>> __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > >>> schedule+0xd9/0x260 kernel/sched/core.c:4132 > >>> schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > >>> do_wait_for_common kernel/sched/completion.c:83 [inline] > >>> __wait_for_common kernel/sched/completion.c:104 [inline] > >>> wait_for_common kernel/sched/completion.c:115 [inline] > >>> wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > >>> flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > >>> nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > >>> __nbd_ioctl drivers/block/nbd.c:1347 [inline] > >>> nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > >>> __blkdev_driver_ioctl block/ioctl.c:304 [inline] > >>> blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > >>> block_ioctl+0xee/0x130 fs/block_dev.c:1954 > >>> vfs_ioctl fs/ioctl.c:47 [inline] > >>> file_ioctl fs/ioctl.c:539 [inline] > >>> do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > >>> ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > >>> __do_sys_ioctl fs/ioctl.c:750 [inline] > >>> __se_sys_ioctl fs/ioctl.c:748 [inline] > >>> __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > >>> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe > >>> RIP: 0033:0x4452d9 > >>> Code: Bad RIP value. > >>> RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > >>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > >>> RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > >>> RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > >>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > >>> R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > >>> > >>> Showing all locks held in the system: > >>> 1 lock held by khungtaskd/1066: > >>> #0: ffffffff88faad80 (rcu_read_lock){....}, at: > >>> debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5337 > >>> 2 locks held by kworker/u5:0/1525: > >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > >>> __write_once_size include/linux/compiler.h:226 [inline] > >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > >>> arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] > >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > >>> atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline] > >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > >>> atomic_long_set include/asm-generic/atomic-long.h:40 [inline] > >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > >>> set_work_data kernel/workqueue.c:620 [inline] > >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > >>> set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline] > >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > >>> process_one_work+0x88b/0x1740 kernel/workqueue.c:2240 > >>> #1: ffff8880a63b7dc0 ((work_completion)(&args->work)){+.+.}, at: > >>> process_one_work+0x8c1/0x1740 kernel/workqueue.c:2244 > >>> 1 lock held by rsyslogd/8659: > >>> 2 locks held by getty/8749: > >>> #0: ffff888098c08090 (&tty->ldisc_sem){++++}, at: > >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > >>> #1: ffffc90005f112e0 (&ldata->atomic_read_lock){+.+.}, at: > >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > >>> 2 locks held by getty/8750: > >>> #0: ffff88808f10b090 (&tty->ldisc_sem){++++}, at: > >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > >>> #1: ffffc90005f2d2e0 (&ldata->atomic_read_lock){+.+.}, at: > >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > >>> 2 locks held by getty/8751: > >>> #0: ffff88809a6be090 (&tty->ldisc_sem){++++}, at: > >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > >>> #1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at: > >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > >>> 2 locks held by getty/8752: > >>> #0: ffff8880a48af090 (&tty->ldisc_sem){++++}, at: > >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > >>> #1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at: > >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > >>> 2 locks held by getty/8753: > >>> #0: ffff88808c599090 (&tty->ldisc_sem){++++}, at: > >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > >>> #1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at: > >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > >>> 2 locks held by getty/8754: > >>> #0: ffff88808f1a8090 (&tty->ldisc_sem){++++}, at: > >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > >>> #1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at: > >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > >>> 2 locks held by getty/8755: > >>> #0: ffff88809ab33090 (&tty->ldisc_sem){++++}, at: > >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > >>> #1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at: > >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > >>> > >>> ============================================= > >>> > >>> NMI backtrace for cpu 1 > >>> CPU: 1 PID: 1066 Comm: khungtaskd Not tainted 5.3.0-next-20190926 #0 > >>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > >>> Google 01/01/2011 > >>> Call Trace: > >>> __dump_stack lib/dump_stack.c:77 [inline] > >>> dump_stack+0x172/0x1f0 lib/dump_stack.c:113 > >>> nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101 > >>> nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62 > >>> arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 > >>> trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline] > >>> check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline] > >>> watchdog+0xc99/0x1360 kernel/hung_task.c:353 > >>> kthread+0x361/0x430 kernel/kthread.c:255 > >>> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 > >>> Sending NMI from CPU 1 to CPUs 0: > >>> NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 > >>> arch/x86/include/asm/irqflags.h:60 > >>> > >>> > >>> --- > >>> This bug is generated by a bot. It may contain errors. > >>> See https://goo.gl/tpsmEJ for more information about syzbot. > >>> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx. > >>> > >>> syzbot will keep track of this bug report. See: > >>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > >>> For information about bisection process see: > >>> https://goo.gl/tpsmEJ#bisection > >>> syzbot can test patches for this bug, for details see: > >>> https://goo.gl/tpsmEJ#testing-patches > > > > // autogenerated by syzkaller (https://github.com/google/syzkaller) > > #define _GNU_SOURCE > > #include <dirent.h> > #include <endian.h> > #include <errno.h> > #include <fcntl.h> > #include <setjmp.h> > #include <signal.h> > #include <stdarg.h> > #include <stdbool.h> > #include <stdint.h> > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #include <sys/prctl.h> > #include <sys/stat.h> > #include <sys/syscall.h> > #include <sys/types.h> > #include <sys/wait.h> > #include <time.h> > #include <unistd.h> > > static __thread int skip_segv; > static __thread jmp_buf segv_env; > > static void segv_handler(int sig, siginfo_t* info, void* ctx) > { > uintptr_t addr = (uintptr_t)info->si_addr; > const uintptr_t prog_start = 1 << 20; > const uintptr_t prog_end = 100 << 20; > if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED) && > (addr < prog_start || addr > prog_end)) { > _longjmp(segv_env, 1); > } > exit(sig); > } > > static void install_segv_handler(void) > { > struct sigaction sa; > memset(&sa, 0, sizeof(sa)); > sa.sa_handler = SIG_IGN; > syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8); > syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8); > memset(&sa, 0, sizeof(sa)); > sa.sa_sigaction = segv_handler; > sa.sa_flags = SA_NODEFER | SA_SIGINFO; > sigaction(SIGSEGV, &sa, NULL); > sigaction(SIGBUS, &sa, NULL); > } > > #define NONFAILING(...) \ > { \ > __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST); \ > if (_setjmp(segv_env) == 0) { \ > __VA_ARGS__; \ > } \ > __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST); \ > } > > static void sleep_ms(uint64_t ms) > { > usleep(ms * 1000); > } > > static uint64_t current_time_ms(void) > { > struct timespec ts; > if (clock_gettime(CLOCK_MONOTONIC, &ts)) > exit(1); > return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000; > } > > static bool write_file(const char* file, const char* what, ...) > { > char buf[1024]; > va_list args; > va_start(args, what); > vsnprintf(buf, sizeof(buf), what, args); > va_end(args); > buf[sizeof(buf) - 1] = 0; > int len = strlen(buf); > int fd = open(file, O_WRONLY | O_CLOEXEC); > if (fd == -1) > return false; > if (write(fd, buf, len) != len) { > int err = errno; > close(fd); > errno = err; > return false; > } > close(fd); > return true; > } > > static long syz_open_dev(volatile long a0, volatile long a1, volatile long a2) > { > if (a0 == 0xc || a0 == 0xb) { > char buf[128]; > sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" : "block", (uint8_t)a1, > (uint8_t)a2); > return open(buf, O_RDWR, 0); > } else { > char buf[1024]; > char* hash; > NONFAILING(strncpy(buf, (char*)a0, sizeof(buf) - 1)); > buf[sizeof(buf) - 1] = 0; > while ((hash = strchr(buf, '#'))) { > *hash = '0' + (char)(a1 % 10); > a1 /= 10; > } > return open(buf, a2, 0); > } > } > > static void kill_and_wait(int pid, int* status) > { > kill(-pid, SIGKILL); > kill(pid, SIGKILL); > int i; > for (i = 0; i < 100; i++) { > if (waitpid(-1, status, WNOHANG | __WALL) == pid) > return; > usleep(1000); > } > DIR* dir = opendir("/sys/fs/fuse/connections"); > if (dir) { > for (;;) { > struct dirent* ent = readdir(dir); > if (!ent) > break; > if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) > continue; > char abort[300]; > snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort", > ent->d_name); > int fd = open(abort, O_WRONLY); > if (fd == -1) { > continue; > } > if (write(fd, abort, 1) < 0) { > } > close(fd); > } > closedir(dir); > } else { > } > while (waitpid(-1, status, __WALL) != pid) { > } > } > > static void setup_test() > { > prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0); > setpgrp(); > write_file("/proc/self/oom_score_adj", "1000"); > } > > static void execute_one(void); > > #define WAIT_FLAGS __WALL > > static void loop(void) > { > int iter; > for (iter = 0; iter < 1; iter++) { > int pid = fork(); > if (pid < 0) > exit(1); > if (pid == 0) { > setup_test(); > execute_one(); > exit(0); > } > int status = 0; > uint64_t start = current_time_ms(); > for (;;) { > if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid) > break; > sleep_ms(1); > if (current_time_ms() - start < 5 * 1000) > continue; > kill_and_wait(pid, &status); > break; > } > } > } > > uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff}; > > void execute_one(void) > { > intptr_t res = 0; > res = syscall(__NR_socket, 0x10, 2, 2); > if (res != -1) > r[0] = res; > NONFAILING(memcpy((void*)0x20000080, "/dev/nbd#\000", 10)); > res = syz_open_dev(0x20000080, 0, 0); > if (res != -1) > r[1] = res; > res = syz_open_dev(0, 0, 0); > if (res != -1) > r[2] = res; > syscall(__NR_ioctl, r[2], 0xab00, r[0]); > syscall(__NR_ioctl, r[1], 0xab03, 0); > } > int main(void) > { > syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); > install_segv_handler(); > loop(); > return 0; > } -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW