On 9/29/22 8:07 AM, Jens Axboe wrote: > On 9/29/22 7:56 AM, Jan Kara wrote: >> On Thu 29-09-22 15:24:22, Vlastimil Babka wrote: >>> On 9/26/22 18:33, syzbot wrote: >>>> Hello, >>>> >>>> syzbot found the following issue on: >>>> >>>> HEAD commit: 105a36f3694e Merge tag 'kbuild-fixes-v6.0-3' of git://git... >>>> git tree: upstream >>>> console+strace: https://syzkaller.appspot.com/x/log.txt?x=152bf540880000 >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=7db7ad17eb14cb7 >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=dfcc5f4da15868df7d4d >>>> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1020566c880000 >>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=104819e4880000 >>>> >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit: >>>> Reported-by: syzbot+dfcc5f4da15868df7d4d@xxxxxxxxxxxxxxxxxxxxxxxxx >>> >>> +CC more folks >>> >>> I'm not fully sure what this report means but I assume it's because there's >>> a GFP_KERNEL kmalloc() allocation from softirq context? Should it perhaps >>> use memalloc_nofs_save() at some well defined point? >> >> Thanks for the CC. The problem really is that io_uring is calling into >> fsnotify_access() from softirq context. That isn't going to work. The >> allocation is just a tip of the iceberg. Fsnotify simply does not expect to >> be called from softirq context. All the dcache locks are not IRQ safe, it >> can even obtain some sleeping locks and call to userspace if there are >> suitable watches set up. >> >> So either io_uring needs to postpone fsnotify calls to a workqueue or we >> need a way for io_uring code to tell iomap dio code that the completion >> needs to always happen from a workqueue (as it currently does for writes). >> Jens? > > Something like this should probably work - I'll write a test case and > vet it. Ran that with the attached test case, triggers it before but not with the patch. Side note - I do wish that the syzbot reproducers were not x86 specific, I always have to go and edit them for arm64. For this particular one, I just gave up and wrote one myself. Thanks for the heads-up Jan, I'll queue up this fix and mark for stable with the right attributions. #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/fanotify.h> #include <sys/wait.h> #include <liburing.h> int main(int argc, char *argv[]) { struct io_uring_sqe *sqe; struct io_uring_cqe *cqe; struct io_uring ring; int fan, ret, fd; void *buf; fan = fanotify_init(FAN_CLASS_NOTIF|FAN_CLASS_CONTENT, 0); if (fan < 0) { if (errno == ENOSYS) return 0; perror("fanotify_init"); return 1; } if (argc > 1) { fd = open(argv[1], O_RDONLY | O_DIRECT); if (fd < 0) { perror("open"); return 1; } } else { fd = open("file0", O_RDONLY | O_DIRECT); if (fd < 0) { perror("open"); return 1; } } ret = fanotify_mark(fan, FAN_MARK_ADD, FAN_ACCESS|FAN_MODIFY, fd, NULL); if (ret < 0) { perror("fanotify_mark"); return 1; } ret = 0; if (fork()) { int wstat; io_uring_queue_init(4, &ring, 0); if (posix_memalign(&buf, 4096, 4096)) return 0; sqe = io_uring_get_sqe(&ring); io_uring_prep_read(sqe, fd, buf, 4096, 0); io_uring_submit(&ring); ret = io_uring_wait_cqe(&ring, &cqe); if (ret) { fprintf(stderr, "wait_ret=%d\n", ret); return 1; } wait(&wstat); ret = WEXITSTATUS(wstat); } else { struct fanotify_event_metadata m; int fret; fret = read(fan, &m, sizeof(m)); if (fret < 0) perror("fanotify read"); /* fail if mask isn't right or pid indicates non-task context */ else if (!(m.mask & 1) || !m.pid) exit(1); exit(0); } return ret; } -- Jens Axboe