[changed subject line to hopefully get people to stop ignoring this]

Please see my message below where I explained the problem in detail. Any
response from the maintainers of kernel/sched/psi.c? There are a lot of you:

$ ./scripts/get_maintainer.pl kernel/sched/psi.c
Johannes Weiner <hannes@xxxxxxxxxxx> (maintainer:PRESSURE STALL INFORMATION (PSI))
Ingo Molnar <mingo@xxxxxxxxxx> (maintainer:SCHEDULER)
Peter Zijlstra <peterz@xxxxxxxxxxxxx> (maintainer:SCHEDULER)
Juri Lelli <juri.lelli@xxxxxxxxxx> (maintainer:SCHEDULER)
Vincent Guittot <vincent.guittot@xxxxxxxxxx> (maintainer:SCHEDULER)
Dietmar Eggemann <dietmar.eggemann@xxxxxxx> (reviewer:SCHEDULER)
Steven Rostedt <rostedt@xxxxxxxxxxx> (reviewer:SCHEDULER)
Ben Segall <bsegall@xxxxxxxxxx> (reviewer:SCHEDULER)
Mel Gorman <mgorman@xxxxxxx> (reviewer:SCHEDULER)
Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> (reviewer:SCHEDULER)
linux-kernel@xxxxxxxxxxxxxxx (open list:SCHEDULER)

On Fri, Dec 10, 2021 at 07:00:26PM -0800, Eric Biggers wrote:
> On Sat, Dec 11, 2021 at 09:56:20AM +0800, Hillf Danton wrote:
> > On Fri, 10 Dec 2021 14:42:22 -0800
> > > syzbot has found a reproducer for the following issue on:
> > >
> > > HEAD commit: e5d75fc20b92 sh_eth: Use dev_err_probe() helper
> > > git tree: net-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1540cdceb00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=24fd48984584829b
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=cdb5dd11c97cc532efad
> > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15de00bab00000
> > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15ad646db00000
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+cdb5dd11c97cc532efad@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > ==================================================================
> > > BUG: KASAN: use-after-free in __lock_acquire+0x3d86/0x54a0 kernel/locking/lockdep.c:4897
> > > Read of size 8 at addr ffff888015be3740 by task syz-executor161/3598
> > >
> > > CPU: 1 PID: 3598 Comm: syz-executor161 Not tainted 5.16.0-rc4-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > Call Trace:
> > >  <TASK>
> > >  __dump_stack lib/dump_stack.c:88 [inline]
> > >  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > >  print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
> > >  __kasan_report mm/kasan/report.c:433 [inline]
> > >  kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
> > >  __lock_acquire+0x3d86/0x54a0 kernel/locking/lockdep.c:4897
> > >  lock_acquire kernel/locking/lockdep.c:5637 [inline]
> > >  lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
> > >  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> > >  _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
> > >  remove_wait_queue+0x1d/0x180 kernel/sched/wait.c:55
> > >  ep_remove_wait_queue+0x88/0x1a0 fs/eventpoll.c:545
> > >  ep_unregister_pollwait fs/eventpoll.c:561 [inline]
> > >  ep_remove+0x106/0x9c0 fs/eventpoll.c:690
> > >  eventpoll_release_file+0xe1/0x130 fs/eventpoll.c:923
> > >  eventpoll_release include/linux/eventpoll.h:53 [inline]
> > >  __fput+0x87b/0x9f0 fs/file_table.c:271
> > >  task_work_run+0xdd/0x1a0 kernel/task_work.c:164
> > >  tracehook_notify_resume include/linux/tracehook.h:189 [inline]
> > >  exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
> > >  exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
> > >  __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
> > >  syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
> > >  do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > RIP: 0033:0x7f3167c0def3
> > > Code: c7 c2 c0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
> > > RSP: 002b:00007ffddef2e488 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> > > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007f3167c0def3
> > > RDX: 000000000000002f RSI: 0000000020001340 RDI: 0000000000000004
> > > RBP: 0000000000000000 R08: 0000000000000014 R09: 00007ffddef2e4b0
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffddef2e4ac
> > > R13: 00007ffddef2e4c0 R14: 00007ffddef2e500 R15: 0000000000000000
> > >  </TASK>
> > >
> > > Allocated by task 3598:
> > >  kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
> > >  kasan_set_track mm/kasan/common.c:46 [inline]
> > >  set_alloc_info mm/kasan/common.c:434 [inline]
> > >  ____kasan_kmalloc mm/kasan/common.c:513 [inline]
> > >  ____kasan_kmalloc mm/kasan/common.c:472 [inline]
> > >  __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:522
> > >  kmalloc include/linux/slab.h:590 [inline]
> > >  psi_trigger_create.part.0+0x15e/0x7f0 kernel/sched/psi.c:1141
> > >  cgroup_pressure_write+0x15d/0x6b0 kernel/cgroup/cgroup.c:3645
> > >  cgroup_file_write+0x1ec/0x780 kernel/cgroup/cgroup.c:3852
> > >  kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
> > >  call_write_iter include/linux/fs.h:2162 [inline]
> > >  new_sync_write+0x429/0x660 fs/read_write.c:503
> > >  vfs_write+0x7cd/0xae0 fs/read_write.c:590
> > >  ksys_write+0x12d/0x250 fs/read_write.c:643
> > >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > Freed by task 3598:
> > >  kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
> > >  kasan_set_track+0x21/0x30 mm/kasan/common.c:46
> > >  kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
> > >  ____kasan_slab_free mm/kasan/common.c:366 [inline]
> > >  ____kasan_slab_free mm/kasan/common.c:328 [inline]
> > >  __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
> > >  kasan_slab_free include/linux/kasan.h:235 [inline]
> > >  slab_free_hook mm/slub.c:1723 [inline]
> > >  slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
> > >  slab_free mm/slub.c:3513 [inline]
> > >  kfree+0xf6/0x560 mm/slub.c:4561
> > >  cgroup_pressure_write+0x18d/0x6b0 kernel/cgroup/cgroup.c:3651
> > >  cgroup_file_write+0x1ec/0x780 kernel/cgroup/cgroup.c:3852
> > >  kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
> > >  call_write_iter include/linux/fs.h:2162 [inline]
> > >  new_sync_write+0x429/0x660 fs/read_write.c:503
> > >  vfs_write+0x7cd/0xae0 fs/read_write.c:590
> > >  ksys_write+0x12d/0x250 fs/read_write.c:643
> > >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > The buggy address belongs to the object at ffff888015be3700
> > >  which belongs to the cache kmalloc-192 of size 192
> > > The buggy address is located 64 bytes inside of
> > >  192-byte region [ffff888015be3700, ffff888015be37c0)
> > > The buggy address belongs to the page:
> > > page:ffffea000056f8c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x15be3
> > > flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
> > > raw: 00fff00000000200 0000000000000000 dead000000000001 ffff888010c41a00
> > > raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
> > > page dumped because: kasan: bad access detected
> > > page_owner tracks the page as allocated
> > > page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, ts 1983850449, free_ts 0
> > >  prep_new_page mm/page_alloc.c:2418 [inline]
> > >  get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
> > >  __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
> > >  alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2036
> > >  alloc_pages+0x29f/0x300 mm/mempolicy.c:2186
> > >  alloc_slab_page mm/slub.c:1793 [inline]
> > >  allocate_slab mm/slub.c:1930 [inline]
> > >  new_slab+0x32d/0x4a0 mm/slub.c:1993
> > >  ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
> > >  __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
> > >  slab_alloc_node mm/slub.c:3200 [inline]
> > >  slab_alloc mm/slub.c:3242 [inline]
> > >  kmem_cache_alloc_trace+0x289/0x2c0 mm/slub.c:3259
> > >  kmalloc include/linux/slab.h:590 [inline]
> > >  kzalloc include/linux/slab.h:724 [inline]
> > >  call_usermodehelper_setup+0x97/0x340 kernel/umh.c:365
> > >  kobject_uevent_env+0xf73/0x1650 lib/kobject_uevent.c:614
> > >  version_sysfs_builtin kernel/params.c:878 [inline]
> > >  param_sysfs_init+0x146/0x43b kernel/params.c:969
> > >  do_one_initcall+0x103/0x650 init/main.c:1297
> > >  do_initcall_level init/main.c:1370 [inline]
> > >  do_initcalls init/main.c:1386 [inline]
> > >  do_basic_setup init/main.c:1405 [inline]
> > >  kernel_init_freeable+0x6b1/0x73a init/main.c:1610
> > >  kernel_init+0x1a/0x1d0 init/main.c:1499
> > >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > page_owner free stack trace missing
> > >
> > > Memory state around the buggy address:
> > >  ffff888015be3600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >  ffff888015be3680: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
> > > >ffff888015be3700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >                                            ^
> > >  ffff888015be3780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> > >  ffff888015be3800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > ==================================================================
> >
> > Hey Eric
> >
> > Let us know if this uaf adds another call site for what you added [1].
> >
> > Hillf
> >
> > [1] https://lore.kernel.org/lkml/20211209010455.42744-2-ebiggers@xxxxxxxxxx/
> >
> > +++ x/kernel/sched/psi.c
> > @@ -1193,7 +1193,7 @@ static void psi_trigger_destroy(struct k
> >           * Wakeup waiters to stop polling. Can happen if cgroup is deleted
> >           * from under a polling process.
> >           */
> > -        wake_up_interruptible(&t->event_wait);
> > +        wake_up_pollfree(&t->event_wait);
> >
> >          mutex_lock(&group->trigger_lock);
>
> [added linux-mm and all maintainers for kernel/sched/psi.c]
>
> Well, it is the same sort of issue, but POLLFREE is *not* enough here. POLLFREE
> only works if the lifetime of waitqueue is tied to the polling task, as blocking
> polls don't handle it -- only non-blocking polls do.
>
> The kernel/sched/psi.c use case is just totally broken, since the lifetime of
> its waitqueue is totally arbitrary; the open file descriptor can be written to
> at any time by any process, which causes the waitqueue to be freed. So it will
> cause a use-after-free even for regular blocking poll().
>
> To fix this, I think the psi trigger stuff will need to be refactored to have
> just one waitqueue per open file. We need to be removing uses of POLLFREE, not
> adding new ones. (See Linus' comments on POLLFREE here:
> https://lore.kernel.org/lkml/CAHk-=wgvt7PH+AU_29H95tJQZ9FnhS8vVmymbhpZ6NZ7yaAigw@xxxxxxxxxxxxxx/)
>
> Here are some repros:
>
> #include <fcntl.h>
> #include <sys/epoll.h>
> #include <unistd.h>
> int main()
> {
>         int fd = open("/proc/pressure/cpu", O_RDWR);
>         int epfd = epoll_create(1);
>         const char trigger[] = "some 100000 1000000";
>         struct epoll_event event = { .events = EPOLLIN };
>
>         write(fd, trigger, sizeof(trigger));
>         epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event);
>         write(fd, trigger, sizeof(trigger));
> }
>
>
> #include <fcntl.h>
> #include <sys/poll.h>
> #include <unistd.h>
> int main()
> {
>         int fd = open("/proc/pressure/cpu", O_RDWR);
>         const char trigger[] = "some 100000 1000000";
>
>         if (fork()) {
>                 struct pollfd pfd = { .fd = fd, .events = POLLIN };
>
>                 for (;;)
>                         poll(&pfd, 1, -1);
>         } else {
>                 for (;;)
>                         write(fd, trigger, sizeof(trigger));
>         }
> }