Re: psi_trigger_poll() is completely broken

[changed subject line to hopefully get people to stop ignoring this]

Please see my message below where I explained the problem in detail.  Any
response from the maintainers of kernel/sched/psi.c?  There are a lot of you:

$ ./scripts/get_maintainer.pl kernel/sched/psi.c
Johannes Weiner <hannes@xxxxxxxxxxx> (maintainer:PRESSURE STALL INFORMATION (PSI))
Ingo Molnar <mingo@xxxxxxxxxx> (maintainer:SCHEDULER)
Peter Zijlstra <peterz@xxxxxxxxxxxxx> (maintainer:SCHEDULER)
Juri Lelli <juri.lelli@xxxxxxxxxx> (maintainer:SCHEDULER)
Vincent Guittot <vincent.guittot@xxxxxxxxxx> (maintainer:SCHEDULER)
Dietmar Eggemann <dietmar.eggemann@xxxxxxx> (reviewer:SCHEDULER)
Steven Rostedt <rostedt@xxxxxxxxxxx> (reviewer:SCHEDULER)
Ben Segall <bsegall@xxxxxxxxxx> (reviewer:SCHEDULER)
Mel Gorman <mgorman@xxxxxxx> (reviewer:SCHEDULER)
Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> (reviewer:SCHEDULER)
linux-kernel@xxxxxxxxxxxxxxx (open list:SCHEDULER)

On Fri, Dec 10, 2021 at 07:00:26PM -0800, Eric Biggers wrote:
> On Sat, Dec 11, 2021 at 09:56:20AM +0800, Hillf Danton wrote:
> > On Fri, 10 Dec 2021 14:42:22 -0800
> > > syzbot has found a reproducer for the following issue on:
> > > 
> > > HEAD commit:    e5d75fc20b92 sh_eth: Use dev_err_probe() helper
> > > git tree:       net-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1540cdceb00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=24fd48984584829b
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=cdb5dd11c97cc532efad
> > > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15de00bab00000
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15ad646db00000
> > > 
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+cdb5dd11c97cc532efad@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > 
> > > ==================================================================
> > > BUG: KASAN: use-after-free in __lock_acquire+0x3d86/0x54a0 kernel/locking/lockdep.c:4897
> > > Read of size 8 at addr ffff888015be3740 by task syz-executor161/3598
> > > 
> > > CPU: 1 PID: 3598 Comm: syz-executor161 Not tainted 5.16.0-rc4-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > Call Trace:
> > >  <TASK>
> > >  __dump_stack lib/dump_stack.c:88 [inline]
> > >  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > >  print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
> > >  __kasan_report mm/kasan/report.c:433 [inline]
> > >  kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
> > >  __lock_acquire+0x3d86/0x54a0 kernel/locking/lockdep.c:4897
> > >  lock_acquire kernel/locking/lockdep.c:5637 [inline]
> > >  lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
> > >  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> > >  _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
> > >  remove_wait_queue+0x1d/0x180 kernel/sched/wait.c:55
> > >  ep_remove_wait_queue+0x88/0x1a0 fs/eventpoll.c:545
> > >  ep_unregister_pollwait fs/eventpoll.c:561 [inline]
> > >  ep_remove+0x106/0x9c0 fs/eventpoll.c:690
> > >  eventpoll_release_file+0xe1/0x130 fs/eventpoll.c:923
> > >  eventpoll_release include/linux/eventpoll.h:53 [inline]
> > >  __fput+0x87b/0x9f0 fs/file_table.c:271
> > >  task_work_run+0xdd/0x1a0 kernel/task_work.c:164
> > >  tracehook_notify_resume include/linux/tracehook.h:189 [inline]
> > >  exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
> > >  exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
> > >  __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
> > >  syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
> > >  do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > RIP: 0033:0x7f3167c0def3
> > > Code: c7 c2 c0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
> > > RSP: 002b:00007ffddef2e488 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> > > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007f3167c0def3
> > > RDX: 000000000000002f RSI: 0000000020001340 RDI: 0000000000000004
> > > RBP: 0000000000000000 R08: 0000000000000014 R09: 00007ffddef2e4b0
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffddef2e4ac
> > > R13: 00007ffddef2e4c0 R14: 00007ffddef2e500 R15: 0000000000000000
> > >  </TASK>
> > > 
> > > Allocated by task 3598:
> > >  kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
> > >  kasan_set_track mm/kasan/common.c:46 [inline]
> > >  set_alloc_info mm/kasan/common.c:434 [inline]
> > >  ____kasan_kmalloc mm/kasan/common.c:513 [inline]
> > >  ____kasan_kmalloc mm/kasan/common.c:472 [inline]
> > >  __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:522
> > >  kmalloc include/linux/slab.h:590 [inline]
> > >  psi_trigger_create.part.0+0x15e/0x7f0 kernel/sched/psi.c:1141
> > >  cgroup_pressure_write+0x15d/0x6b0 kernel/cgroup/cgroup.c:3645
> > >  cgroup_file_write+0x1ec/0x780 kernel/cgroup/cgroup.c:3852
> > >  kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
> > >  call_write_iter include/linux/fs.h:2162 [inline]
> > >  new_sync_write+0x429/0x660 fs/read_write.c:503
> > >  vfs_write+0x7cd/0xae0 fs/read_write.c:590
> > >  ksys_write+0x12d/0x250 fs/read_write.c:643
> > >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > 
> > > Freed by task 3598:
> > >  kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
> > >  kasan_set_track+0x21/0x30 mm/kasan/common.c:46
> > >  kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
> > >  ____kasan_slab_free mm/kasan/common.c:366 [inline]
> > >  ____kasan_slab_free mm/kasan/common.c:328 [inline]
> > >  __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
> > >  kasan_slab_free include/linux/kasan.h:235 [inline]
> > >  slab_free_hook mm/slub.c:1723 [inline]
> > >  slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
> > >  slab_free mm/slub.c:3513 [inline]
> > >  kfree+0xf6/0x560 mm/slub.c:4561
> > >  cgroup_pressure_write+0x18d/0x6b0 kernel/cgroup/cgroup.c:3651
> > >  cgroup_file_write+0x1ec/0x780 kernel/cgroup/cgroup.c:3852
> > >  kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
> > >  call_write_iter include/linux/fs.h:2162 [inline]
> > >  new_sync_write+0x429/0x660 fs/read_write.c:503
> > >  vfs_write+0x7cd/0xae0 fs/read_write.c:590
> > >  ksys_write+0x12d/0x250 fs/read_write.c:643
> > >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > 
> > > The buggy address belongs to the object at ffff888015be3700
> > >  which belongs to the cache kmalloc-192 of size 192
> > > The buggy address is located 64 bytes inside of
> > >  192-byte region [ffff888015be3700, ffff888015be37c0)
> > > The buggy address belongs to the page:
> > > page:ffffea000056f8c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x15be3
> > > flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
> > > raw: 00fff00000000200 0000000000000000 dead000000000001 ffff888010c41a00
> > > raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
> > > page dumped because: kasan: bad access detected
> > > page_owner tracks the page as allocated
> > > page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, ts 1983850449, free_ts 0
> > >  prep_new_page mm/page_alloc.c:2418 [inline]
> > >  get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
> > >  __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
> > >  alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2036
> > >  alloc_pages+0x29f/0x300 mm/mempolicy.c:2186
> > >  alloc_slab_page mm/slub.c:1793 [inline]
> > >  allocate_slab mm/slub.c:1930 [inline]
> > >  new_slab+0x32d/0x4a0 mm/slub.c:1993
> > >  ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
> > >  __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
> > >  slab_alloc_node mm/slub.c:3200 [inline]
> > >  slab_alloc mm/slub.c:3242 [inline]
> > >  kmem_cache_alloc_trace+0x289/0x2c0 mm/slub.c:3259
> > >  kmalloc include/linux/slab.h:590 [inline]
> > >  kzalloc include/linux/slab.h:724 [inline]
> > >  call_usermodehelper_setup+0x97/0x340 kernel/umh.c:365
> > >  kobject_uevent_env+0xf73/0x1650 lib/kobject_uevent.c:614
> > >  version_sysfs_builtin kernel/params.c:878 [inline]
> > >  param_sysfs_init+0x146/0x43b kernel/params.c:969
> > >  do_one_initcall+0x103/0x650 init/main.c:1297
> > >  do_initcall_level init/main.c:1370 [inline]
> > >  do_initcalls init/main.c:1386 [inline]
> > >  do_basic_setup init/main.c:1405 [inline]
> > >  kernel_init_freeable+0x6b1/0x73a init/main.c:1610
> > >  kernel_init+0x1a/0x1d0 init/main.c:1499
> > >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > page_owner free stack trace missing
> > > 
> > > Memory state around the buggy address:
> > >  ffff888015be3600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >  ffff888015be3680: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
> > > >ffff888015be3700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >                                            ^
> > >  ffff888015be3780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> > >  ffff888015be3800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > ==================================================================
> > 
> > Hey Eric
> > 
> > Let us know if this uaf adds another call site for what you added [1].
> > 
> > Hillf
> > 
> > [1] https://lore.kernel.org/lkml/20211209010455.42744-2-ebiggers@xxxxxxxxxx/
> > 
> > +++ x/kernel/sched/psi.c
> > @@ -1193,7 +1193,7 @@ static void psi_trigger_destroy(struct k
> >  	 * Wakeup waiters to stop polling. Can happen if cgroup is deleted
> >  	 * from under a polling process.
> >  	 */
> > -	wake_up_interruptible(&t->event_wait);
> > +	wake_up_pollfree(&t->event_wait);
> >  
> >  	mutex_lock(&group->trigger_lock);
> 
> [added linux-mm and all maintainers for kernel/sched/psi.c]
> 
> Well, it is the same sort of issue, but POLLFREE is *not* enough here.  POLLFREE
> only works if the lifetime of the waitqueue is tied to the polling task, as
> blocking polls don't handle it -- only non-blocking polls do.
> 
> The kernel/sched/psi.c use case is fundamentally broken, since the lifetime of
> its waitqueue is arbitrary: the open file descriptor can be written to at any
> time by any process, which causes the waitqueue to be freed.  So it will cause
> a use-after-free even for regular blocking poll().
> 
> To fix this, I think the psi trigger stuff will need to be refactored to have
> just one waitqueue per open file.  We need to be removing uses of POLLFREE, not
> adding new ones.  (See Linus' comments on POLLFREE here:
> https://lore.kernel.org/lkml/CAHk-=wgvt7PH+AU_29H95tJQZ9FnhS8vVmymbhpZ6NZ7yaAigw@xxxxxxxxxxxxxx/)
> 
> Here are some repros:
> 
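> /* Repro 1: epoll.  The second write() frees the waitqueue epoll is on. */
> #include <fcntl.h>
> #include <sys/epoll.h>
> #include <unistd.h>
> int main()
> {
>         int fd = open("/proc/pressure/cpu", O_RDWR);
>         int epfd = epoll_create(1);
>         const char trigger[] = "some 100000 1000000";
>         struct epoll_event event = { .events = EPOLLIN };
> 
>         /* Create a trigger, then register its waitqueue with epoll. */
>         write(fd, trigger, sizeof(trigger));
>         epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event);
>         /* Write again while epoll is still registered on the waitqueue. */
>         write(fd, trigger, sizeof(trigger));
> }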
> 
> 
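> /* Repro 2: regular blocking poll() racing with writes to the same file. */
> #include <fcntl.h>
> #include <sys/poll.h>
> #include <unistd.h>
> int main()
> {
>         int fd = open("/proc/pressure/cpu", O_RDWR);
>         const char trigger[] = "some 100000 1000000";
> 
>         if (fork()) {
>                 /* Parent: poll the shared file descriptor forever. */
>                 struct pollfd pfd = { .fd = fd, .events = POLLIN };
> 
>                 for (;;)
>                         poll(&pfd, 1, -1);
>         } else {
>                 /* Child: keep writing triggers to the same open file. */
>                 for (;;)
>                         write(fd, trigger, sizeof(trigger));
>         }
> }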


