On Tue, Nov 23, 2021 at 11:49:54AM -0800, Eric Biggers wrote: > On Wed, Oct 27, 2021 at 01:18:34AM +0000, Ramji Jiyani wrote: > > Add support for the POLLFREE flag to force complete iocb inline in > > aio_poll_wake(). A thread may use it to signal it's exit and/or request > > to cleanup while pending poll request. In this case, aio_poll_wake() > > needs to make sure it doesn't keep any reference to the queue entry > > before returning from wake to avoid possible use after free via > > poll_cancel() path. > > > > UAF issue was found during binder and aio interactions in certain > > sequence of events [1]. > > > > The POLLFREE flag is no more exclusive to the epoll and is being > > shared with the aio. Remove comment from poll.h to avoid confusion. > > > > [1] https://lore.kernel.org/r/CAKUd0B_TCXRY4h1hTztfwWbNSFQqsudDLn2S_28csgWZmZAG3Q@xxxxxxxxxxxxxx/ > > > > Fixes: af5c72b1fc7a ("Fix aio_poll() races") > > Signed-off-by: Ramji Jiyani <ramjiyani@xxxxxxxxxx> > > Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx> > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > Cc: stable@xxxxxxxxxxxxxxx # 4.19+ > > --- > > Looks good, feel free to add: > > Reviewed-by: Eric Biggers <ebiggers@xxxxxxxxxx> > > I'm still not 100% happy with the commit message, but it's good enough. > The actual code looks correct. > > Who is going to take this patch? This is an important fix; it shouldn't be > sitting ignored for months. get_maintainer.pl shows: > > $ ./scripts/get_maintainer.pl fs/aio.c > Benjamin LaHaise <bcrl@xxxxxxxxx> (supporter:AIO) > Alexander Viro <viro@xxxxxxxxxxxxxxxxxx> (maintainer:FILESYSTEMS (VFS and infrastructure)) > linux-aio@xxxxxxxxx (open list:AIO) > linux-fsdevel@xxxxxxxxxxxxxxx (open list:FILESYSTEMS (VFS and infrastructure)) > linux-kernel@xxxxxxxxxxxxxxx (open list) Actually, there is a bug in this patch -- it creates a lock inversion between ctx->ctx_lock (kioctx::ctx_lock) and req->head->lock (wait_queue_head::lock). Task 1: signalfd_cleanup() -> wake_up_poll() [takes wait_queue_head::lock] -> aio_poll_wake() [takes kioctx::ctx_lock] Task 2: sys_io_cancel() [takes kioctx::ctx_lock] -> aio_poll_cancel [takes wait_queue_head::lock] Previously this was okay because the lock operation in aio_poll_wake() was only a trylock. This patch changes it to a regular lock, which causes a deadlock. I am able to reproduce this deadlock. It also generates a lockdep report, shown below. Unfortunately, I don't know how to fix it. Anyone have any ideas? Al and Christoph, it looks like you wrote most of the aio poll code? Note, the use-after-free this patch is fixing also affects signalfd, not just binder, since both rely on POLLFREE. (I was testing it with signalfd.) So we really need to fix it one way or another... ====================================================== WARNING: possible circular locking dependency detected 5.16.0-rc2-00001-gf97efc5c03bf #22 Not tainted ------------------------------------------------------ aio/137 is trying to acquire lock: ffff888006170158 (&ctx->ctx_lock){..-.}-{2:2}, at: aio_poll_wake+0x1ac/0x390 fs/aio.c:1693 but task is already holding lock: ffff8880053a91e0 (&sighand->signalfd_wqh){....}-{2:2}, at: __wake_up_common_lock+0x5b/0xb0 kernel/sched/wait.c:137 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sighand->signalfd_wqh){....}-{2:2}: __lock_acquire+0x4b4/0x960 kernel/locking/lockdep.c:5027 lock_acquire kernel/locking/lockdep.c:5637 [inline] lock_acquire+0xc9/0x2e0 kernel/locking/lockdep.c:5602 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:349 [inline] aio_poll.constprop.0+0x15d/0x440 fs/aio.c:1773 __io_submit_one.constprop.0+0x139/0x1b0 fs/aio.c:1847 io_submit_one+0x134/0x640 fs/aio.c:1884 __do_sys_io_submit fs/aio.c:1943 [inline] __se_sys_io_submit fs/aio.c:1913 [inline] __x64_sys_io_submit+0x89/0x260 fs/aio.c:1913 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&ctx->ctx_lock){..-.}-{2:2}: check_prev_add+0x93/0xbf0 kernel/locking/lockdep.c:3063 check_prevs_add kernel/locking/lockdep.c:3186 [inline] validate_chain+0x585/0x8c0 kernel/locking/lockdep.c:3801 __lock_acquire+0x4b4/0x960 kernel/locking/lockdep.c:5027 lock_acquire kernel/locking/lockdep.c:5637 [inline] lock_acquire+0xc9/0x2e0 kernel/locking/lockdep.c:5602 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3e/0x60 kernel/locking/spinlock.c:162 aio_poll_wake+0x1ac/0x390 fs/aio.c:1693 __wake_up_common+0x8c/0x1a0 kernel/sched/wait.c:108 __wake_up_common_lock+0x77/0xb0 kernel/sched/wait.c:138 __wake_up+0xe/0x10 kernel/sched/wait.c:157 signalfd_cleanup+0x33/0x40 fs/signalfd.c:48 __cleanup_sighand kernel/fork.c:1613 [inline] __cleanup_sighand+0x27/0x50 kernel/fork.c:1610 __exit_signal+0x236/0x380 kernel/exit.c:159 release_task+0x180/0x3d0 kernel/exit.c:200 wait_task_zombie+0x28a/0x600 kernel/exit.c:1114 wait_consider_task+0x121/0x160 kernel/exit.c:1341 do_wait_thread kernel/exit.c:1404 [inline] do_wait+0x21b/0x380 kernel/exit.c:1521 kernel_wait4+0xaa/0x150 kernel/exit.c:1684 __do_sys_wait4+0x85/0x90 kernel/exit.c:1712 __se_sys_wait4 kernel/exit.c:1708 [inline] __x64_sys_wait4+0x17/0x20 kernel/exit.c:1708 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sighand->signalfd_wqh); lock(&ctx->ctx_lock); lock(&sighand->signalfd_wqh); lock(&ctx->ctx_lock); *** DEADLOCK *** 2 locks held by aio/137: #0: ffffffff81e06098 (tasklist_lock){.+.+}-{2:2}, at: release_task+0x110/0x3d0 kernel/exit.c:197 #1: ffff8880053a91e0 (&sighand->signalfd_wqh){....}-{2:2}, at: __wake_up_common_lock+0x5b/0xb0 kernel/sched/wait.c:137 stack backtrace: CPU: 3 PID: 137 Comm: aio Not tainted 5.16.0-rc2-00001-gf97efc5c03bf #22 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 Call Trace: <TASK> show_stack+0x3d/0x3f arch/x86/kernel/dumpstack.c:318 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x49/0x5e lib/dump_stack.c:106 dump_stack+0x10/0x12 lib/dump_stack.c:113 print_circular_bug.cold+0x13e/0x143 kernel/locking/lockdep.c:2021 check_noncircular+0xfe/0x110 kernel/locking/lockdep.c:2143 check_prev_add+0x93/0xbf0 kernel/locking/lockdep.c:3063 check_prevs_add kernel/locking/lockdep.c:3186 [inline] validate_chain+0x585/0x8c0 kernel/locking/lockdep.c:3801 __lock_acquire+0x4b4/0x960 kernel/locking/lockdep.c:5027 lock_acquire kernel/locking/lockdep.c:5637 [inline] lock_acquire+0xc9/0x2e0 kernel/locking/lockdep.c:5602 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3e/0x60 kernel/locking/spinlock.c:162 aio_poll_wake+0x1ac/0x390 fs/aio.c:1693 __wake_up_common+0x8c/0x1a0 kernel/sched/wait.c:108 __wake_up_common_lock+0x77/0xb0 kernel/sched/wait.c:138 __wake_up+0xe/0x10 kernel/sched/wait.c:157 signalfd_cleanup+0x33/0x40 fs/signalfd.c:48 __cleanup_sighand kernel/fork.c:1613 [inline] __cleanup_sighand+0x27/0x50 kernel/fork.c:1610 __exit_signal+0x236/0x380 kernel/exit.c:159 release_task+0x180/0x3d0 kernel/exit.c:200 wait_task_zombie+0x28a/0x600 kernel/exit.c:1114 wait_consider_task+0x121/0x160 kernel/exit.c:1341 do_wait_thread kernel/exit.c:1404 [inline] do_wait+0x21b/0x380 kernel/exit.c:1521 kernel_wait4+0xaa/0x150 kernel/exit.c:1684 __do_sys_wait4+0x85/0x90 kernel/exit.c:1712 __se_sys_wait4 kernel/exit.c:1708 [inline] __x64_sys_wait4+0x17/0x20 kernel/exit.c:1708 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f23a3b0e9ea Code: ff e9 0a 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 49 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 3d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 89 54 24 14 RSP: 002b:00007ffcd0926098 EFLAGS: 00000246 ORIG_RAX: 000000000000003d RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f23a3b0e9ea RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffff RBP: 000055944067b000 R08: fffffffe7fffffff R09: fffffffe7fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 00005594406761c0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK>