Re: [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()

Harshit Mogalapalli <harshit.m.mogalapalli@xxxxxxxxxx> · Wed, 7 Dec 2022 08:10:05 +0530

On 07/12/22 2:45 am, Jens Axboe wrote:
On 12/6/22 2:38?AM, Harshit Mogalapalli wrote:
Syzkaller reports a NULL deref bug as follows:

  BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
  Read of size 4 at addr 0000000000000138 by task file1/1955

  CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0xcd/0x134
   ? io_tctx_exit_cb+0x53/0xd3
   kasan_report+0xbb/0x1f0
   ? io_tctx_exit_cb+0x53/0xd3
   kasan_check_range+0x140/0x190
   io_tctx_exit_cb+0x53/0xd3
   task_work_run+0x164/0x250
   ? task_work_cancel+0x30/0x30
   get_signal+0x1c3/0x2440
   ? lock_downgrade+0x6e0/0x6e0
   ? lock_downgrade+0x6e0/0x6e0
   ? exit_signals+0x8b0/0x8b0
   ? do_raw_read_unlock+0x3b/0x70
   ? do_raw_spin_unlock+0x50/0x230
   arch_do_signal_or_restart+0x82/0x2470
   ? kmem_cache_free+0x260/0x4b0
   ? putname+0xfe/0x140
   ? get_sigframe_size+0x10/0x10
   ? do_execveat_common.isra.0+0x226/0x710
   ? lockdep_hardirqs_on+0x79/0x100
   ? putname+0xfe/0x140
   ? do_execveat_common.isra.0+0x238/0x710
   exit_to_user_mode_prepare+0x15f/0x250
   syscall_exit_to_user_mode+0x19/0x50
   do_syscall_64+0x42/0xb0
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  RIP: 0023:0x0
  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>
  Kernel panic - not syncing: panic_on_warn set ...

Add a NULL check on tctx to prevent this.

I agree with Vegard that I don't think this is fixing the core of
the issue. I think what is happening here is that we don't run the
task_work in io_uring_cancel_generic() unconditionally, if we don't
need to in the loop above. But we do need to ensure we run it before
we clear current->io_uring.

Do you have a reproducer? If so, can you try the below? I _think_
this is all we need. We can't be hitting the delayed fput path as
the task isn't exiting, and we're dealing with current here.


Thanks Jens and Vegard for the suggestions and analysis.

Yes, the below patch silences the reproducer.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 36cb63e4174f..4791d94c88f5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3125,6 +3125,15 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
  
  	io_uring_clean_tctx(tctx);
  	if (cancel_all) {
+		/*
+		 * If we didn't run task_work in the loop above, ensure we
+		 * do so here. If an fput() queued up exit task_work for the
+		 * ring descriptor before we started the exec that led to this
+		 * cancelation, then we need to have that run before we proceed
+		 * with tearing down current->io_uring.
+		 */
+		io_run_task_work();
+
  		/*
  		 * We shouldn't run task_works after cancel, so just leave
  		 * ->in_idle set for normal exit.


Thanks,
Harshit