Re: [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()

Vegard Nossum <vegard.nossum@xxxxxxxxxx> · Tue, 6 Dec 2022 14:52:08 +0100

On 12/6/22 10:38, Harshit Mogalapalli wrote:
> Syzkaller reports a NULL deref bug as follows:
>
>   BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3

[...]

> Add a NULL check on tctx to prevent this.
>
> Fixes: d56d938b4bef ("io_uring: do ctx initiated file note removal")
> Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@xxxxxxxxxx>
> ---
> Could not find the root cause of this.

Hi,

I don't think the patch is correct as-is -- in any case, we should
probably get a better understanding of what's going on first.

I think what's happening is something like this, where tsk->io_uring is
set to NULL in begin_new_exec() while we have a pending callback:

fd = io_uring_setup()
[...]

close(fd) ?
- __fput()
  - io_uring_release()
    - io_ring_ctx_wait_and_kill()
      - init_task_work(..., io_tctx_exit_cb) // callback posted

exec()
- begin_new_exec()
  - io_uring_task_cancel()
    - __io_uring_cancel()
      - io_uring_cancel_generic()
        - __io_uring_free()
          - tsk->io_uring = NULL // pointer nulled
- syscall_exit_to_user_mode()
  - [...]
    - task_work_run()
      - io_tctx_exit_cb()
        - *current->io_uring // callback runs: oops
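
To make the suspected ordering concrete, here is a minimal userspace
sketch of it. It is only a toy model: every toy_* name is made up for
illustration and none of this is kernel code. toy_task.io_uring stands
in for tsk->io_uring, and the single "pending" slot stands in for the
task_work list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's definitions. */
struct toy_tctx { int touched; };
struct toy_task;
typedef void (*toy_cb_t)(struct toy_task *tsk, int *saw_null);

struct toy_task {
	struct toy_tctx *io_uring;	/* models tsk->io_uring */
	toy_cb_t pending;		/* models one queued task_work item */
};

/* Models io_tctx_exit_cb(); records a NULL pointer instead of oopsing. */
static void toy_exit_cb(struct toy_task *tsk, int *saw_null)
{
	if (tsk->io_uring == NULL) {
		*saw_null = 1;
		return;
	}
	tsk->io_uring->touched = 1;
}

/* Models the close() side: the callback is posted but not yet run. */
static void toy_post_callback(struct toy_task *tsk)
{
	tsk->pending = toy_exit_cb;
}

/* Models the exec() side: the pointer is cleared. */
static void toy_exec_free(struct toy_task *tsk)
{
	tsk->io_uring = NULL;
}

/* Models task_work_run(); returns 1 if the callback saw NULL. */
static int toy_task_work_run(struct toy_task *tsk)
{
	int saw_null = 0;
	toy_cb_t cb = tsk->pending;

	tsk->pending = NULL;
	if (cb)
		cb(tsk, &saw_null);
	return saw_null;
}

/* Suspected ordering: post, clear pointer, then run the callback. */
static int toy_replay_suspected_ordering(void)
{
	struct toy_tctx tctx = { 0 };
	struct toy_task tsk = { &tctx, NULL };

	toy_post_callback(&tsk);
	toy_exec_free(&tsk);
	return toy_task_work_run(&tsk);
}

/* Benign ordering: the callback runs before the pointer is cleared. */
static int toy_replay_benign_ordering(void)
{
	struct toy_tctx tctx = { 0 };
	struct toy_task tsk = { &tctx, NULL };
	int saw_null;

	toy_post_callback(&tsk);
	saw_null = toy_task_work_run(&tsk);
	toy_exec_free(&tsk);
	return saw_null;
}
```

In this model toy_replay_suspected_ordering() reports that the callback
saw a NULL pointer, while toy_replay_benign_ordering() does not; the
only difference between the two is where the pending callback runs
relative to the pointer being cleared.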

As far as I can tell, whatever is happening in io_ring_exit_work() is
happening too late, as task->io_uring has already been set to NULL.

It looks a bit like this is supposed to be handled in
io_uring_cancel_generic() already where it tries to cancel and wait for
all the outstanding work items to finish, but maybe that is not taking
into account the fact that the exit callback is still pending? Should
io_ring_ctx_wait_and_kill() bump the inflight counter..?
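
To illustrate what I mean by that question, here is a toy userspace
sketch, not a proposed patch. All demo_* names are hypothetical, and
the real inflight accounting in io_uring is certainly more involved
than a plain int. The assumption being modelled is that posting the
exit callback counts as inflight work, and that the cancel path drains
inflight work before the pointer is cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; not the kernel's structures. */
struct demo_tctx { int inflight; int cb_ran; };
struct demo_task { struct demo_tctx *io_uring; int cb_pending; };

/* Assumption: posting the exit callback also bumps the inflight count. */
static void demo_post_exit_cb(struct demo_task *tsk)
{
	tsk->cb_pending = 1;
	tsk->io_uring->inflight++;
}

/* Toy exit callback: returns -1 if it would have dereferenced NULL. */
static int demo_exit_cb(struct demo_task *tsk)
{
	if (tsk->io_uring == NULL)
		return -1;
	tsk->io_uring->cb_ran = 1;
	tsk->io_uring->inflight--;
	return 0;
}

/*
 * Toy cancel path: run pending work while anything is inflight, and
 * only then clear the pointer. Returns the number of failed callbacks.
 */
static int demo_cancel_and_free(struct demo_task *tsk)
{
	int failures = 0;

	while (tsk->io_uring->inflight > 0) {
		if (!tsk->cb_pending)
			break;	/* nothing runnable left; do not spin */
		tsk->cb_pending = 0;
		if (demo_exit_cb(tsk) != 0)
			failures++;
	}
	tsk->io_uring = NULL;
	return failures;
}

/* Returns 1 if the callback ran safely before the pointer was cleared. */
static int demo_run(void)
{
	struct demo_tctx tctx = { 0, 0 };
	struct demo_task tsk = { &tctx, 0 };
	int failures;

	demo_post_exit_cb(&tsk);
	failures = demo_cancel_and_free(&tsk);
	return failures == 0 && tctx.cb_ran == 1 &&
	       tctx.inflight == 0 && tsk.io_uring == NULL;
}
```

In the toy, the pending callback always runs while the pointer is still
valid because the cancel loop refuses to finish with work inflight.
Whether the real code can account for the exit callback this way is
exactly what I'm unsure about.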

It's unclear to me whether the io_ring_ctx_wait_and_kill() call is
coming through close(), dup2(), or simply exec(), but it looks like this
could potentially get delayed (from the current syscall) and thus pushed
into the exec() call. Maybe flush_delayed_fput() needs to be called
somewhere..?

Anyway, I could be completely off base here as I'm not really familiar
with the code, just wanted to share my notes.

Vegard