On 11/28/23 9:19 AM, Jens Axboe wrote: > On 11/28/23 8:58 AM, Jens Axboe wrote: >> On 11/27/23 2:53 PM, Jann Horn wrote: >>> Hi! >>> >>> I noticed something that I think does not currently cause any >>> significant security issues, but could be problematic in the future: >>> >>> io_uring sometimes processes task work in the middle of syscalls, >>> including between fdget() and fdput(). My understanding of task work >>> is that it is expected to run in a context similar to directly at >>> syscall entry/exit: task context, no locks held, sleeping is okay, and >>> it doesn't execute in the middle of some syscall that expects private >>> state of the task_struct to stay the same. >>> >>> An example of another user of task work is the keyring subsystem, >>> which does task_work_add() in keyctl_session_to_parent() to change the >>> cred pointers of another task. >>> >>> Several places in io_uring process task work while holding an fdget() >>> reference to some file descriptor. For example, the io_uring_enter >>> syscall handler calls io_iopoll_check() while the io_ring_ctx is only >>> referenced via fdget(). This means that if there were another kernel >>> subsystem that uses task work to close file descriptors, io_uring >>> would become unsafe. And io_uring does _almost_ that itself, I think: >>> io_queue_worker_create() can be run on a workqueue, and uses task work >>> to launch a worker thread from the context of a userspace thread; and >>> this worker thread can then accept commands to close file descriptors. >>> Except it doesn't accept commands to close io_uring file descriptors. >>> >>> A closer miss might be io_sync_cancel(), which holds a reference to >>> some normal file with fdget()/fdput() while calling into >>> io_run_task_work_sig(). However, from what I can tell, the only things >>> that are actually done with this file pointer are pointer comparisons, >>> so this also shouldn't have significant security impact. >>> >>> Would it make sense to use fget()/fput() instead of fdget()/fdput() in >>> io_sync_cancel(), io_uring_enter and io_uring_register? These >>> functions probably usually run in multithreaded environments anyway >>> (thanks to the io_uring worker threads), so I would think fdget() >>> shouldn't bring significant performance savings here? >> >> Let me run some testing on that. It's a mistake to think that it's >> usually multithreaded, generally if you end up using io-wq then it's not >> a fast path. A fast networked setup, for example, would never touch the >> threads and hence no threading would be implied by using io_uring. Ditto >> on the storage front, if you're just reading/writing or eg doing polled >> IO. That said, those workloads are generally threaded _anyway_ - not >> because of io_uring, but because that's how these kinds of workloads are >> written to begin with. >> >> So probably won't be much of a concern to do the swap. The only >> "interesting" part of the above mix of cancel/register/enter is >> obviously the enter part. The rest are not really fast path. > > Did all three and ran the usual testing, which just so happens to be > multithreaded to begin with anyway. No discernable change from using > fget/fput over fdget/fdput. > > IOW, we may as well do this. Do you want to send a patch? Or I can send > out mine, up to you. For reference, this is what I ran my testing with: diff --git a/io_uring/cancel.c b/io_uring/cancel.c index 3c19cccb1aec..8a8b07dfc444 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -273,7 +273,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) }; ktime_t timeout = KTIME_MAX; struct io_uring_sync_cancel_reg sc; - struct fd f = { }; + struct file *file = NULL; DEFINE_WAIT(wait); int ret, i; @@ -295,10 +295,10 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) /* we can grab a normal file descriptor upfront */ if ((cd.flags & IORING_ASYNC_CANCEL_FD) && !(cd.flags & IORING_ASYNC_CANCEL_FD_FIXED)) { - f = fdget(sc.fd); - if (!f.file) + file = fget(sc.fd); + if (!file) return -EBADF; - cd.file = f.file; + cd.file = file; } ret = __io_sync_cancel(current->io_uring, &cd, sc.fd); @@ -348,6 +348,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) if (ret == -ENOENT || ret > 0) ret = 0; out: - fdput(f); + if (file) + fput(file); return ret; } diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 05f933dddfde..aba5657d287e 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3652,7 +3652,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, size_t, argsz) { struct io_ring_ctx *ctx; - struct fd f; + struct file *file; long ret; if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP | @@ -3670,20 +3670,19 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX)) return -EINVAL; fd = array_index_nospec(fd, IO_RINGFD_REG_MAX); - f.file = tctx->registered_rings[fd]; - f.flags = 0; - if (unlikely(!f.file)) + file = tctx->registered_rings[fd]; + if (unlikely(!file)) return -EBADF; } else { - f = fdget(fd); - if (unlikely(!f.file)) + file = fget(fd); + if (unlikely(!file)) return -EBADF; ret = -EOPNOTSUPP; - if (unlikely(!io_is_uring_fops(f.file))) + if (unlikely(!io_is_uring_fops(file))) goto out; } - ctx = f.file->private_data; + ctx = file->private_data; ret = -EBADFD; if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED)) goto out; @@ -3777,7 +3776,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, } } out: - fdput(f); + if (!(flags & IORING_ENTER_REGISTERED_RING)) + fput(file); return ret; } @@ -4618,7 +4618,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, { struct io_ring_ctx *ctx; long ret = -EBADF; - struct fd f; + struct file *file; bool use_registered_ring; use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING); @@ -4637,27 +4637,27 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX)) return -EINVAL; fd = array_index_nospec(fd, IO_RINGFD_REG_MAX); - f.file = tctx->registered_rings[fd]; - f.flags = 0; - if (unlikely(!f.file)) + file = tctx->registered_rings[fd]; + if (unlikely(!file)) return -EBADF; } else { - f = fdget(fd); - if (unlikely(!f.file)) + file = fget(fd); + if (unlikely(!file)) return -EBADF; ret = -EOPNOTSUPP; - if (!io_is_uring_fops(f.file)) + if (!io_is_uring_fops(file)) goto out_fput; } - ctx = f.file->private_data; + ctx = file->private_data; mutex_lock(&ctx->uring_lock); ret = __io_uring_register(ctx, opcode, arg, nr_args); mutex_unlock(&ctx->uring_lock); trace_io_uring_register(ctx, opcode, ctx->nr_user_files, ctx->nr_user_bufs, ret); out_fput: - fdput(f); + if (!use_registered_ring) + fput(file); return ret; } -- Jens Axboe