Re: io_uring: risky use of task work, especially wrt fdget()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/28/23 9:19 AM, Jens Axboe wrote:
> On 11/28/23 8:58 AM, Jens Axboe wrote:
>> On 11/27/23 2:53 PM, Jann Horn wrote:
>>> Hi!
>>>
>>> I noticed something that I think does not currently cause any
>>> significant security issues, but could be problematic in the future:
>>>
>>> io_uring sometimes processes task work in the middle of syscalls,
>>> including between fdget() and fdput(). My understanding of task work
>>> is that it is expected to run in a context similar to directly at
>>> syscall entry/exit: task context, no locks held, sleeping is okay, and
>>> it doesn't execute in the middle of some syscall that expects private
>>> state of the task_struct to stay the same.
>>>
>>> An example of another user of task work is the keyring subsystem,
>>> which does task_work_add() in keyctl_session_to_parent() to change the
>>> cred pointers of another task.
>>>
>>> Several places in io_uring process task work while holding an fdget()
>>> reference to some file descriptor. For example, the io_uring_enter
>>> syscall handler calls io_iopoll_check() while the io_ring_ctx is only
>>> referenced via fdget(). This means that if there were another kernel
>>> subsystem that uses task work to close file descriptors, io_uring
>>> would become unsafe. And io_uring does _almost_ that itself, I think:
>>> io_queue_worker_create() can be run on a workqueue, and uses task work
>>> to launch a worker thread from the context of a userspace thread; and
>>> this worker thread can then accept commands to close file descriptors.
>>> Except it doesn't accept commands to close io_uring file descriptors.
>>>
>>> A closer miss might be io_sync_cancel(), which holds a reference to
>>> some normal file with fdget()/fdput() while calling into
>>> io_run_task_work_sig(). However, from what I can tell, the only things
>>> that are actually done with this file pointer are pointer comparisons,
>>> so this also shouldn't have significant security impact.
>>>
>>> Would it make sense to use fget()/fput() instead of fdget()/fdput() in
>>> io_sync_cancel(), io_uring_enter and io_uring_register? These
>>> functions probably usually run in multithreaded environments anyway
>>> (thanks to the io_uring worker threads), so I would think fdget()
>>> shouldn't bring significant performance savings here?
>>
>> Let me run some testing on that. It's a mistake to think that it's
>> usually multithreaded, generally if you end up using io-wq then it's not
>> a fast path. A fast networked setup, for example, would never touch the
>> threads and hence no threading would be implied by using io_uring. Ditto
>> on the storage front, if you're just reading/writing or eg doing polled
>> IO. That said, those workloads are generally threaded _anyway_ - not
>> because of io_uring, but because that's how these kinds of workloads are
>> written to begin with.
>>
>> So probably won't be much of a concern to do the swap. The only
>> "interesting" part of the above mix of cancel/register/enter is
>> obviously the enter part. The rest are not really fast path.
> 
> Did all three and ran the usual testing, which just so happens to be
> multithreaded to begin with anyway. No discernable change from using
> fget/fput over fdget/fdput.
> 
> IOW, we may as well do this. Do you want to send a patch? Or I can send
> out mine, up to you.

For reference, this is what I ran my testing with:

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 3c19cccb1aec..8a8b07dfc444 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -273,7 +273,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
 	};
 	ktime_t timeout = KTIME_MAX;
 	struct io_uring_sync_cancel_reg sc;
-	struct fd f = { };
+	struct file *file = NULL;
 	DEFINE_WAIT(wait);
 	int ret, i;
 
@@ -295,10 +295,10 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
 	/* we can grab a normal file descriptor upfront */
 	if ((cd.flags & IORING_ASYNC_CANCEL_FD) &&
 	   !(cd.flags & IORING_ASYNC_CANCEL_FD_FIXED)) {
-		f = fdget(sc.fd);
-		if (!f.file)
+		file = fget(sc.fd);
+		if (!file)
 			return -EBADF;
-		cd.file = f.file;
+		cd.file = file;
 	}
 
 	ret = __io_sync_cancel(current->io_uring, &cd, sc.fd);
@@ -348,6 +348,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
 	if (ret == -ENOENT || ret > 0)
 		ret = 0;
 out:
-	fdput(f);
+	if (file)
+		fput(file);
 	return ret;
 }
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 05f933dddfde..aba5657d287e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3652,7 +3652,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		size_t, argsz)
 {
 	struct io_ring_ctx *ctx;
-	struct fd f;
+	struct file *file;
 	long ret;
 
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
@@ -3670,20 +3670,19 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
 			return -EINVAL;
 		fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
-		f.file = tctx->registered_rings[fd];
-		f.flags = 0;
-		if (unlikely(!f.file))
+		file = tctx->registered_rings[fd];
+		if (unlikely(!file))
 			return -EBADF;
 	} else {
-		f = fdget(fd);
-		if (unlikely(!f.file))
+		file = fget(fd);
+		if (unlikely(!file))
 			return -EBADF;
 		ret = -EOPNOTSUPP;
-		if (unlikely(!io_is_uring_fops(f.file)))
+		if (unlikely(!io_is_uring_fops(file)))
 			goto out;
 	}
 
-	ctx = f.file->private_data;
+	ctx = file->private_data;
 	ret = -EBADFD;
 	if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED))
 		goto out;
@@ -3777,7 +3776,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		}
 	}
 out:
-	fdput(f);
+	if (!(flags & IORING_ENTER_REGISTERED_RING))
+		fput(file);
 	return ret;
 }
 
@@ -4618,7 +4618,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 {
 	struct io_ring_ctx *ctx;
 	long ret = -EBADF;
-	struct fd f;
+	struct file *file;
 	bool use_registered_ring;
 
 	use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
@@ -4637,27 +4637,27 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 		if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
 			return -EINVAL;
 		fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
-		f.file = tctx->registered_rings[fd];
-		f.flags = 0;
-		if (unlikely(!f.file))
+		file = tctx->registered_rings[fd];
+		if (unlikely(!file))
 			return -EBADF;
 	} else {
-		f = fdget(fd);
-		if (unlikely(!f.file))
+		file = fget(fd);
+		if (unlikely(!file))
 			return -EBADF;
 		ret = -EOPNOTSUPP;
-		if (!io_is_uring_fops(f.file))
+		if (!io_is_uring_fops(file))
 			goto out_fput;
 	}
 
-	ctx = f.file->private_data;
+	ctx = file->private_data;
 
 	mutex_lock(&ctx->uring_lock);
 	ret = __io_uring_register(ctx, opcode, arg, nr_args);
 	mutex_unlock(&ctx->uring_lock);
 	trace_io_uring_register(ctx, opcode, ctx->nr_user_files, ctx->nr_user_bufs, ret);
 out_fput:
-	fdput(f);
+	if (!use_registered_ring)
+		fput(file);
 	return ret;
 }
 

-- 
Jens Axboe





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux