On 12/4/23 18:05, Jens Axboe wrote:
> On 12/4/23 10:53 AM, Keith Busch wrote:
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 1d254f2c997de..4aa10b64f539e 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -3980,6 +3980,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
>>  		ctx->syscall_iopoll = 1;
>>  	ctx->compat = in_compat_syscall();
>> +	ctx->sys_admin = capable(CAP_SYS_ADMIN);
>>  	if (!ns_capable_noaudit(&init_user_ns, CAP_IPC_LOCK))
>>  		ctx->user = get_uid(current_user());
>
> Hmm, what happens if the app starts as e.g. root for initialization
> purposes and drops caps after? That would previously have caused
> passthrough to fail, but now it will work. Perhaps this is fine; after
> all, this isn't unusual for e.g. opening a device or doing other special
> init work?

The side effects would be quite a surprise when you initialize the ring
from a privileged process and then pass it to a less capable one. Ring
sharing would also be affected. Privilege downgrade also sounds like
a valid concern. The first two would be solved by restricting this to
IORING_SETUP_DEFER_TASKRUN rings and using something like:

	static inline bool io_is_capable(struct io_ring_ctx *ctx)
	{
		return ctx->sys_admin || capable(CAP_SYS_ADMIN);
	}

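To make the privilege-downgrade case concrete, here is a minimal userspace
sketch, not kernel code. All model_* names are hypothetical and only mimic
the cached ctx->sys_admin flag from the patch and the io_is_capable() helper
above. It assumes the flag is sampled once at ring creation and never
refreshed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct model_task {
	bool has_sys_admin;	/* stands in for capable(CAP_SYS_ADMIN) */
};

struct model_ring {
	bool sys_admin;		/* cached once at ring creation */
};

/* Live check against the task's current capabilities. */
static bool model_capable(const struct model_task *t)
{
	return t->has_sys_admin;
}

/* Mirrors the patch: sample the capability when the ring is created. */
static struct model_ring model_ring_create(const struct model_task *creator)
{
	struct model_ring r = { .sys_admin = model_capable(creator) };
	return r;
}

/* Mirrors the helper: the cached flag short-circuits the live check. */
static bool model_io_is_capable(const struct model_ring *r,
				const struct model_task *submitter)
{
	return r->sys_admin || model_capable(submitter);
}

/* Walks through the "init as root, then drop caps" scenario. */
static void model_selfcheck(void)
{
	struct model_task app = { .has_sys_admin = true };
	struct model_ring ring = model_ring_create(&app);

	app.has_sys_admin = false;	/* app drops privileges */

	assert(!model_capable(&app));			/* live check fails */
	assert(model_io_is_capable(&ring, &app));	/* cached flag still passes */
}
```

In this model the ring keeps granting access after the task has dropped the
capability, which is the behaviour change in question. A ring created
without the capability still falls back to the live check.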
Even then, bypassing the check doesn't seem great when the real question
is why capable() is expensive in the first place. I've seen a fat BPF
program running on every call in the wild; is that what happens here?
In any case, the change in behaviour should definitely be mentioned
explicitly in the commit message for a change like this.
--
Pavel Begunkov