On 1/9/20 3:40 AM, Stefan Metzmacher wrote: >>> I'm sorry, but I'm still unsure we're talking about the same thing >>> (or maybe I'm missing some basics here). >>> >>> My understanding of the io_uring_enter() is that it will execute as much >>> non-blocking calls as it can without switching to any other kernel thread. >> >> Correct, any SQE that we can do without switching, we will. >> >>> And my fear is that openat will use get_current_cred() instead of >>> ctx->creds. >> >> OK, I think I follow your concern. So you'd like to setup the rings from >> a _different_ user, and then later on use it for submission for SQEs that >> a specific user. So sort of the same as our initial discussion, except >> the mapping would be static. The difference being that you might setup >> the ring from a different user than the user that would be submitting IO >> on it? > > Our current (much simplified here) flow is this: > > # we start as root > seteuid(0);setegid(0);setgroups()... > ... > # we become the user555 and > # create our desired credential token > seteuid(555); seteguid(555); setgroups()... > # Start an openat2 on behalf of user555 > openat2() > # we unbecome the user again and run as root > seteuid(0);setegid(0); setgroups()... > ... > # we become the user444 and > # create our desired credential token > seteuid(444); seteguid(444); setgroups()... > # Start an openat2 on behalf of user444 > openat2() > # we unbecome the user again and run as root > seteuid(0);setegid(0); setgroups()... > ... > # we become the user555 and > # create our desired credential token > seteuid(555); seteguid(555); setgroups()... > # Start an openat2 on behalf of user555 > openat2() > # we unbecome the user again and run as root > seteuid(0);setegid(0); setgroups()... > > It means we have to do about 7 syscalls in order > to open a file on behalf of a user. > (In reality we cache things and avoid set*id() > calls most of the time, but I want to demonstrate the > simplified design here) > > With io_uring I'd like to use a flow like this: > > # we start as root > seteuid(0);setegid(0);setgroups()... > ... > # we become the user444 and > # create our desired credential token > seteuid(444); seteguid(444); setgroups()... > # we snapshot the credentials to the new ring for user444 > ring444 = io_uring_setup() > # we unbecome the user again and run as root > seteuid(0);setegid(0);setgroups()... > ... > # we become the user555 and > # create our desired credential token > seteuid(555); seteguid(555); setgroups()... > # we snapshot the credentials to the new ring for user555 > ring555 = io_uring_setup() > # we unbecome the user again and run as root > seteuid(0);setegid(0);setgroups()... > ... > # Start an openat2 on behalf of user555 > io_uring_enter(ring555, OP_OPENAT2...) > ... > # Start an openat2 on behalf of user444 > io_uring_enter(ring444, OP_OPENAT2...) > ... > # Start an openat2 on behalf of user555 > io_uring_enter(ring555, OP_OPENAT2...) > > So instead of constantly doing 7 syscalls per open, > we would be down to just at most one. And I would assume > that io_uring_enter() would do the temporary credential switch > for me also in the non-blocking case. OK, thanks for spelling the use case out, makes it easier to understand what you need in terms of what we currently can't do. >> If so, then we do need something to support that, probably an >> IORING_REGISTER_CREDS or similar. This would allow you to replace the >> creds you currently have in ctx->creds with whatever new one. > > I don't want to change ctx->creds, but I want it to be used consistently. > > What I think is missing is something like this: > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index 32aee149f652..55dbb154915a 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -6359,10 +6359,27 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, > fd, u32, to_submit, > struct mm_struct *cur_mm; > > mutex_lock(&ctx->uring_lock); > + if (current->mm != ctx->sqo_mm) { > + // TODO: somthing like this... > + restore_mm = current->mm; > + use_mm(ctx->sqo_mm); > + } > /* already have mm, so io_submit_sqes() won't try to > grab it */ > cur_mm = ctx->sqo_mm; > + if (current_cred() != ctx->creds) { > + // TODO: somthing like this... > + restore_cred = override_creds(ctx->creds); > + } > submitted = io_submit_sqes(ctx, to_submit, f.file, fd, > &cur_mm, false); > + if (restore_cred != NULL) { > + revert_creds(restore_cred); > + } > + if (restore_mm != NULL) { > + // TODO: something like this... > + unuse_mm(ctx->sqo_mm); > + use_mm(restore_mm); > + } > mutex_unlock(&ctx->uring_lock); > > if (submitted != to_submit) > > I'm not sure if current->mm is needed, I just added it for completeness > and as hint that io_op_defs[req->opcode].needs_mm is there and a > needs_creds could also be added (if it helps with performance) > > Is it possible to trigger a change of current->mm from userspace? > > An IORING_REGISTER_CREDS would only be useful if it's possible to > register a set of credentials and then use per io_uring_sqe credentials. > That would also be fine for me, but I'm not sure it's needed for now. I think it'd be a cleaner way of doing the same thing as your patch does. It seems a little odd to do this by default (having the ring change personalities depending on who's using it), but from an opt-in point of view, I think it makes more sense. That would make the IORING_REGISTER_ call something like IORING_REGISTER_ADOPT_OWNER or something like that, meaning that the ring would just assume the identify of the task that's calling io_uring_enter(). Note that this also has to be passed through to the io-wq handler, as the mappings there are currently static as well. -- Jens Axboe