On 1/28/20 11:04 AM, Jens Axboe wrote: > On 1/28/20 10:19 AM, Jens Axboe wrote: >> On 1/28/20 9:19 AM, Jens Axboe wrote: >>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >>>> Am 28.01.20 um 17:10 schrieb Jens Axboe: >>>>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote: >>>>>> Hi Jens, >>>>>> >>>>>> now that we have IORING_FEAT_CUR_PERSONALITY... >>>>>> >>>>>> How can we optimize the fileserver case now, in order to avoid the >>>>>> overhead of always calling 5 syscalls before io_uring_enter()?: >>>>>> >>>>>> /* gain root again */ >>>>>> setresuid(-1,0,-1); setresgid(-1,0,-1) >>>>>> /* impersonate the user with groups */ >>>>>> setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); >>>>>> /* trigger the operation */ >>>>>> io_uring_enter(); >>>>>> >>>>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be >>>>>> good, together with a IOSQE_FIXED_CREDS in order to specify >>>>>> credentials per operation. >>>>>> >>>>>> Or we make it much more generic and introduce a credsfd_create() >>>>>> syscall in order to get an fd for a credential handle, maybe >>>>>> together with another syscall to activate the credentials of >>>>>> the current thread (or let a write to the fd trigger the activation >>>>>> in order to avoid an additional syscall number). >>>>>> >>>>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] >>>>>> to be just an array of int values instead of a more complex >>>>>> structure to define the credentials. >>>>> >>>>> I'd rather avoid having to add more infrastructure for this, even if >>>>> credsfd_create() would be nifty. >>>>> >>>>> With that in mind, something like: >>>>> >>>>> - Application does IORING_REGISTER_CREDS, which returns some index >>>>> >>>>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated >>>>> with dependent commands >>>>> - Actual request is linked to the IORING_OP_USE_CREDS command, any >>>>> link off IORING_OP_USE_CREDS will use those credentials >>>> >>>> Using links for this sounds ok. >>> >>> Great! I'll try and hack this up and see how it turns out. >>> >>>>> - IORING_UNREGISTER_CREDS removes the registered creds >>>>> >>>>> Just throwing that out there, definitely willing to entertain other >>>>> methods that make sense for this. Trying to avoid needing to put this >>>>> information in the SQE itself, hence the idea to use a chain of links >>>>> for it. >>>>> >>>>> The downside is that we'll need to maintain an array of key -> creds, >>>>> but that's probably not a big deal. >>>>> >>>>> What do you think? >>>> >>>> So IORING_REGISTER_CREDS would be a simple operation that just takes a >>>> snapshot of the current_cred() and returns an id that can be passed to >>>> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS? >>> >>> Right, you would not pass in any arguments, it'd have to be run from the >>> personality you wish to register. It simply returns an integer, which is >>> a key to use for IORING_OP_USE_CREDS, or at the end for >>> IORING_UNREGISTER_CREDS when you no longer wish to use this personality. >>> >>>>> Ideally I'd like to get this done for 5.6 even if we >>>>> are a bit late, so you'll have everything you need with that release. >>>> >>>> That would be great! >>> >>> Crossing fingers... >> >> OK, so here are two patches for testing: >> >> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >> >> #1 adds support for registering the personality of the invoking task, >> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >> just having one link, it doesn't support a chain of them. >> >> I'll try and write a test case for this just to see if it actually works, >> so far it's totally untested. >> >> Adding Pavel to the CC. > > Minor tweak to ensuring we do the right thing for async offload as well, > and it tests fine for me. Test case is: > > - Run as root > - Register personality for root > - create root only file > - check we can IORING_OP_OPENAT the file > - switch to user id test > - check we cannot IORING_OP_OPENAT the file > - check that we can open the file with IORING_OP_USE_CREDS linked I didn't like it becoming a bit too complicated, both in terms of implementation and use. And the fact that we'd have to jump through hoops to make this work for a full chain. So I punted and just added sqe->personality and IOSQE_PERSONALITY. This makes it way easier to use. Same branch: https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds I'd feel much better with this variant for 5.6. -- Jens Axboe