Re: [PATCHSET 0/4] Add support for shared io-wq backends

On 28.01.20 at 00:38, Jens Axboe wrote:
> On 1/27/20 4:25 PM, Pavel Begunkov wrote:
>> On 28/01/2020 02:23, Jens Axboe wrote:
>>> On 1/27/20 4:17 PM, Pavel Begunkov wrote:
>>>> On 28/01/2020 02:00, Jens Axboe wrote:
>>>>> On 1/27/20 3:40 PM, Jens Axboe wrote:
>>>>>> On 1/27/20 2:45 PM, Pavel Begunkov wrote:
>>>>>>> On 27/01/2020 23:33, Jens Axboe wrote:
>>>>>>>> On 1/27/20 7:07 AM, Pavel Begunkov wrote:
>>>>>>>>> On 1/27/2020 4:39 PM, Jens Axboe wrote:
>>>>>>>>>> On 1/27/20 6:29 AM, Pavel Begunkov wrote:
>>>>>>>>>>> On 1/26/2020 8:00 PM, Jens Axboe wrote:
>>>>>>>>>>>> On 1/26/20 8:11 AM, Pavel Begunkov wrote:
>>>>>>>>>>>>> On 1/26/2020 4:51 AM, Daurnimator wrote:
>>>>>>>>>>>>>> On Fri, 24 Jan 2020 at 10:16, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>>>>>> Ok. I can't promise it'll play handy for sharing. Though, you'll be out
>>>>>>>>>>> of space in struct io_uring_params soon anyway.
>>>>>>>>>>
>>>>>>>>>> I'm going to keep what we have for now, as I'm really not imagining a
>>>>>>>>>> lot more sharing - what else would we share? So let's not over-design
>>>>>>>>>> anything.
>>>>>>>>>>
>>>>>>>>> Fair enough. I prefer a ptr to an extendable struct, that will take the
>>>>>>>>> last u64, when needed.
>>>>>>>>>
>>>>>>>>> However, it's still better to share through file descriptors. It's just
>>>>>>>>> not secure enough the way it's now.
>>>>>>>>
>>>>>>>> Is the file descriptor value really a good choice? We just had some
>>>>>>>> confusion on ring sharing across forks. Not sure using an fd value
>>>>>>>> is a sane "key" to use across processes.
>>>>>>>>
>>>>>>> As I see it, the problem with @mm is that uring is dead-bound to it.
>>>>>>> For example, a process can create and send uring (e.g. via socket),
>>>>>>> and then be killed. And that basically means
>>>>>>> 1. @mm of the process is locked just because of the sent uring
>>>>>>> instance.
>>>>>>> 2. a process may have an io_uring, which is bound to @mm of another
>>>>>>> process, even though the layouts may be completely different.
>>>>>>>
>>>>>>> File descriptors are different here, because io_uring doesn't know
>>>>>>> about them. They are controlled by the userspace (send, dup, fork,
>>>>>>> etc), and don't sabotage all isolation work done in the kernel. A dire
>>>>>>> example here is stealing io-wq from within a container, which is
>>>>>>> trivial with a global self-made id. I would love to hear if I am
>>>>>>> mistaken somewhere.
>>>>>>>
>>>>>>> Is there some better option?
>>>>>>
>>>>>> OK, so how about this:
>>>>>>
>>>>>> - We use the 'fd' as the lookup key. This makes it easy since we can
>>>>>>   just check if it's an io_uring instance or not, we don't need to do any
>>>>>>   tracking on the side. It also means that the application asking for
>>>>>>   sharing must already have some relationship to the process that
>>>>>>   created the ring.
>>>>
>>>> Yeah, that's exactly the point.
>>>>
>>>>>>
>>>>>> - mm/creds must be transferred through the work item. Any SQE done on
>>>>>>   behalf of io_uring_enter() directly already has that, if punted we
>>>>>>   must pass the creds and mm. This means we break the static setup of
>>>>>>   io_wq->mm/creds. It also means that we probably have to add that to
>>>>>>   io_wq_work, which kind of sucks, but...
>>>>
>>>> ehh, juggling mm's... But don't have anything nicer myself.
>>>
>>> We already do juggle mm's, this is no different. A worker potentially
>>> retains the mm across works if they are the same.
>>>
>>>>> It'd fix Stefan's worry too.
>>>>>
>>>>>> I think with that we have a decent setup, that's also safe. I've dropped
>>>>>> the sharing patches for now, from the 5.6 tree.
>>>>>
>>>>> So one concern might be SQPOLL, it'll have to use the ctx creds and mm
>>>>> as usual. I guess that is ok.
>>>>>
>>>>
>>>> OK. I'll send the patches for the first part now, and take a look at
>>>> the second one a bit later if it isn't done by then.
>>>
>>> Hang on a second, I'm doing the mm and creds bits right now. I'll push
>>> that to a branch, if you want to do the actual fd stuff on top of that,
>>> that would be great.
>>>
>> Sure, should be trivially mergeable.
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-wq
> 
> Top patch there is the mm/creds passing. I kind of like it even if it
> means we're growing io_wq_worker (and subsequently io_kiocb) by 16
> bytes, as it means we can be more flexible. This solves it for this use
> case, but also the case that Stefan was worried about.

Ok, that means that ctx->creds is only used in the IORING_SETUP_SQPOLL
case and there it's used for all requests as get_current_cred() is the
same as ctx->creds from within io_sq_thread(), correct?

And in all other cases get_current_cred() is used at io_uring_enter() time.

That's good, as it makes the behavior consistent again and prevents
potential random security problems.
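
To make sure I read the rule correctly, here is a minimal userspace model
of it. This is only a sketch and not the kernel code: struct creds,
struct ring_ctx, struct work_item, pick_creds(), prepare_work() and
worker_run() are all hypothetical names invented for illustration. It
assumes the work item carries the creds, so a shared worker never relies
on creds fixed on the worker pool.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the real kernel structures. */
struct creds {
	unsigned int uid;
};

struct ring_ctx {
	bool sqpoll;                /* models IORING_SETUP_SQPOLL */
	const struct creds *creds;  /* captured once, at ring setup */
};

struct work_item {
	const struct creds *creds;  /* travels with the request when punted */
};

/*
 * Choose the creds for a new request. With SQPOLL the poll thread
 * submits, so only the ring's own creds are available. Otherwise the
 * creds of the task calling io_uring_enter() are used.
 */
static const struct creds *pick_creds(const struct ring_ctx *ctx,
				      const struct creds *caller)
{
	return ctx->sqpoll ? ctx->creds : caller;
}

/* Record the chosen creds in the work item before it is queued. */
static void prepare_work(struct work_item *work,
			 const struct ring_ctx *ctx,
			 const struct creds *caller)
{
	work->creds = pick_creds(ctx, caller);
}

/* A (possibly shared) worker runs with whatever the work item carries. */
static unsigned int worker_run(const struct work_item *work)
{
	return work->creds->uid;
}
```

In this model, a request entered by a task with uid 2000 on a non-SQPOLL
ring created by uid 1000 runs as 2000, while the same request on an
SQPOLL ring runs as 1000.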

BTW: you need to revert/drop 44d282796f81eb1debc1d7cb53245b4cb3214cb5
in that branch. Or just rebase on v5.5 final?

Thanks!
metze


