On 21/02/2020 22:10, Jens Axboe wrote: > On 2/21/20 11:30 AM, Pavel Begunkov wrote: >> On 21/02/2020 17:50, Jens Axboe wrote: >>> On 2/21/20 6:51 AM, Pavel Begunkov wrote: >>>> On 20/02/2020 23:31, Jens Axboe wrote: >>>>> For poll requests, it's not uncommon to link a read (or write) after >>>>> the poll to execute immediately after the file is marked as ready. >>>>> Since the poll completion is called inside the waitqueue wake up handler, >>>>> we have to punt that linked request to async context. This slows down >>>>> the processing, and actually means it's faster to not use a link for this >>>>> use case. >>>>> >>>>> We also run into problems if the completion_lock is contended, as we're >>>>> doing a different lock ordering than the issue side is. Hence we have >>>>> to do trylock for completion, and if that fails, go async. Poll removal >>>>> needs to go async as well, for the same reason. >>>>> >>>>> eventfd notification needs special case as well, to avoid stack blowing >>>>> recursion or deadlocks. >>>>> >>>>> These are all deficiencies that were inherited from the aio poll >>>>> implementation, but I think we can do better. When a poll completes, >>>>> simply queue it up in the task poll list. When the task completes the >>>>> list, we can run dependent links inline as well. This means we never >>>>> have to go async, and we can remove a bunch of code associated with >>>>> that, and optimizations to try and make that run faster. The diffstat >>>>> speaks for itself. >>>> >>>> So, it piggybacks request execution onto a random task, that happens >>>> to complete a poll. Did I get it right? >>>> >>>> I can't find where it setting right mm, creds, etc., or why it have >>>> them already. >>> >>> Not a random task, the very task that initially tried to do the receive >>> (or whatever the operation may be). Hence there's no need to set >>> mm/creds/whatever, we're still running in the context of the original >>> task once we retry the operation after the poll signals readiness. >> >> Got it. Then, it may happen in the future after returning from >> __io_arm_poll_handler() and io_uring_enter(). And by that time io_submit_sqes() >> should have already restored creds (i.e. personality stuff) on the way back. >> This might be a problem. > > Not sure I follow, can you elaborate? Just to be sure, the requests that > go through the poll handler will go through __io_queue_sqe() again. Oh I > guess your point is that that is one level below where we normally > assign the creds. Yeah, exactly. Poll handler won't do the personality dancing, as it doesn't go through io_submit_sqes(). > >> BTW, Is it by design, that all requests of a link use personality creds >> specified in the head's sqe? > > No, I think that's more by accident. We should make sure they use the > specified creds, regardless of the issue time. Care to clean that up? > Would probably help get it right for the poll case, too. Ok, I'll prepare -- Pavel Begunkov