On 8/20/21 4:28 PM, Pavel Begunkov wrote: > On 8/20/21 11:09 PM, Jens Axboe wrote: >> On 8/20/21 3:32 PM, Pavel Begunkov wrote: >>> On 8/20/21 9:39 PM, Hao Xu wrote: >>>> 在 2021/8/21 上午2:59, Pavel Begunkov 写道: >>>>> On 8/20/21 7:40 PM, Hao Xu wrote: >>>>>> coml_nr in ctx_flush_and_put() is not protected by uring_lock, this >>>>>> may cause problems when accessing it parallelly. >>>>> >>>>> Did you hit any problem? It sounds like it should be fine as is: >>>>> >>>>> The trick is that it's only responsible to flush requests added >>>>> during execution of current call to tctx_task_work(), and those >>>>> naturally synchronised with the current task. All other potentially >>>>> enqueued requests will be of someone else's responsibility. >>>>> >>>>> So, if nobody flushed requests, we're finely in-sync. If we see >>>>> 0 there, but actually enqueued a request, it means someone >>>>> actually flushed it after the request had been added. >>>>> >>>>> Probably, needs a more formal explanation with happens-before >>>>> and so. >>>> I should put more detail in the commit message, the thing is: >>>> say coml_nr > 0 >>>> >>>> ctx_flush_and put other context >>>> if (compl_nr) get mutex >>>> coml_nr > 0 >>>> do flush >>>> coml_nr = 0 >>>> release mutex >>>> get mutex >>>> do flush (*) >>>> release mutex >>>> >>>> in (*) place, we do a bunch of unnecessary works, moreover, we >>> >>> I wouldn't care about overhead, that shouldn't be much >>> >>>> call io_cqring_ev_posted() which I think we shouldn't. >>> >>> IMHO, users should expect spurious io_cqring_ev_posted(), >>> though there were some eventfd users complaining before, so >>> for them we can do >> >> It does sometimes cause issues, see: > > I'm used that locking may end up in spurious wakeups. May be > different for eventfd, but considering that we do batch > completions and so might be calling it only once per multiple > CQEs, it shouldn't be. The wakeups are fine, it's the ev increment that's causing some issues. >> commit b18032bb0a883cd7edd22a7fe6c57e1059b81ed0 >> Author: Jens Axboe <axboe@xxxxxxxxx> >> Date: Sun Jan 24 16:58:56 2021 -0700 >> >> io_uring: only call io_cqring_ev_posted() if events were posted >> >> I would tend to agree with Hao here, and the usual optimization idiom >> looks like: >> >> if (struct->nr) { >> mutex_lock(&struct->lock); >> if (struct->nr) >> do_something(); >> mutex_unlock(&struct->lock); >> } >> >> like you posted, which would be fine and avoid this whole discussion :-) > > Well, until the Hao's message explaining the concerns, I was thinking > it's about potential hangs because of not flushing requests. I'd rather > say the discussion was fruitful and naturally came to the conclusion. Oh for sure, didn't mean to imply it was useless. At least it's in the archives :) -- Jens Axboe