On 8/20/21 4:59 PM, Pavel Begunkov wrote:
> On 8/20/21 11:46 PM, Jens Axboe wrote:
>> On 8/20/21 4:41 PM, Pavel Begunkov wrote:
>>> On 8/20/21 11:30 PM, Jens Axboe wrote:
>>>> On 8/20/21 4:28 PM, Pavel Begunkov wrote:
>>>>> On 8/20/21 11:09 PM, Jens Axboe wrote:
>>>>>> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>>>>>>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>>>>>>> On 2021/8/21 2:59 AM, Pavel Begunkov wrote:
>>>>>>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>>>>>>> compl_nr in ctx_flush_and_put() is not protected by uring_lock; this
>>>>>>>>>> may cause problems when it is accessed in parallel.
>>>>>>>>>
>>>>>>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>>>>>>
>>>>>>>>> The trick is that it's only responsible for flushing requests added
>>>>>>>>> during execution of the current call to tctx_task_work(), and those
>>>>>>>>> are naturally synchronised with the current task. All other
>>>>>>>>> potentially enqueued requests will be someone else's responsibility.
>>>>>>>>>
>>>>>>>>> So, if nobody flushed requests, we're fine and in sync. If we see
>>>>>>>>> 0 there but actually enqueued a request, it means someone
>>>>>>>>> actually flushed it after the request had been added.
>>>>>>>>>
>>>>>>>>> Probably needs a more formal explanation with happens-before
>>>>>>>>> and so on.
>>>>>>>>
>>>>>>>> I should put more detail in the commit message. The thing is,
>>>>>>>> say compl_nr > 0:
>>>>>>>>
>>>>>>>>   ctx_flush_and_put              other context
>>>>>>>>    if (compl_nr)
>>>>>>>>                                  get mutex
>>>>>>>>                                  compl_nr > 0
>>>>>>>>                                  do flush
>>>>>>>>                                      compl_nr = 0
>>>>>>>>                                  release mutex
>>>>>>>>    get mutex
>>>>>>>>    do flush (*)
>>>>>>>>    release mutex
>>>>>>>>
>>>>>>>> in the (*) place, we do a bunch of unnecessary work; moreover, we
>>>>>>>
>>>>>>> I wouldn't care about the overhead, that shouldn't be much
>>>>>>>
>>>>>>>> call io_cqring_ev_posted(), which I think we shouldn't.
>>>>>>>
>>>>>>> IMHO, users should expect spurious io_cqring_ev_posted(),
>>>>>>> though there were some eventfd users complaining before, so
>>>>>>> for them we can do
>>>>>>
>>>>>> It does sometimes cause issues, see:
>>>>>
>>>>> I'm used to locking potentially ending up in spurious wakeups. It
>>>>> may be different for eventfd, but considering that we batch
>>>>> completions and so might be calling it only once per multiple
>>>>> CQEs, it shouldn't be.
>>>>
>>>> The wakeups are fine; it's the eventfd increment that's causing
>>>> some issues.
>>>
>>> If userspace doesn't expect that the eventfd count may diverge from
>>> the number of posted CQEs, we need something like below. The weird
>>> part is that it looks like nobody complained about this one, even
>>> though it should be happening pretty often.
>>
>> That wasn't the issue we ran into; it was more the fact that eventfd
>> would indicate that something had been posted when nothing had.
>> We don't need eventfd notifications to be == the number of posted
>> events, just that if the eventfd notification is incremented, there
>> should be new events there.
>
> It's just so commonly mentioned that, for me, expecting spurious
> events/wakeups is the default. Do we have it documented anywhere?

Not documented to my knowledge, and I wasn't really aware of this being
a problem until it was reported and that above-referenced commit was
done to fix it.

Might be worthwhile to put a comment in ev_posted() to detail this,
I'll do that for 5.15.

-- 
Jens Axboe