On 8/20/21 11:46 PM, Jens Axboe wrote:
> On 8/20/21 4:41 PM, Pavel Begunkov wrote:
>> On 8/20/21 11:30 PM, Jens Axboe wrote:
>>> On 8/20/21 4:28 PM, Pavel Begunkov wrote:
>>>> On 8/20/21 11:09 PM, Jens Axboe wrote:
>>>>> On 8/20/21 3:32 PM, Pavel Begunkov wrote:
>>>>>> On 8/20/21 9:39 PM, Hao Xu wrote:
>>>>>>> On 8/21/21 2:59 AM, Pavel Begunkov wrote:
>>>>>>>> On 8/20/21 7:40 PM, Hao Xu wrote:
>>>>>>>>> compl_nr in ctx_flush_and_put() is not protected by uring_lock; this
>>>>>>>>> may cause problems when it is accessed in parallel.
>>>>>>>>
>>>>>>>> Did you hit any problem? It sounds like it should be fine as is:
>>>>>>>>
>>>>>>>> The trick is that it's only responsible for flushing requests added
>>>>>>>> during the execution of the current call to tctx_task_work(), and those
>>>>>>>> are naturally synchronised with the current task. All other potentially
>>>>>>>> enqueued requests are someone else's responsibility.
>>>>>>>>
>>>>>>>> So, if nobody flushed the requests, we're fine and in sync. If we see
>>>>>>>> 0 there but actually enqueued a request, it means someone
>>>>>>>> flushed it after the request had been added.
>>>>>>>>
>>>>>>>> It probably needs a more formal explanation in terms of happens-before
>>>>>>>> and so on.
>>>>>>>
>>>>>>> I should have put more detail in the commit message. The thing is,
>>>>>>> say compl_nr > 0:
>>>>>>>
>>>>>>> ctx_flush_and_put          other context
>>>>>>>   if (compl_nr)
>>>>>>>                              get mutex
>>>>>>>                              compl_nr > 0
>>>>>>>                              do flush
>>>>>>>                                compl_nr = 0
>>>>>>>                              release mutex
>>>>>>>   get mutex
>>>>>>>   do flush (*)
>>>>>>>   release mutex
>>>>>>>
>>>>>>> At (*) we do a bunch of unnecessary work; moreover, we
>>>>>>
>>>>>> I wouldn't worry about the overhead, that shouldn't be much.
>>>>>>
>>>>>>> call io_cqring_ev_posted(), which I think we shouldn't.
>>>>>>
>>>>>> IMHO, users should expect spurious io_cqring_ev_posted(),
>>>>>> though there were some eventfd users complaining before, so
>>>>>> for them we can do
>>>>>
>>>>> It does sometimes cause issues, see:
>>>>
>>>> I'm used to locking ending up in spurious wakeups. It may be
>>>> different for eventfd, but considering that we batch
>>>> completions and so might be calling it only once per multiple
>>>> CQEs, it shouldn't be.
>>>
>>> The wakeups are fine, it's the eventfd increment that's causing some issues.
>>
>> If userspace doesn't expect that the eventfd count may diverge from the
>> number of posted CQEs, we need something like below. The weird part
>> is that nobody seems to have complained about this one, even though it
>> should be happening pretty often.
>
> That wasn't the issue we ran into; it was more the fact that eventfd
> would indicate that something had been posted when nothing had.
> We don't need eventfd notifications to be == the number of posted events,
> just that if the eventfd notification is incremented, there should be new
> events there.

It's just so commonly mentioned that, for me, expecting spurious
events/wakeups is the default. Do we have it documented anywhere?

-- 
Pavel Begunkov