On 11/21/19 8:02 AM, Boaz Harrosh wrote:
> On 21/11/2019 16:12, Phil Auld wrote:
> <>
>>
>> The scheduler doesn't know if the queued_work submitter is going to go to sleep.
>> That's why I was singling out AIO. My understanding of it is that you submit the IO
>> and then keep going. So in that case it might be better to pick a node-local nearby
>> cpu instead. But this is a user of work queue issue not a scheduler issue.
>>
>
> We have a very similar long-standing problem in our system (zufs), that we had to do
> hacks to fix.
>
> We have seen this CPU bouncing exactly as above in fio and more
> benchmarks. Our final analysis was:
>
> One thread is in wait_event(). If the wake_up() is on the same CPU as
> the waiter, then on some systems (usually real HW and not VMs) the waiter
> would bounce to a different CPU. Now our system has an array of
> worker-threads, one bound to each CPU. An incoming thread chooses the
> corresponding CPU's worker-thread, lets it run, and waits for a reply;
> then when the worker-thread is done it will do a wake_up(). Usually it's
> fine and the wait_event() stays on the same CPU. But on some systems it
> will wake up on a different CPU.
>
> Now this is a great pity because in our case, and the work_queue case,
> and a high % of places, the thread calling wake_up() will then
> immediately go to sleep on something. (Work done, let's wait for new
> work)
>
> I wish there was a flag to wake_up() or to the event object that says
> to relinquish the remainder of the time-slice to the waiter on the same
> CPU, since I will be soon sleeping.

Isn't that basically what wake_up_sync() is?

-- 
Jens Axboe