On Thu, Nov 21, 2019 at 02:29:37PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 20, 2019 at 05:03:13PM -0500, Phil Auld wrote:
> > On Wed, Nov 20, 2019 at 08:16:36PM +0100, Peter Zijlstra wrote:
> > > On Tue, Nov 19, 2019 at 07:40:54AM +1100, Dave Chinner wrote:
> > > > Yes, that's precisely the problem - work is queued, by default, on a
> > > > specific CPU and it will wait for a kworker that is pinned to that
> > >
> > > I'm thinking the problem is that it doesn't wait. If it went and waited
> > > for it, active balance wouldn't be needed; that only works on active
> > > tasks.
> >
> > Since this is AIO I wonder if it should queue_work on a nearby cpu by
> > default instead of unbound.
>
> The thing seems to be that 'unbound' is in fact 'bound'. Maybe we should
> fix that. If the load-balancer were allowed to move the kworker around
> when it didn't get time to run, that would probably be a better
> solution.
>
> Yeah, I'm not convinced this is actually a scheduler issue.
>
> Picking another 'bound' cpu at random might create the same sort of
> problems in more complicated scenarios.
>
> TJ, ISTR there used to be actually unbound kworkers; what happened to
> those? Or am I misremembering things?
>
> > > Lastly, one other thing to try is -next. Vincent reworked the
> > > load-balancer quite a bit.
> >
> > I've tried it with the lb patch series. I get basically the same results.
> > With the high granularity settings I get 3700 migrations for the 30
> > second run at 4k. Of those, about 3200 are active balance on stock 5.4-rc7.
> > With the lb patches it's 3500 and 3000, a slight drop.
>
> Thanks for testing that. I didn't expect miracles, but it is good to
> verify.
>
> > Using the default granularity settings it's 50 and 22 for stock, and 250
> > and 25 with the lb patches. So a few more total migrations with the lb
> > patches, but about the same active.
>
> Right, so the granularity thing interacts with the load-balance period.
> Pushing it up, as some people appear to do, makes it so that what
> might be a temporal imbalance is perceived as a persistent imbalance.
>
> Tying the load-balance period to the granularity is something we could
> consider, but then, I'm sure, we'll get other people complaining it
> doesn't balance quickly enough anymore.

Thanks. These are old tuned settings that have been carried along. They
may not be right for newer kernels anyway.

--
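
[Editor's note, not part of the thread: the queueing choices being debated
map onto the kernel workqueue API roughly as sketched below. The function
and work-item names are hypothetical, the module boilerplate is omitted,
and in real code only one of the three calls would be used.]

```c
#include <linux/workqueue.h>
#include <linux/smp.h>

static void aio_complete_fn(struct work_struct *work)
{
	/* hypothetical AIO completion processing */
}
static DECLARE_WORK(aio_work, aio_complete_fn);

static void submit_completion(void)
{
	/* 1. Default: queued on the submitting CPU's per-CPU pool; the
	 * servicing kworker is pinned to that CPU. This is the behaviour
	 * Dave describes. */
	schedule_work(&aio_work);

	/* 2. Explicitly bound to a chosen CPU (the "nearby cpu" idea;
	 * shown here with the submitting CPU for illustration). */
	queue_work_on(raw_smp_processor_id(), system_wq, &aio_work);

	/* 3. WQ_UNBOUND pool: not pinned to a single CPU, but still
	 * confined to a per-node cpumask -- Peter's point that 'unbound'
	 * is in fact 'bound'. */
	queue_work(system_unbound_wq, &aio_work);
}
```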