On Wed, Nov 20, 2019 at 05:03:13PM -0500, Phil Auld wrote:
> On Wed, Nov 20, 2019 at 08:16:36PM +0100 Peter Zijlstra wrote:
> > On Tue, Nov 19, 2019 at 07:40:54AM +1100, Dave Chinner wrote:
> > > Yes, that's precisely the problem - work is queued, by default, on a
> > > specific CPU and it will wait for a kworker that is pinned to that
> >
> > I'm thinking the problem is that it doesn't wait. If it went and waited
> > for it, active balance wouldn't be needed, that only works on active
> > tasks.
>
> Since this is AIO I wonder if it should queue_work on a nearby cpu by
> default instead of unbound.

The thing seems to be that 'unbound' is in fact 'bound'. Maybe we should
fix that (a sketch of the queueing variants in question is appended at
the end of this mail). If the load-balancer were allowed to move the
kworker around when it didn't get time to run, that would probably be a
better solution. Picking another 'bound' cpu at random might create the
same sort of problems in more complicated scenarios.

TJ, ISTR there used to be actually unbound kworkers; what happened to
those? Or am I misremembering things.

> > Lastly, one other thing to try is -next. Vincent reworked the
> > load-balancer quite a bit.
>
> I've tried it with the lb patch series. I get basically the same results.
> With the high granularity settings I get 3700 migrations for the 30
> second run at 4k. Of those about 3200 are active balance on stock 5.4-rc7.
> With the lb patches it's 3500 and 3000, a slight drop.

Thanks for testing that. I didn't expect miracles, but it is good to
verify.

> Using the default granularity settings 50 and 22 for stock and 250 and 25.
> So a few more total migrations with the lb patches but about the same active.

Right, so the granularity thing interacts with the load-balance period.
Pushing it up, as some people appear to do, makes it so that what might
be a temporary imbalance is perceived as a persistent imbalance.

Tying the load-balance period to the granularity is something we could
consider, but then I'm sure we'll get other people complaining that it
doesn't balance quickly enough anymore.
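
For reference, since the 'bound vs unbound' terminology keeps coming up:
below is a minimal, illustrative module sketch of the three queueing
variants being discussed (default per-cpu queueing, explicit
queue_work_on(), and a WQ_UNBOUND queue). It is not the fs/aio.c
completion path; the module, the names and the chosen CPU are made up
purely for the example.

/*
 * Illustrative only: shows where the "bound vs unbound" distinction
 * comes from for work items.  Not the fs/aio.c code path.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/cpumask.h>
#include <linux/smp.h>

static void demo_fn(struct work_struct *w)
{
	/* raw_ variant: work functions run preemptible. */
	pr_info("work ran on cpu %d\n", raw_smp_processor_id());
}

static DECLARE_WORK(demo_percpu_work, demo_fn);
static DECLARE_WORK(demo_pinned_work, demo_fn);
static DECLARE_WORK(demo_unbound_work, demo_fn);
static struct workqueue_struct *demo_unbound_wq;

static int __init demo_init(void)
{
	/*
	 * 1) Default: lands in the per-cpu pool of the submitting CPU,
	 *    i.e. it is effectively bound to wherever we run right now.
	 */
	queue_work(system_wq, &demo_percpu_work);

	/* 2) Explicitly bound to a CPU of the caller's choosing. */
	queue_work_on(cpumask_first(cpu_online_mask), system_wq,
		      &demo_pinned_work);

	/*
	 * 3) WQ_UNBOUND: serviced by the unbound worker pools, so the
	 *    kworker is not pinned to the submitting CPU (placement is
	 *    still subject to the pool's cpumask).
	 */
	demo_unbound_wq = alloc_workqueue("demo_unbound", WQ_UNBOUND, 0);
	if (!demo_unbound_wq)
		return -ENOMEM;
	queue_work(demo_unbound_wq, &demo_unbound_work);

	return 0;
}

static void __exit demo_exit(void)
{
	flush_work(&demo_percpu_work);
	flush_work(&demo_pinned_work);
	destroy_workqueue(demo_unbound_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");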