On Tue, Dec 03, 2019 at 10:45:38AM +0100, Vincent Guittot wrote:
> On Mon, 2 Dec 2019 at 22:22, Phil Auld <pauld@xxxxxxxxxx> wrote:
> >
> > Hi Vincent,
> >
> > On Mon, Dec 02, 2019 at 02:45:42PM +0100 Vincent Guittot wrote:
> > > On Mon, 2 Dec 2019 at 05:02, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > ...
> > > >
> > > > So, we can fiddle with workqueues, but it doesn't address the
> > > > underlying issue that the scheduler appears to be migrating
> > > > non-bound tasks off a busy CPU too easily....
> > >
> > > The root cause of the problem is that sched_wakeup_granularity_ns
> > > is in the same range as, or higher than, the load balance period. As
> > > Peter explained, this makes the kworker wait for the CPU for several
> > > load balance periods, and a transient unbalanced state becomes a
> > > stable one that the scheduler then tries to fix. With the default
> > > value, the scheduler doesn't try to migrate any task.
> >
> > There are actually two issues here. With the high wakeup granularity
> > we get the user task actively migrated. This causes the significant
> > performance hit Ming was showing. With the fast wakeup_granularity
> > (or smaller IOs - 512 instead of 4k) we get, instead, the user task
> > migrated at wakeup to a new CPU for every IO completion.
>
> Ok, I hadn't noticed that this one was a problem too. Do we have a perf
> regression?

Here are the test results on one server (Dell R630, Haswell-E):

kernel.sched_wakeup_granularity_ns = 4000000
kernel.sched_min_granularity_ns = 3000000

---------------------------------------
test                            | IOPS
---------------------------------------
./xfs_complete 512              | 7.8K
---------------------------------------
taskset -c 8 ./xfs_complete 512 | 9.8K
---------------------------------------

Thanks,
Ming
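For anyone reproducing the comparison above, a minimal sketch of the setup follows. It only assumes the standard sysctl interface and util-linux `taskset`; `xfs_complete` is the test program from this thread and is assumed to be in the current directory. On kernels of this era the granularity knobs live under /proc/sys/kernel/; later kernels (v5.13+) moved them to debugfs, so the reads are guarded.

```shell
#!/bin/sh
# Show the scheduler tunables from the test, if exposed (pre-v5.13 path).
for f in /proc/sys/kernel/sched_wakeup_granularity_ns \
         /proc/sys/kernel/sched_min_granularity_ns; do
    [ -r "$f" ] && echo "$f = $(cat "$f")"
done

# Setting them to the values used in the test requires root, e.g.:
#   sysctl -w kernel.sched_wakeup_granularity_ns=4000000
#   sysctl -w kernel.sched_min_granularity_ns=3000000

# The second table row pins the test program to one CPU so wakeup-time
# migration cannot occur:
#   taskset -c 8 ./xfs_complete 512
# Demonstrated here with a trivial command pinned to CPU 0, which
# exists on any machine:
taskset -c 0 echo "pinned run"
```

The pinned run is faster here (9.8K vs 7.8K IOPS), consistent with the migration-per-completion behaviour Phil describes for small IOs.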