On Mon, May 13, 2019 at 08:12:37AM +0200, Daniel Wagner wrote: > From: Michal Hocko <mhocko@xxxxxxxx> > > [ Upstream commit f01f17d3705bb6081c9e5728078f64067982be36 ] > > Mike has reported a considerable overhead of refresh_cpu_vm_stats from > the idle entry during pipe test: > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > 4.75% [kernel] [k] __schedule > 4.70% [kernel] [k] mutex_unlock > 3.14% [kernel] [k] __switch_to > > This is caused by commit 0eb77e988032 ("vmstat: make vmstat_updater > deferrable again and shut down on idle") which has placed quiet_vmstat > into cpu_idle_loop. The main reason here seems to be that the idle > entry has to get over all zones and perform atomic operations for each > vmstat entry even though there might be no per cpu diffs. This is a > pointless overhead for _each_ idle entry. > > Make sure that quiet_vmstat is as light as possible. > > First of all it doesn't make any sense to do any local sync if the > current cpu is already set in oncpu_stat_off because vmstat_update puts > itself there only if there is nothing to do. > > Then we can check need_update which should be a cheap way to check for > potential per-cpu diffs and only then do refresh_cpu_vm_stats. > > The original patch also did cancel_delayed_work which we are not doing > here. There are two reasons for that. Firstly cancel_delayed_work from > idle context will blow up on RT kernels (reported by Mike): > > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 > Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 > Call Trace: > dump_stack+0x49/0x67 > ___might_sleep+0xf5/0x180 > rt_spin_lock+0x20/0x50 > try_to_grab_pending+0x69/0x240 > cancel_delayed_work+0x26/0xe0 > quiet_vmstat+0x75/0xa0 > cpu_idle_loop+0x38/0x3e0 > cpu_startup_entry+0x13/0x20 > start_secondary+0x114/0x140 > > And secondly, even on !RT kernels it might add some non trivial overhead > which is not necessary. Even if the vmstat worker wakes up and preempts > idle then it will be most likely a single shot noop because the stats > were already synced and so it would end up on the oncpu_stat_off anyway. > We just need to teach both vmstat_shepherd and vmstat_update to stop > scheduling the worker if there is nothing to do. > > [mgalbraith@xxxxxxx: cancel pending work of the cpu_stat_off CPU] > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx> > Reported-by: Mike Galbraith <umgwanakikbuti@xxxxxxxxx> > Acked-by: Christoph Lameter <cl@xxxxxxxxx> > Signed-off-by: Mike Galbraith <mgalbraith@xxxxxxx> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Daniel Wagner <wagi@xxxxxxxxx> > --- > Hi Greg, > > Upstream commmit 0eb77e988032 ("vmstat: make vmstat_updater deferrable > again and shut down on idle") was back ported in v4.4.178 > (bdf3c006b9a2). For -rt we definitely need the bugfix f01f17d3705b > ("mm, vmstat: make quiet_vmstat lighter") as well. > > Since the offending patch was back ported to v4.4 stable only, the > other stable branches don't need an update (offending patch and bug > fix are already in). > > Could you please queue the above patch for v4.4.y? Now queued up, thanks. greg k-h