On Mon 19-01-15 09:57:08, Vinayak Menon wrote:
> On 01/18/2015 01:18 AM, Christoph Lameter wrote:
> >On Sat, 17 Jan 2015, Vinayak Menon wrote:
> >
> >>which had not updated the vmstat_diff. This CPU was in idle for around 30
> >>secs. When I looked at the tvec base for this CPU, the timer associated with
> >>vmstat_update had its expiry time less than current jiffies. This timer had
> >>its deferrable flag set, and was tied to the next non-deferrable timer in the
> >
> >We can remove the deferrable flag now since the vmstat threads are only
> >activated as necessary with the recent changes. Looks like this could fix
> >your issue?
> >
>
> Yes, this should fix my issue.

Does it? Because I would much prefer not getting into an un-synced state
at all over patching the one specific place which shows the problem
right now.

> But I think we may need the fix in too_many_isolated, since there can still
> be a delay of few seconds (HZ by default and even more because of reasons
> pointed out by Michal) which will result in reclaimers unnecessarily
> entering congestion_wait. No ?

I think we can solve this as well. We can stick vmstat_shepherd into a
kernel thread with a loop with the configured timeout and then create a
mask of CPUs which need the update and run vmstat_update from IPI
context (smp_call_function_many).
We would have to drop cond_resched from refresh_cpu_vm_stats of course.
The nr_zones x NR_VM_ZONE_STAT_ITEMS iterations in IPI context shouldn't
be excessive, but I haven't measured that so I could easily be wrong.

Anyway, that should work more reliably than the current scheme and
should help to reduce pointless wakeups which the original patchset was
addressing. Or am I missing something?

--
Michal Hocko
SUSE Labs
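
To make the shepherd idea above a bit more concrete, here is a rough userspace model of one shepherd pass. It is not kernel code and not a patch: NR_CPUS, NR_STAT_ITEMS, cpu_diff, global_stat and all function names below are made up for illustration, and a plain loop stands in for the smp_call_function_many() step. It only shows the shape of the scheme: find the CPUs with pending differentials, build a mask, then fold each of them without sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All sizes and names here are hypothetical stand-ins, not kernel symbols. */
#define NR_CPUS        4   /* small on purpose; the mask below is 32 bits wide */
#define NR_STAT_ITEMS  3   /* stand-in for nr_zones x NR_VM_ZONE_STAT_ITEMS */

static long global_stat[NR_STAT_ITEMS];        /* model of the global counters */
static int  cpu_diff[NR_CPUS][NR_STAT_ITEMS];  /* model of per-cpu differentials */

/* A CPU-local counter update that has not been folded into the globals yet. */
static void mod_state(unsigned cpu, unsigned item, int delta)
{
    assert(cpu < NR_CPUS && item < NR_STAT_ITEMS);
    cpu_diff[cpu][item] += delta;
}

/* Does this CPU hold any unfolded differential? */
static bool need_update(unsigned cpu)
{
    for (unsigned i = 0; i < NR_STAT_ITEMS; i++)
        if (cpu_diff[cpu][i] != 0)
            return true;
    return false;
}

/*
 * Stand-in for the work that would run in IPI context on the target CPU:
 * a bounded loop that folds and clears, with nothing that could sleep.
 */
static void fold_cpu(unsigned cpu)
{
    for (unsigned i = 0; i < NR_STAT_ITEMS; i++) {
        global_stat[i] += cpu_diff[cpu][i];
        cpu_diff[cpu][i] = 0;
    }
}

/*
 * One pass of the shepherd loop: build the mask of CPUs that need an
 * update, then fold each of them. Returns the mask so callers can see
 * which CPUs were touched; CPUs with nothing pending are left alone.
 */
static uint32_t shepherd_pass(void)
{
    uint32_t mask = 0;

    for (unsigned cpu = 0; cpu < NR_CPUS; cpu++)
        if (need_update(cpu))
            mask |= UINT32_C(1) << cpu;

    /* In the proposal this step would be a single cross-CPU call. */
    for (unsigned cpu = 0; cpu < NR_CPUS; cpu++)
        if (mask & (UINT32_C(1) << cpu))
            fold_cpu(cpu);

    return mask;
}
```

In this model an idle CPU never appears in the mask, so it is never disturbed, while a CPU with stale differentials gets folded on the next pass no matter what its own timers are doing. Whether the real per-CPU work is cheap enough for IPI context is exactly the part that would need measuring.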