On Thu, Jul 6, 2017 at 1:16 PM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> I'm still struggling to see how counters help when an agent that monitors
> for high CPU usage could be activated
>

I suspect Roman has the same problem set as us: the CPU usage is either
always high, or high and service-critical, likely just when something
interesting is happening.

We'd like to collect data on 200k machines and study the results
statistically and over time, broken down by kernel version, build config,
hardware type, process type, load pattern, etc. Even finding good
candidate machines, at the right time of day, to debug manually with
ftrace is problematic. Granted, we could be making better use of existing
counters like compact_fail.

Ultimately, the data leads to dealing with certain bad actors, different
vm tunings, or patches to mm.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .