On Thu, Sep 5, 2019 at 9:03 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> I might misunderstand this but is the issue here actually throttling
> of the sheer number of trace records or tracing large enough changes
> to RSS that user might care about? Small changes happen all the time
> but we are likely not interested in those. Surely we could postprocess
> the traces to extract changes large enough to be interesting but why
> capture uninteresting information in the first place? IOW the
> throttling here should be based not on the time between traces but on
> the amount of change of the traced signal. Maybe a generic facility
> like that would be a good idea?

You want two properties from the tracepoint:

- Small fluctuations in the value don't flood the trace buffer. If you
  get a new trace event from a process every time kswapd reclaims a
  single page from that process, you're going to need an enormous trace
  buffer that will have significant side effects on overall system
  performance.
- Any spike in memory consumption gets a trace event, regardless of the
  duration of that spike. This tracepoint has been incredibly useful in
  both understanding the causes of kswapd wakeups and
  lowmemorykiller/lmkd kills and evaluating the impact of memory
  management changes because it guarantees that any spike appears in
  the trace output.

As a result, the RSS tracepoint in particular needs to be throttled
based on the delta of the value, not time. The very first prototype of
the patch emitted a trace event per RSS counter change, and IIRC the
RSS trace events consumed significantly more room in the buffer than
sched_switch (and Android has a lot of sched_switch events). It's not
practical to trace changes in RSS without throttling. If there's a
generic throttling approach that would work here, I'm all for it; like
Dan mentioned, there are many more counters that we would like to trace
in a similar way.
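
To make the delta-based idea concrete, here is a rough user-space sketch of what such a throttle could look like. It is not the code from the patch and not an existing kernel facility: the struct, the function names, and the threshold of 128 used in the example afterwards are all made up for illustration. An event is emitted only when the counter has moved at least `threshold` away from the last value that was traced, so small fluctuations stay out of the buffer while any excursion of at least `threshold` is recorded no matter how short it is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the kernel's API. */
struct delta_throttle {
    long last_emitted;  /* counter value at the most recent traced event */
    long threshold;     /* minimum absolute change that triggers an event */
};

static void delta_throttle_init(struct delta_throttle *t, long initial,
                                long threshold)
{
    assert(threshold > 0);
    t->last_emitted = initial;
    t->threshold = threshold;
}

/*
 * Returns true if the new counter value should be traced, i.e. it differs
 * from the last traced value by at least the threshold. Time between
 * updates plays no role in the decision.
 */
static bool delta_throttle_update(struct delta_throttle *t, long value)
{
    long delta = value - t->last_emitted;

    if (delta < 0)
        delta = -delta;
    if (delta < t->threshold)
        return false;   /* small fluctuation: suppress */

    t->last_emitted = value;
    return true;        /* large enough change: emit */
}
```

With an assumed threshold of 128 pages and a starting value of 1000, updates to 1001 and 1100 would be suppressed, while an update to 1128 would be traced and would become the new reference point. A real implementation would also have to decide where the per-counter state lives and how to handle concurrent updates, which this sketch ignores.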