On Thu, Sep 05, 2019 at 06:15:43PM -0700, Daniel Colascione wrote: [snip] > > > > > > > The bigger improvement with the threshold is the number of trace records are > > > > > > > almost halved by using a threshold. The number of records went from 4.6K to > > > > > > > 2.6K. > > > > > > > > > > > > Steven, would it be feasible to add a generic tracepoint throttling? > > > > > > > > > > I might misunderstand this but is the issue here actually throttling > > > > > of the sheer number of trace records or tracing large enough changes > > > > > to RSS that user might care about? Small changes happen all the time > > > > > but we are likely not interested in those. Surely we could postprocess > > > > > the traces to extract changes large enough to be interesting but why > > > > > capture uninteresting information in the first place? IOW the > > > > > throttling here should be based not on the time between traces but on > > > > > the amount of change of the traced signal. Maybe a generic facility > > > > > like that would be a good idea? > > > > > > > > You mean like add a trigger (or filter) that only traces if a field has > > > > changed since the last time the trace was hit? Hmm, I think we could > > > > possibly do that. Perhaps even now with histogram triggers? > > > > > > I was thinking along the same lines. The histogram subsystem seems > > > like a very good fit here. Histogram triggers already let users talk > > > about specific fields of trace events, aggregate them in configurable > > > ways, and (importantly, IMHO) create synthetic new trace events that > > > the kernel emits under configurable conditions. > > > > Hmm, I think this tracing feature will be a good idea. But in order not to > > gate this patch, can we agree on keeping a temporary threshold for this > > patch? Once such idea is implemented in trace subsystem, then we can remove > > the temporary filter. > > > > As Tim said, we don't want our traces flooded and this is a very useful > > tracepoint as proven in our internal usage at Android. The threshold filter > > is just few lines of code. > > I'm not sure the threshold filtering code you've added does the right > thing: we don't keep state, so if a counter constantly flips between > one "side" of the TRACE_MM_COUNTER_THRESHOLD and the other, we'll emit > ftrace events at high frequency. More generally, this filtering > couples the rate of counter logging to the *value* of the counter --- > that is, we log ftrace events at different times depending on how much > memory we happen to have used --- and that's not ideal from a > predictability POV. > > All things being equal, I'd prefer that we get things upstream as fast > as possible. But in this case, I'd rather wait for a general-purpose > filtering facility (whether that facility is based on histogram, eBPF, > or something else) rather than hardcode one particular fixed filtering > strategy (which might be suboptimal) for one particular kind of event. > Is there some special urgency here? > > How about we instead add non-filtered tracepoints for the mm counters? > These tracepoints will still be free when turned off. > > Having added the basic tracepoints, we can discuss separately how to > do the rate limiting. Maybe instead of providing direct support for > the algorithm that I described above, we can just use a BPF program as > a yes/no predicate for whether to log to ftrace. That'd get us to the > same place as this patch, but more flexibly, right? Chatted with Daniel offline, we agreed on removing the threshold -- which Michal also wants to be that way. So I'll be resubmitting this patch with the threshold removed; and we'll work on seeing to use filtering through other generic ways like BPF. thanks all! - Joel