On Tue, Sep 3, 2019 at 10:02 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Sep 03, 2019 at 09:44:51PM -0700, Suren Baghdasaryan wrote:
> > On Tue, Sep 3, 2019 at 1:09 PM Joel Fernandes (Google)
> > <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Useful to track how RSS is changing per TGID to detect spikes in RSS and
> > > memory hogs. Several Android teams have been using this patch in various
> > > kernel trees for half a year now. Many reported to me it is really
> > > useful so I'm posting it upstream.
> > >
> > > Initial patch developed by Tim Murray. Changes I made from original patch:
> > > o Prevent any additional space consumed by mm_struct.
> > > o Keep overhead low by checking if tracing is enabled.
> > > o Add some noise reduction and lower overhead by emitting only on
> > >   threshold changes.
> > >
> > > Co-developed-by: Tim Murray <timmurray@xxxxxxxxxx>
> > > Signed-off-by: Tim Murray <timmurray@xxxxxxxxxx>
> > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > >
> > > ---
> > >
> > > v1->v2: Added more commit message.
> > >
> > > Cc: carmenjackson@xxxxxxxxxx
> > > Cc: mayankgupta@xxxxxxxxxx
> > > Cc: dancol@xxxxxxxxxx
> > > Cc: rostedt@xxxxxxxxxxx
> > > Cc: minchan@xxxxxxxxxx
> > > Cc: akpm@xxxxxxxxxxxxxxxxxxxx
> > > Cc: kernel-team@xxxxxxxxxxx
> > >
> > >  include/linux/mm.h          | 14 +++++++++++---
> > >  include/trace/events/kmem.h | 21 +++++++++++++++++++++
> > >  mm/memory.c                 | 20 ++++++++++++++++++++
> > >  3 files changed, 52 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 0334ca97c584..823aaf759bdb 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -1671,19 +1671,27 @@ static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
> > >         return (unsigned long)val;
> > >  }
> > >
> > > +void mm_trace_rss_stat(int member, long count, long value);
> > > +
> > >  static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
> > >  {
> > > -       atomic_long_add(value, &mm->rss_stat.count[member]);
> > > +       long count = atomic_long_add_return(value, &mm->rss_stat.count[member]);
> > > +
> > > +       mm_trace_rss_stat(member, count, value);
> > >  }
> > >
> > >  static inline void inc_mm_counter(struct mm_struct *mm, int member)
> > >  {
> > > -       atomic_long_inc(&mm->rss_stat.count[member]);
> > > +       long count = atomic_long_inc_return(&mm->rss_stat.count[member]);
> > > +
> > > +       mm_trace_rss_stat(member, count, 1);
> > >  }
> > >
> > >  static inline void dec_mm_counter(struct mm_struct *mm, int member)
> > >  {
> > > -       atomic_long_dec(&mm->rss_stat.count[member]);
> > > +       long count = atomic_long_dec_return(&mm->rss_stat.count[member]);
> > > +
> > > +       mm_trace_rss_stat(member, count, -1);
> > >  }
> > >
> > >  /* Optimized variant when page is already known not to be PageAnon */
> > > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> > > index eb57e3037deb..8b88e04fafbf 100644
> > > --- a/include/trace/events/kmem.h
> > > +++ b/include/trace/events/kmem.h
> > > @@ -315,6 +315,27 @@ TRACE_EVENT(mm_page_alloc_extfrag,
> > >                 __entry->change_ownership)
> > >  );
> > >
> > > +TRACE_EVENT(rss_stat,
> > > +
> > > +       TP_PROTO(int member,
> > > +               long count),
> > > +
> > > +       TP_ARGS(member, count),
> > > +
> > > +       TP_STRUCT__entry(
> > > +               __field(int, member)
> > > +               __field(long, size)
> > > +       ),
> > > +
> > > +       TP_fast_assign(
> > > +               __entry->member = member;
> > > +               __entry->size = (count << PAGE_SHIFT);
> > > +       ),
> > > +
> > > +       TP_printk("member=%d size=%ldB",
> > > +               __entry->member,
> > > +               __entry->size)
> > > +       );
> > >  #endif /* _TRACE_KMEM_H */
> > >
> > >  /* This part must be outside protection */
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index e2bb51b6242e..9d81322c24a3 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -72,6 +72,8 @@
> > >  #include <linux/oom.h>
> > >  #include <linux/numa.h>
> > >
> > > +#include <trace/events/kmem.h>
> > > +
> > >  #include <asm/io.h>
> > >  #include <asm/mmu_context.h>
> > >  #include <asm/pgalloc.h>
> > > @@ -140,6 +142,24 @@ static int __init init_zero_pfn(void)
> > >  }
> > >  core_initcall(init_zero_pfn);
> > >
> > > +/*
> > > + * This threshold is the boundary in the value space, that the counter has to
> > > + * advance before we trace it. Should be a power of 2. It is to reduce unwanted
> > > + * trace overhead. The counter is in units of number of pages.
> > > + */
> > > +#define TRACE_MM_COUNTER_THRESHOLD 128
> >
> > IIUC the counter has to change by 128 pages (512kB assuming 4kB pages)
> > before the change gets traced. Would it make sense to make this step
> > size configurable? For a system with limited memory size change of
> > 512kB might be considerable while on systems with plenty of memory
> > that might be negligible. Not even mentioning possible difference in
> > page sizes. Maybe something like
> > /sys/kernel/debug/tracing/rss_step_order with
> > TRACE_MM_COUNTER_THRESHOLD=(1<<rss_step_order)?
>
> I would not want to complicate this more to be honest. It is already a bit
> complex, and I am not sure about the win in making it as configurable as you
> seem to want. The "threshold" thing is just a slight improvement, it is not
> aiming to be optimal. If in your tracing, this granularity is an issue, we
> can visit it then.

I guess that can be done as a separate patch later on if necessary.

>
> thanks,
>
> - Joel
>
> >
> > > +void mm_trace_rss_stat(int member, long count, long value)
> > > +{
> > > +       long thresh_mask = ~(TRACE_MM_COUNTER_THRESHOLD - 1);
> > > +
> > > +       if (!trace_rss_stat_enabled())
> > > +               return;
> > > +
> > > +       /* Threshold roll-over, trace it */
> > > +       if ((count & thresh_mask) != ((count - value) & thresh_mask))
> > > +               trace_rss_stat(member, count);
> > > +}
> > >
> > >  #if defined(SPLIT_RSS_COUNTING)
> > >
> > > --
> > > 2.23.0.187.g17f5b7556c-goog
> > >
> > > --
> > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@xxxxxxxxxxx.
> > >
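
For reference, the roll-over check and the 512kB figure from the discussion can be reproduced in userspace. This is only a sketch under a few assumptions: rss_crosses_threshold() copies the mask comparison from mm_trace_rss_stat() in the patch but leaves out the trace_rss_stat_enabled() check, and count_traced_increments() and threshold_bytes() are hypothetical helpers that exist only for this illustration.

```c
#include <assert.h>

/* Same value as in the patch: bucket width in pages, must be a power of 2. */
#define TRACE_MM_COUNTER_THRESHOLD 128

/*
 * Userspace copy of the comparison in mm_trace_rss_stat(). Returns 1 when
 * the update from (count - value) to count moved the counter into a
 * different TRACE_MM_COUNTER_THRESHOLD-sized bucket, which is when the
 * patch would emit an rss_stat event.
 */
static int rss_crosses_threshold(long count, long value)
{
    long thresh_mask = ~(TRACE_MM_COUNTER_THRESHOLD - 1L);

    /* The mask trick only works for power-of-2 thresholds. */
    assert((TRACE_MM_COUNTER_THRESHOLD & (TRACE_MM_COUNTER_THRESHOLD - 1)) == 0);

    return (count & thresh_mask) != ((count - value) & thresh_mask);
}

/*
 * Hypothetical helper (not in the patch): apply n single-page increments
 * starting from start and count how many of them would be traced.
 */
static int count_traced_increments(long start, int n)
{
    long count = start;
    int traced = 0;

    for (int i = 0; i < n; i++) {
        count += 1;
        traced += rss_crosses_threshold(count, 1);
    }
    return traced;
}

/*
 * Hypothetical helper (not in the patch): size in bytes of one threshold
 * step for a given page shift (12 for 4kB pages).
 */
static long threshold_bytes(int page_shift)
{
    return (long)TRACE_MM_COUNTER_THRESHOLD << page_shift;
}
```

With 4kB pages, threshold_bytes(12) gives 524288 bytes, i.e. the 512kB step mentioned in the thread; with 16kB pages the same threshold is 2MB. Starting from zero, 300 single-page increments cross a boundary only at 128 and 256, so count_traced_increments(0, 300) reports two events. A counter that hovers around a multiple of 128 (127 -> 128 -> 127) still crosses the boundary on every step, so the noise reduction is best effort.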