On Fri, 16 Jun 2023 20:07:18 +0200 Michal Koutný <mkoutny@xxxxxxxx> wrote: > An issue was observed with stats collected in struct rusage on ppc64le > with 64kB pages. The percpu counters use batching with > percpu_counter_batch = max(32, nr*2) # in PAGE_SIZE > i.e. with larger pages but similar RSS consumption (bytes), there'll be > less flushes and error more noticeable. A fully detailed description of the issue would be helpful. Obviously "inaccuracy", but how bad? > In this given case (getting consumption of exited child), we can request > percpu counter's flush without worrying about contention with updaters. > > Fortunately, the commit f1a7941243c1 ("mm: convert mm's rss stats into > percpu_counter") didn't eradicate all traces of SPLIT_RSS_COUNTING and > this mechanism already provided some synchronization points before > reading stats. > Therefore, use sync_mm_rss as carrier for percpu counters refreshes and > forget SPLIT_RSS_COUNTING macro for good. > > Impact of summing on a 8 CPU machine: > Benchmark 1: taskset -c 1 ./shell-bench.sh > > Before > Time (mean ± σ): 9.950 s ± 0.052 s [User: 7.773 s, System: 2.023 s] > > After > Time (mean ± σ): 9.990 s ± 0.070 s [User: 7.825 s, System: 2.011 s] > > cat >shell-bench.sh <<EOD > for (( i = 0; i < 20000; i++ )); do > /bin/true > done > EOD > > The script is meant to stress fork-exit path (exit is where sync_mm_rss > is most called, add_mm_rss_vec should be covered in fork). > > ... > > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2547,13 +2547,12 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss, > *maxrss = hiwater_rss; > } > > -#if defined(SPLIT_RSS_COUNTING) > -void sync_mm_rss(struct mm_struct *mm); > -#else > static inline void sync_mm_rss(struct mm_struct *mm) > { > + for (int i = 0; i < NR_MM_COUNTERS; ++i) > + percpu_counter_set(&mm->rss_stat[i], > + percpu_counter_sum(&mm->rss_stat[i])); > } > -#endif Far too large to be inlined! For six callsites it adds 1kb of text. Why even modify the counter? Can't <whatever this issue is> be addressed by using percpu_counter_sum() in an appropriate place? For unknown reasons percpu_counter_set() uses for_each_possible_cpu(). Probably just a mistake - percpu_counters are hotplug-aware and for_each_online_cpu should suffice. I'm really not liking percpu_counter_set(). It's only safe in situations where the caller knows that no other CPU can be modifying the counter. I wonder if all the callers know that. This situation isn't aided by the lack of any documentation.