On 03/05/2024 03:09, Barry Song wrote: > From: Barry Song <v-songbaohua@xxxxxxxx> > > When unplugging a CPU, the current code merges its vm_events > with an online CPU. Because, during summation, it only considers > online CPUs, which is a crude workaround. By transitioning to > summing up all possible CPUs, we can eliminate the need for > vm_events_fold_cpu. > > Suggested-by: Ryan Roberts <ryan.roberts@xxxxxxx> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > --- > originally suggested by Ryan while he reviewed mTHP counters > patchset[1]; I am also applying this suggestion to vm_events > > -v2: > also drop cpus_read_lock() as we don't care about cpu hotplug any more; > -v1: > https://lore.kernel.org/linux-mm/20240412123039.442743-1-21cnbao@xxxxxxxxx/ > > [1] https://lore.kernel.org/linux-mm/ca73cbf1-8304-4790-a721-3c3a42f9d293@xxxxxxx/ > > include/linux/vmstat.h | 5 ----- > mm/page_alloc.c | 8 -------- > mm/vmstat.c | 21 +-------------------- > 3 files changed, 1 insertion(+), 33 deletions(-) > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > index 735eae6e272c..f7eaeb8bfa47 100644 > --- a/include/linux/vmstat.h > +++ b/include/linux/vmstat.h > @@ -83,8 +83,6 @@ static inline void count_vm_events(enum vm_event_item item, long delta) > > extern void all_vm_events(unsigned long *); > > -extern void vm_events_fold_cpu(int cpu); > - > #else > > /* Disable counters */ > @@ -103,9 +101,6 @@ static inline void __count_vm_events(enum vm_event_item item, long delta) > static inline void all_vm_events(unsigned long *ret) > { > } > -static inline void vm_events_fold_cpu(int cpu) > -{ > -} > > #endif /* CONFIG_VM_EVENT_COUNTERS */ > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cd584aace6bf..8b56d785d587 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5826,14 +5826,6 @@ static int page_alloc_cpu_dead(unsigned int cpu) > mlock_drain_remote(cpu); > drain_pages(cpu); > > - /* > - * Spill the event counters of the dead processor > - * into the current processors event counters. > - * This artificially elevates the count of the current > - * processor. > - */ > - vm_events_fold_cpu(cpu); > - > /* > * Zero the differential counters of the dead processor > * so that the vm statistics are consistent. > diff --git a/mm/vmstat.c b/mm/vmstat.c > index db79935e4a54..aaa32418652e 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -114,7 +114,7 @@ static void sum_vm_events(unsigned long *ret) > > memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long)); > > - for_each_online_cpu(cpu) { > + for_each_possible_cpu(cpu) { One thought comes to mind (due to my lack of understanding exactly what "possible" means): Linux is compiled with a max number of cpus - NR_CPUS - 512 for arm64's defconfig. Does all possible cpus include all 512? On an 8 CPU system that would be increasing the number of loops by 64 times. Or perhaps possible just means CPUs that have ever been online? Either way, I guess it's not considered a performance bottleneck because, from memory, the scheduler and many other places are iterating over all possible cpus. > struct vm_event_state *this = &per_cpu(vm_event_states, cpu); > > for (i = 0; i < NR_VM_EVENT_ITEMS; i++) > @@ -129,29 +129,10 @@ static void sum_vm_events(unsigned long *ret) > */ > void all_vm_events(unsigned long *ret) > { > - cpus_read_lock(); > sum_vm_events(ret); > - cpus_read_unlock(); > } > EXPORT_SYMBOL_GPL(all_vm_events); > > -/* > - * Fold the foreign cpu events into our own. > - * > - * This is adding to the events on one processor > - * but keeps the global counts constant. > - */ > -void vm_events_fold_cpu(int cpu) > -{ > - struct vm_event_state *fold_state = &per_cpu(vm_event_states, cpu); > - int i; > - > - for (i = 0; i < NR_VM_EVENT_ITEMS; i++) { > - count_vm_events(i, fold_state->event[i]); > - fold_state->event[i] = 0; > - } > -} > - > #endif /* CONFIG_VM_EVENT_COUNTERS */ > > /*