On 03/05/2024 14:45, Vlastimil Babka wrote:
> On 5/3/24 11:16 AM, Ryan Roberts wrote:
>> On 03/05/2024 03:09, Barry Song wrote:
>>> @@ -83,8 +83,6 @@ static inline void count_vm_events(enum vm_event_item item, long delta)
>>>  
>>>  extern void all_vm_events(unsigned long *);
>>>  
>>> -extern void vm_events_fold_cpu(int cpu);
>>> -
>>>  #else
>>>  
>>>  /* Disable counters */
>>> @@ -103,9 +101,6 @@ static inline void __count_vm_events(enum vm_event_item item, long delta)
>>>  static inline void all_vm_events(unsigned long *ret)
>>>  {
>>>  }
>>> -static inline void vm_events_fold_cpu(int cpu)
>>> -{
>>> -}
>>>  
>>>  #endif /* CONFIG_VM_EVENT_COUNTERS */
>>>  
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index cd584aace6bf..8b56d785d587 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5826,14 +5826,6 @@ static int page_alloc_cpu_dead(unsigned int cpu)
>>>  	mlock_drain_remote(cpu);
>>>  	drain_pages(cpu);
>>>  
>>> -	/*
>>> -	 * Spill the event counters of the dead processor
>>> -	 * into the current processors event counters.
>>> -	 * This artificially elevates the count of the current
>>> -	 * processor.
>>> -	 */
>>> -	vm_events_fold_cpu(cpu);
>>> -
>>>  	/*
>>>  	 * Zero the differential counters of the dead processor
>>>  	 * so that the vm statistics are consistent.
>>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>>> index db79935e4a54..aaa32418652e 100644
>>> --- a/mm/vmstat.c
>>> +++ b/mm/vmstat.c
>>> @@ -114,7 +114,7 @@ static void sum_vm_events(unsigned long *ret)
>>>  
>>>  	memset(ret, 0, NR_VM_EVENT_ITEMS * sizeof(unsigned long));
>>>  
>>> -	for_each_online_cpu(cpu) {
>>> +	for_each_possible_cpu(cpu) {
>> 
>> One thought comes to mind (due to my lack of understanding exactly what
>> "possible" means): Linux is compiled with a max number of cpus - NR_CPUS -
>> 512 for arm64's defconfig. Does all possible cpus include all 512? On an
>> 8 CPU system that would be increasing the number of loops by 64 times.
>> 
>> Or perhaps "possible" just means CPUs that have ever been online?
> 
> IIRC on x86 it comes from some BIOS tables, and some BIOSes might not be
> providing very realistic numbers, so it can be unnecessarily large.

OK, thanks for the info.

> 
>> Either way, I guess it's not considered a performance bottleneck because,
>> from memory, the scheduler and many other places are iterating over all
>> possible cpus.
> 
> I doubt anything does it in a fastpath. But this affects only reading
> /proc/vmstat, right? Which is not a fastpath. Also update_balloon_stats(),
> which is probably OK as well?

Yep, agreed.

> 
> Either way I don't see a clear advantage nor disadvantage of this.

The advantage is just that it deletes 32 lines of code and makes it easier
to understand.

> 
>>> 		struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
>>>  
>>>  		for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
>>> @@ -129,29 +129,10 @@ static void sum_vm_events(unsigned long *ret)
>>>   */
>>>  void all_vm_events(unsigned long *ret)
>>>  {
>>> -	cpus_read_lock();
>>>  	sum_vm_events(ret);
>>> -	cpus_read_unlock();
>>>  }
>>>  EXPORT_SYMBOL_GPL(all_vm_events);
>>>  
>>> -/*
>>> - * Fold the foreign cpu events into our own.
>>> - *
>>> - * This is adding to the events on one processor
>>> - * but keeps the global counts constant.
>>> - */
>>> -void vm_events_fold_cpu(int cpu)
>>> -{
>>> -	struct vm_event_state *fold_state = &per_cpu(vm_event_states, cpu);
>>> -	int i;
>>> -
>>> -	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
>>> -		count_vm_events(i, fold_state->event[i]);
>>> -		fold_state->event[i] = 0;
>>> -	}
>>> -}
>>> -
>>>  #endif /* CONFIG_VM_EVENT_COUNTERS */
>>>  
>>>  /*
>> 
> 