On large systems with a few hundred CPUs, when applications on each or
most of these CPUs read /proc/cpuinfo we get an IPI storm, and the
situation gets worse if one of the CPUs cannot respond to these IPIs in
a timely manner. Commit f4deaf90212c ('x86/cpu: Avoid cpuinfo-induced
IPI pileups') addresses this, but in the following call chain:

show_cpuinfo
 |
 |-- aperfmperf_get_khz
      |
      |-- aperfmperf_snapshot_cpu

aperfmperf_snapshot_cpu() gets invoked with wait=true, which means we
end up doing an smp_call_function_single() to the destination CPU even
if its ->scfpending is already set. Avoid this by making sure that,
even with wait=true, the IPI is sent only if ->scfpending is not yet
set; if a snapshot IPI is already pending, just busy-wait for it to
finish.

Signed-off-by: Imran Khan <imran.f.khan@xxxxxxxxxx>
---
I am trying this approach (assuming it is acceptable) to avoid
backporting multiple upstream patches to fix this single issue. Kindly
let me know if this is okay, or whether it would be better to backport
the relevant upstream patches instead.

 arch/x86/kernel/cpu/aperfmperf.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 22911deacb6e4..39fc390cc56af 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -69,6 +69,7 @@ static void aperfmperf_snapshot_khz(void *dummy)
 
 static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 {
+	int this_cpu;
 	s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu));
 	struct aperfmperf_sample *s = per_cpu_ptr(&samples, cpu);
 
@@ -76,8 +77,14 @@ static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 	if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
 		return true;
 
-	if (!atomic_xchg(&s->scfpending, 1) || wait)
+	if (!atomic_xchg(&s->scfpending, 1))
 		smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+	else if (wait) {
+		this_cpu = get_cpu();
+		while (atomic_read(&s->scfpending))
+			cpu_relax();
+		put_cpu();
+	}
 
 	/* Return false if the previous iteration was too long ago. */
 	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
--
2.34.1
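
For anyone who wants to poke at the gating logic outside the kernel,
below is a minimal userspace sketch of the same pattern. It is an
illustration only, not kernel code: pthreads stand in for the per-CPU
readers, usleep() stands in for IPI-plus-handler latency, and the
helper names here are made up to mirror the patch.

/*
 * Userspace model of the ->scfpending gate. Whoever wins the
 * atomic exchange issues the (one) cross-call; everyone else who
 * needs the result spins on the flag instead of sending another.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int scfpending;	/* models s->scfpending */
static atomic_int calls_sent;	/* counts cross-calls actually issued */

/* Stands in for the IPI handler aperfmperf_snapshot_khz(). */
static void snapshot_khz(void)
{
	usleep(1000);			/* pretend the snapshot takes a while */
	atomic_fetch_add(&calls_sent, 1);
	atomic_store(&scfpending, 0);	/* handler clears the flag when done */
}

/* Models one task reading /proc/cpuinfo with wait=true. */
static void *reader(void *arg)
{
	bool wait = true;

	(void)arg;
	if (!atomic_exchange(&scfpending, 1))
		snapshot_khz();		/* won the flag: issue the one call */
	else if (wait)
		while (atomic_load(&scfpending))
			;		/* models the cpu_relax() busy-wait */
	return NULL;
}

int main(void)
{
	pthread_t t[8];
	int i;

	for (i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, reader, NULL);
	for (i = 0; i < 8; i++)
		pthread_join(t[i], NULL);

	/* Without the gate, every reader would issue its own cross-call. */
	printf("cross-calls issued: %d of 8 readers\n",
	       atomic_load(&calls_sent));
	return 0;
}

Built with 'cc -pthread', this typically reports one or two cross-calls
for eight concurrent readers: the first reader to win the xchg issues
the call and the rest spin on the flag instead of piling on, which is
the behavior the patch is after.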