On large systems with a few hundred CPUs, when applications on each or
most of these CPUs read /proc/cpuinfo we get an IPI storm, and the
situation gets worse if one of the CPUs cannot respond to these IPIs in
a timely manner. Commit f4deaf90212c ('x86/cpu: Avoid cpuinfo-induced
IPI pileups') addresses this, but in the following call chain:

show_cpuinfo
 |
 |-- aperfmperf_get_khz
      |
      |-- aperfmperf_snapshot_cpu

aperfmperf_snapshot_cpu() gets invoked with wait=true, which means we
end up doing an smp_call_function_single() to the destination CPU even
if its ->scfpending is already set. Avoid this by making sure that,
even with wait=true, the IPI is sent only if ->scfpending is not yet
set; if a snapshot IPI is already pending, just busy-wait for it to
finish.

Signed-off-by: Imran Khan <imran.f.khan@xxxxxxxxxx>
---
I am trying this approach (assuming it is acceptable) to avoid
backporting multiple upstream patches to fix this single issue. Kindly
let me know if this is okay, or whether it would be better to backport
the relevant upstream patches instead.

 arch/x86/kernel/cpu/aperfmperf.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 22911deacb6e4..39fc390cc56af 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -69,6 +69,7 @@ static void aperfmperf_snapshot_khz(void *dummy)
 
 static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 {
+	int this_cpu;
 	s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu));
 	struct aperfmperf_sample *s = per_cpu_ptr(&samples, cpu);
 
@@ -76,8 +77,14 @@ static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 	if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
 		return true;
 
-	if (!atomic_xchg(&s->scfpending, 1) || wait)
+	if (!atomic_xchg(&s->scfpending, 1))
 		smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+	else if (wait) {
+		this_cpu = get_cpu();
+		while (atomic_read(&s->scfpending))
+			cpu_relax();
+		put_cpu();
+	}
 
 	/* Return false if the previous iteration was too long ago. */
 	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
--
2.34.1
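
For anyone who wants to poke at the gating logic outside the kernel,
below is a minimal userspace sketch of the same pattern. It is an
illustration only, not kernel code: pthreads stand in for the per-CPU
readers, usleep() stands in for IPI-plus-handler latency, and the
helper names here are made up to mirror the patch.

/*
 * Userspace model of the ->scfpending gate. Whoever wins the
 * atomic exchange issues the (one) cross-call; everyone else who
 * needs the result spins on the flag instead of sending another.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int scfpending;	/* models s->scfpending */
static atomic_int calls_sent;	/* counts cross-calls actually issued */

/* Stands in for the IPI handler aperfmperf_snapshot_khz(). */
static void snapshot_khz(void)
{
	usleep(1000);			/* pretend the snapshot takes a while */
	atomic_fetch_add(&calls_sent, 1);
	atomic_store(&scfpending, 0);	/* handler clears the flag when done */
}

/* Models one task reading /proc/cpuinfo with wait=true. */
static void *reader(void *arg)
{
	bool wait = true;

	(void)arg;
	if (!atomic_exchange(&scfpending, 1))
		snapshot_khz();		/* won the flag: issue the one call */
	else if (wait)
		while (atomic_load(&scfpending))
			;		/* models the cpu_relax() busy-wait */
	return NULL;
}

int main(void)
{
	pthread_t t[8];
	int i;

	for (i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, reader, NULL);
	for (i = 0; i < 8; i++)
		pthread_join(t[i], NULL);

	/* Without the gate, every reader would issue its own cross-call. */
	printf("cross-calls issued: %d of 8 readers\n",
	       atomic_load(&calls_sent));
	return 0;
}

Built with 'cc -pthread', this typically reports one or two cross-calls
for eight concurrent readers: the first reader to win the xchg issues
the call and the rest spin on the flag instead of piling on, which is
the behavior the patch is after.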