Re: [PATCH v2 4/3] smp: print only local CPU info when sched_clock goes backward

Leonardo Bras <leobras@xxxxxxxxxx> · Tue, 6 Aug 2024 03:09:11 -0300

On Wed, Jul 24, 2024 at 10:36:52PM +0530, neeraj.upadhyay@xxxxxxxxxx wrote:
> From: Rik van Riel <riel@xxxxxxxxxxx>
> 
> About 40% of all csd_lock warnings observed in our fleet appear to
> be due to sched_clock() going backward in time (usually only a little
> bit), resulting in ts0 being larger than ts2.
> 
> When the local CPU is at fault, we should print out a message reflecting
> that, rather than trying to get the remote CPU's stack trace.
> 
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Tested-by: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@xxxxxxxxxx>
> ---
>  kernel/smp.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index dfcde438ef63..143ae26f96a2 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -253,6 +253,14 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  		   csd_lock_timeout_ns == 0))
>  		return false;
>

Hello Rik,
Thanks for this contribution!
Seems a really nice bug you found here :)

Your solution seems very efficient, but I noticed a few (possible) issues here:
  
> +	if (ts0 > ts2) {
> +		/* Our own sched_clock went backward; don't blame another CPU. */
> +		ts_delta = ts0 - ts2;
> +		pr_alert("sched_clock on CPU %d went backward by %llu ns\n", raw_smp_processor_id(), ts_delta);

This ns number is not necessarely correct, since a few calls to 
csd_lock_wait_toolong() may have happened before the clock going backwards.

As an example we could have, for a 15ms backwards movement
05ms: csd: [...] waiting 5000 ns for CPU#X
10ms: csd: [...] waiting 10000 ns for CPU#X
12ms: sched_clock on CPU X went backward by 3000 ns
						^ wrong
17ms: csd: [...] waiting 5000 ns for CPU#X
			  ^reset waiting.

> +		*ts1 = ts2;

Second point here, above line just resets the counter, so we start printing 
messages again, with the wrong waiting time. (see 17ms above)

> +		return false;
> +	}
> +


Suggestion A: Save the last ts2 it succeeds, and in case of 
the clock going backwards, adjust ts0 and ts1 to the new clock.
This way the waiting time does not reset, and the backwards amount is 
correct:

########

diff --git a/kernel/smp.c b/kernel/smp.c
index aaffecdad319..e8788f7e1a78 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -213,7 +213,8 @@ static int csd_lock_wait_getcpu(call_single_data_t *csd)
  * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
  * so waiting on other types gets much less information.
  */
-static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
+static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 *ts0, u64 *ts1, u64 *ts_save,
+                                 int *bug_id)
 {
        int cpu = -1;
        int cpux;
@@ -233,10 +234,21 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
        }
 
        ts2 = sched_clock();
+
        /* How long since we last checked for a stuck CSD lock.*/
        ts_delta = ts2 - *ts1;
-       if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0))
+       if (likely(ts_delta <= csd_lock_timeout_ns || csd_lock_timeout_ns == 0)) {
+               *ts_save = ts2;
                return false;
+       }
+
+       if (unlikely((s64)ts_delta < 0)) {
+		/* How much clock went backwards */
+               ts_delta = *ts_save - ts2;
+               pr_alert("sched_clock on CPU %d went backward by %llu ns\n",
+                        raw_smp_processor_id(), ts_delta);
+               *ts0 -= ts_delta;
+               *ts1 -= ts_delta;
+       }
+	*ts_save = ts2;
 
        firsttime = !*bug_id;
        if (firsttime)
@@ -248,7 +260,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
                cpux = cpu;
        cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */
        /* How long since this CSD lock was stuck. */
-       ts_delta = ts2 - ts0;
+       ts_delta = ts2 - *ts0;
        pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n",
                 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts_delta,
                 cpu, csd->func, csd->info);
@@ -291,11 +303,11 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 static void __csd_lock_wait(call_single_data_t *csd)
 {
        int bug_id = 0;
-       u64 ts0, ts1;
+       u64 ts0, ts1, ts_s = 0;
 
        ts1 = ts0 = sched_clock();
        for (;;) {
-               if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
+               if (csd_lock_wait_toolong(csd, &ts0, &ts1, &ts_s, &bug_id))
                        break;
                cpu_relax();
        }
##########

Suggestion B: Compare to ts_save to ts2 before calculating ts_delta.
Pros: 	Much better detection of clock goin backwards
Cons: 	Overhead for the fastpath
	More detections -> more messages, which is not necessarily good.

##########
+++ b/kernel/smp.c
@@ -234,6 +234,15 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 *ts0, u64 *ts1, u
        }
 
        ts2 = sched_clock();
+       if (unlikely(*ts_save > ts2)){
+               /* How much clock went backwards */
+               ts_delta = *ts_save - ts2;
+               pr_alert("sched_clock on CPU %d went backward by %llu ns\n",
+                       raw_smp_processor_id(), ts_delta);
+               *ts0 -= ts_delta;
+               *ts1 -= ts_delta;
+       }
+       *ts_save = ts2;
 
        /* How long since we last checked for a stuck CSD lock.*/
        ts_delta = ts2 - *ts1;
##########


What do you think?

Thanks!
Leo