ftrace startup tests crashing due to missing rcu_synchronize()

Sven Schnelle <svens@xxxxxxxxxxxxx> · Wed, 16 Feb 2022 19:39:03 +0100




Hi Steve,

in our CI we see the ftrace selftests crashing. It only happens
in rare cases - for me i can boot the kernel hundred of times and
it crashes only once. The oops output looks like this:

[    3.523720] Running postponed tracer tests:
[    3.524184] Testing tracer function: PASSED
[    3.801523] Testing dynamic ftrace: PASSED
[    4.068009] Testing dynamic ftrace ops #1:
[    4.140142] (1 0 1 0 0)
[    4.140149] (1 1 2 0 0)
[    4.200037] (2 1 3 0 822270)
[    4.200046] (2 2 4 0 822402)
[    4.232728] (3 2 4 0 1147236)
[    4.232740] (3 3 5 0 1147377)
[    4.300835] PASSED
[    4.300840] Testing dynamic ftrace ops #2:
[    4.420423] (1 0 1 582267 0)
[    4.420474] (1 1 2 583259 0)
[    4.420941] (2 1 3 1 5)
[    4.420947] (2 2 4 69 73)
[    4.458483] (3 2 4 494199 491341)
[    4.458616] (3 3 5 497496 494474)
[    4.460091] Unable to handle kernel pointer dereference in virtual kernel address space
[    4.460375] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
[    4.460458] Fault in home space mode while using kernel ASCE.
[    4.460695] AS:000000008561c007 R3:0000000000000024
[    4.461143] Oops: 0038 ilc:3 [#1] PREEMPT SMP
[    4.461162] Modules linked in:
[    4.461175] CPU: 245 PID: 0 Comm: swapper/245 Not tainted 5.17.0-rc4-00051-gc5d9ae265b10-dirty #4
[    4.461183] Hardware name: IBM 8561 T01 701 (KVM/Linux)
[    4.461194] Krnl PSW : 0404e00180000000 00000000835a1fe6 (arch_ftrace_ops_list_func+0x96/0x1b0)
[    4.461212]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[    4.461226] Krnl GPRS: 0000000000000000 0000000084bbc110 0000000083496f48 00000000834061f2
[    4.461237]            000000008a70fa00 0000038000d0fba8 0000000000000001 fffffffffffffeff
[    4.461243]            00000000834061f2 0000038000d0fba8 6b6b6b6b6b6b6b6b 0000000083496f48
[    4.461250]            0000000000000000 0000000000000000 00000000835a209a 0000038000d0fab8
[    4.461269] Krnl Code: 00000000835a1fd4: e310a9380224        stg     %r1,10552(%r10)
                          00000000835a1fda: eb0103a8006a        asi     936,1
                         #00000000835a1fe0: c4a800b0d094        lgrl    %r10,0000000084bbc108
                         >00000000835a1fe6: e310a0100004        lg      %r1,16(%r10)
                          00000000835a1fec: a7110020            tmll    %r1,32
                          00000000835a1ff0: a7740012            brc     7,00000000835a2014
                          00000000835a1ff4: a7114000            tmll    %r1,16384
                          00000000835a1ff8: a7740031            brc     7,00000000835a205a
[    4.461383] Call Trace:
[    4.461388]  [<00000000835a1fe6>] arch_ftrace_ops_list_func+0x96/0x1b0
[    4.461395] ([<00000000835a200e>] arch_ftrace_ops_list_func+0xbe/0x1b0)
[    4.461401]  [<00000000841700ec>] ftrace_common+0x1c/0x20
[    4.461409]  [<0000000083496f4e>] preempt_count_sub+0x6/0x138
[    4.461418]  [<00000000834061f2>] read_tod_clock+0x4a/0xb0
[    4.461431]  [<0000000083530818>] ktime_get+0xb8/0x178
[    4.461441]  [<000000008353ec94>] clockevents_program_event+0x6c/0x138
[    4.461454]  [<000000008352d206>] hrtimer_start_range_ns+0xc6/0x100
[    4.461467]  [<0000000083541068>] tick_nohz_idle_stop_tick+0x90/0xf8
[    4.461479]  [<00000000834a9790>] do_idle+0xf0/0x1b0
[    4.461495]  [<00000000834a9ad6>] cpu_startup_entry+0x36/0x40
[    4.461509]  [<000000008416e48e>] restart_int_handler+0x6e/0x90
[    4.461521] INFO: lockdep is turned off.
[    4.461527] Last Breaking-Event-Address:
[    4.461533]  [<00000000835a2026>] arch_ftrace_ops_list_func+0xd6/0x1b0
[    4.461552] Kernel panic - not syncing: Fatal exception: panic_on_oops

The code in question is the ops loop in __ftrace_ops_list_func():

do_for_each_ftrace_op(op, ftrace_ops_list) {
/* Stub functions don't need to be called nor tested */
	if (op->flags & FTRACE_OPS_FL_STUB)
		continue;
...

The register (r10) where op is stored contains 0x6b6b6b6b6b6b6b6b
(POISON_FREE), so it looks like the list contains a member that is
already freed.

Looking at unregister_ftrace_function(), i noticed that
ftrace_shutdown() is called with 0 as command. Given that the ftrace
function didn't change and ftrace is still enabled, the
rcu_synchronize() functions in ftrace_shutdown() are silently skipped.
So the caller frees ops already before other CPUs have gone through
quiesce, and may therefore use the old (now freed) list entry.

To fix this, i wonder whether we should change the code in
unregister_ftrace_function() to:

@@ -7827,7 +7837,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
        int ret;
 
        mutex_lock(&ftrace_lock);
-       ret = ftrace_shutdown(ops, 0);
+       ret = ftrace_shutdown(ops, FTRACE_UPDATE_TRACE_FUNC);
        mutex_unlock(&ftrace_lock);
 
        return ret;

I haven't checked whether other callsites of unregister_ftrace_function()
also need to be adjusted. What do you think about that 'fix'?

Thanks,
Sven