Hi Steve, in our CI we see the ftrace selftests crashing. It only happens in rare cases - for me i can boot the kernel hundred of times and it crashes only once. The oops output looks like this: [ 3.523720] Running postponed tracer tests: [ 3.524184] Testing tracer function: PASSED [ 3.801523] Testing dynamic ftrace: PASSED [ 4.068009] Testing dynamic ftrace ops #1: [ 4.140142] (1 0 1 0 0) [ 4.140149] (1 1 2 0 0) [ 4.200037] (2 1 3 0 822270) [ 4.200046] (2 2 4 0 822402) [ 4.232728] (3 2 4 0 1147236) [ 4.232740] (3 3 5 0 1147377) [ 4.300835] PASSED [ 4.300840] Testing dynamic ftrace ops #2: [ 4.420423] (1 0 1 582267 0) [ 4.420474] (1 1 2 583259 0) [ 4.420941] (2 1 3 1 5) [ 4.420947] (2 2 4 69 73) [ 4.458483] (3 2 4 494199 491341) [ 4.458616] (3 3 5 497496 494474) [ 4.460091] Unable to handle kernel pointer dereference in virtual kernel address space [ 4.460375] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 [ 4.460458] Fault in home space mode while using kernel ASCE. [ 4.460695] AS:000000008561c007 R3:0000000000000024 [ 4.461143] Oops: 0038 ilc:3 [#1] PREEMPT SMP [ 4.461162] Modules linked in: [ 4.461175] CPU: 245 PID: 0 Comm: swapper/245 Not tainted 5.17.0-rc4-00051-gc5d9ae265b10-dirty #4 [ 4.461183] Hardware name: IBM 8561 T01 701 (KVM/Linux) [ 4.461194] Krnl PSW : 0404e00180000000 00000000835a1fe6 (arch_ftrace_ops_list_func+0x96/0x1b0) [ 4.461212] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [ 4.461226] Krnl GPRS: 0000000000000000 0000000084bbc110 0000000083496f48 00000000834061f2 [ 4.461237] 000000008a70fa00 0000038000d0fba8 0000000000000001 fffffffffffffeff [ 4.461243] 00000000834061f2 0000038000d0fba8 6b6b6b6b6b6b6b6b 0000000083496f48 [ 4.461250] 0000000000000000 0000000000000000 00000000835a209a 0000038000d0fab8 [ 4.461269] Krnl Code: 00000000835a1fd4: e310a9380224 stg %r1,10552(%r10) 00000000835a1fda: eb0103a8006a asi 936,1 #00000000835a1fe0: c4a800b0d094 lgrl %r10,0000000084bbc108 >00000000835a1fe6: e310a0100004 lg %r1,16(%r10) 00000000835a1fec: a7110020 tmll %r1,32 00000000835a1ff0: a7740012 brc 7,00000000835a2014 00000000835a1ff4: a7114000 tmll %r1,16384 00000000835a1ff8: a7740031 brc 7,00000000835a205a [ 4.461383] Call Trace: [ 4.461388] [<00000000835a1fe6>] arch_ftrace_ops_list_func+0x96/0x1b0 [ 4.461395] ([<00000000835a200e>] arch_ftrace_ops_list_func+0xbe/0x1b0) [ 4.461401] [<00000000841700ec>] ftrace_common+0x1c/0x20 [ 4.461409] [<0000000083496f4e>] preempt_count_sub+0x6/0x138 [ 4.461418] [<00000000834061f2>] read_tod_clock+0x4a/0xb0 [ 4.461431] [<0000000083530818>] ktime_get+0xb8/0x178 [ 4.461441] [<000000008353ec94>] clockevents_program_event+0x6c/0x138 [ 4.461454] [<000000008352d206>] hrtimer_start_range_ns+0xc6/0x100 [ 4.461467] [<0000000083541068>] tick_nohz_idle_stop_tick+0x90/0xf8 [ 4.461479] [<00000000834a9790>] do_idle+0xf0/0x1b0 [ 4.461495] [<00000000834a9ad6>] cpu_startup_entry+0x36/0x40 [ 4.461509] [<000000008416e48e>] restart_int_handler+0x6e/0x90 [ 4.461521] INFO: lockdep is turned off. [ 4.461527] Last Breaking-Event-Address: [ 4.461533] [<00000000835a2026>] arch_ftrace_ops_list_func+0xd6/0x1b0 [ 4.461552] Kernel panic - not syncing: Fatal exception: panic_on_oops The code in question is the ops loop in __ftrace_ops_list_func(): do_for_each_ftrace_op(op, ftrace_ops_list) { /* Stub functions don't need to be called nor tested */ if (op->flags & FTRACE_OPS_FL_STUB) continue; ... The register (r10) where op is stored contains 0x6b6b6b6b6b6b6b6b (POISON_FREE), so it looks like the list contains a member that is already freed. Looking at unregister_ftrace_function(), i noticed that ftrace_shutdown() is called with 0 as command. Given that the ftrace function didn't change and ftrace is still enabled, the rcu_synchronize() functions in ftrace_shutdown() are silently skipped. So the caller frees ops already before other CPUs have gone through quiesce, and may therefore use the old (now freed) list entry. To fix this, i wonder whether we should change the code in unregister_ftrace_function() to: @@ -7827,7 +7837,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops) int ret; mutex_lock(&ftrace_lock); - ret = ftrace_shutdown(ops, 0); + ret = ftrace_shutdown(ops, FTRACE_UPDATE_TRACE_FUNC); mutex_unlock(&ftrace_lock); return ret; I haven't checked whether other callsites of unregister_ftrace_function() also need to be adjusted. What do you think about that 'fix'? Thanks, Sven