Re: ftrace startup tests crashing due to missing rcu_synchronize()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Steven Rostedt <rostedt@xxxxxxxxxxx> writes:

> On Wed, 16 Feb 2022 19:39:03 +0100
> Sven Schnelle <svens@xxxxxxxxxxxxx> wrote:
>> [    4.460091] Unable to handle kernel pointer dereference in virtual kernel address space
>> [    4.460375] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
>> [    4.460458] Fault in home space mode while using kernel ASCE.
>> [    4.460695] AS:000000008561c007 R3:0000000000000024
>> [    4.461143] Oops: 0038 ilc:3 [#1] PREEMPT SMP
>> [    4.461162] Modules linked in:
>> [    4.461175] CPU: 245 PID: 0 Comm: swapper/245 Not tainted 5.17.0-rc4-00051-gc5d9ae265b10-dirty #4
>> [    4.461183] Hardware name: IBM 8561 T01 701 (KVM/Linux)
>
> I this a 390?

Yes.

>> Looking at unregister_ftrace_function(), i noticed that
>> ftrace_shutdown() is called with 0 as command. Given that the ftrace
>> function didn't change and ftrace is still enabled, the
>> rcu_synchronize() functions in ftrace_shutdown() are silently skipped.
>> So the caller frees ops already before other CPUs have gone through
>> quiesce, and may therefore use the old (now freed) list entry.
>> 
>> To fix this, i wonder whether we should change the code in
>> unregister_ftrace_function() to:
>> 
>> @@ -7827,7 +7837,7 @@ int unregister_ftrace_function(struct ftrace_ops *ops)
>>         int ret;
>>  
>>         mutex_lock(&ftrace_lock);
>> -       ret = ftrace_shutdown(ops, 0);
>> +       ret = ftrace_shutdown(ops, FTRACE_UPDATE_TRACE_FUNC);
>
> No, the ftrace_shutdown() will add that flag if it is needed.
>
>>         mutex_unlock(&ftrace_lock);
>>  
>>         return ret;
>> 
>> I haven't checked whether other callsites of unregister_ftrace_function()
>> also need to be adjusted. What do you think about that 'fix'?
>
> But what I'm thinking is, the function is being freed but has yet to be
> removed from the list. Or that a synchronization is missed.
>
> That is, shutdown is called, the item is removed from the list and freed,
> but something got preempted while on the ftrace trampoline, with a
> reference to the item, and then woke up and executed the item that was
> freed.
>
> I'll look into it. Thanks for the report.

With additional debugging i see:

@@ -2967,14 +2974,17 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
        }
 
        if (!command || !ftrace_enabled) {
+               pr_err("%s: skipping rcu_synchronize(): ops=%pS command=%d ftrace_enabled=%d saved func=%pS ftrace_trace_func=%pS\n",
+                      __func__, ops, command, ftrace_enabled, saved_ftrace_func, ftrace_trace_function);
                /*
                 * If these are dynamic or per_cpu ops, they still
                 * need their data freed. Since, function tracing is
                 * not currently active, we can just free them
                 * without synchronizing all CPUs.
                 */

[  +0.000011] unregister_ftrace_function: 0x2f6792e00
[  +0.000023] removing ops 00000002f6792e00 trace_selftest_test_dyn_func+0x0/0x18
[  +0.000661] ftrace_shutdown: skipping rcu_synchronize(): ops=0x2f6792e00 command=0 ftrace_enabled=1 saved func=arch_ftrace_ops_list_func+0x0/0x1b0 ft>
[  +0.010032] unregister_ftrace_function: test_probe1+0x0/0x1b0
[  +0.000017] removing ops 000000009d876e40 trace_selftest_test_probe1_func+0x0/0x18
[  +0.000053] ftrace_shutdown: skipping rcu_synchronize(): ops=test_probe1+0x0/0x1b0 command=0 ftrace_enabled=1 saved func=arch_ftrace_ops_list_func+0x>
[  +0.000017] unregister_ftrace_function: test_probe2+0x0/0x1b0
[  +0.000081] removing ops 000000009d876ff0 trace_selftest_test_probe2_func+0x0/0x18
[  +0.000064] ftrace_shutdown: skipping rcu_synchronize(): ops=test_probe2+0x0/0x1b0 command=0 ftrace_enabled=1 saved func=arch_ftrace_ops_list_func+0x>
[  +0.000015] unregister_ftrace_function: test_probe3+0x0/0x1b0
[  +0.000011] removing ops 000000009d8771a0 trace_selftest_test_probe3_func+0x0/0x18
[  +0.000108] unregister_ftrace_function: global_ops+0x0/0x1b0
[  +0.000010] removing ops 000000009d9325b8 trace_selftest_test_global_func+0x0/0x18
[  +0.025759] PASSED

So the rcu_synchronize is definitely skipped. Another thing i was
wondering was whether we need to reset the next pointer in
the to-be-removed entry in remove_ftrace_ops(). But i haven't
investigated that in detail yet.



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux