On Nov 28, 2022, at 8:15 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> !! External Email
>
> On Tue, 29 Nov 2022 02:36:22 +0000
> Nadav Amit <namit@xxxxxxxxxx> wrote:
>
>> On Nov 22, 2022, at 12:51 PM, Nadav Amit <namit@xxxxxxxxxx> wrote:
>>
>>> But more importantly, the current “inline”->”notrace” solution just papers
>>> over missing “notrace” annotations. Anyone can remove the “inline” at any
>>> given moment since there is no direct (or indirect) relationship between
>>> “inline” and “notrace”. It seems to me all random and bound to fail at some
>>> point.
>>
>> Peter, Steven, (and others),
>>
>> Beyond the issues that are addressed in this patch-set, I encountered one
>> more, which reiterates the fact that the heuristic of marking “inline”
>> functions as “notrace” is not good enough.
>>
>> Before I send a patch, I would like to get your feedback. I include a splat
>> below. It appears the execution might get stuck, since some functions that
>> are used for function tracing can themselves be traced.
>>
>> For example, __kernel_text_address() and unwind_get_return_address() are
>> traceable. I think that we need to disallow the tracing of all functions
>> that are called directly and indirectly from function_stack_trace_call()
>> (i.e., they are in the dynamic extent of function_stack_trace_call()).
>
> How did this happen? It should be able to handle recursion:
>
> static void
> function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
> 			  struct ftrace_ops *op, struct ftrace_regs *fregs)
> {
> 	struct trace_array *tr = op->private;
> 	struct trace_array_cpu *data;
> 	unsigned long flags;
> 	long disabled;
> 	int cpu;
> 	unsigned int trace_ctx;
>
> 	if (unlikely(!tr->function_enabled))
> 		return;
>
> 	/*
> 	 * Need to use raw, since this must be called before the
> 	 * recursive protection is performed.
> 	 */
> 	local_irq_save(flags);
> 	cpu = raw_smp_processor_id();
> 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
> 	disabled = atomic_inc_return(&data->disabled);
>
> 	if (likely(disabled == 1)) {	<<<---- This stops recursion
>
> 		trace_ctx = tracing_gen_ctx_flags(flags);
> 		trace_function(tr, ip, parent_ip, trace_ctx);
> 		__trace_stack(tr, trace_ctx, STACK_SKIP);
> 	}
>
> 	atomic_dec(&data->disabled);
> 	local_irq_restore(flags);
> }
>
> Each of the stack trace functions may recurse back into this function, but
> it will not recurse further. How did it crash?

Ugh. Thanks. I didn’t know that - so your input is really helpful.

I will need to further debug it, but this issue does not occur every time.
The kernel didn’t crash exactly - it’s more of a deadlock. I have lockdep
enabled, so it is not a deadlock that lockdep knows about.

Could it be that somehow things just slowed down due to IPIs and
mostly-disabled IRQs? I have no idea. I would need to recreate the
scenario.

For the record, I tried to save some details in the previous email. It was
kind of hard to understand what’s going on on the other cores, since the
trace of the other cores was interleaved.
I extracted the parts that I think refer to the other CPU (yes, the output
is really slow, as seen in the timestamps):

[531413.923628] Code: 00 00 31 c0 eb f1 0f 1f 80 00 00 00 00 e8 1b 2e 16 3e 55 48 89 e5 c6 07 00 0f 1f 00 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 99 da f1 fe 65 8b 05 f2 99 d7 7d 85 c0 74 02 5d

All code
========
   0:	00 00                	add    %al,(%rax)
   2:	31 c0                	xor    %eax,%eax
   4:	eb f1                	jmp    0xfffffffffffffff7
   6:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
   d:	e8 1b 2e 16 3e       	call   0x3e162e2d
  12:	55                   	push   %rbp
  13:	48 89 e5             	mov    %rsp,%rbp
  16:	c6 07 00             	movb   $0x0,(%rdi)
  19:	0f 1f 00             	nopl   (%rax)
  1c:	f7 c6 00 02 00 00    	test   $0x200,%esi
  22:	74 06                	je     0x2a
  24:	fb                   	sti
  25:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  2a:*	bf 01 00 00 00       	mov    $0x1,%edi	<-- trapping instruction
  2f:	e8 99 da f1 fe       	call   0xfffffffffef1dacd
  34:	65 8b 05 f2 99 d7 7d 	mov    %gs:0x7dd799f2(%rip),%eax	# 0x7dd79a2d
  3b:	85 c0                	test   %eax,%eax
  3d:	74 02                	je     0x41
  3f:	5d                   	pop    %rbp

Code starting with the faulting instruction
===========================================
   0:	bf 01 00 00 00       	mov    $0x1,%edi
   5:	e8 99 da f1 fe       	call   0xfffffffffef1daa3
   a:	65 8b 05 f2 99 d7 7d 	mov    %gs:0x7dd799f2(%rip),%eax	# 0x7dd79a03
  11:	85 c0                	test   %eax,%eax
  13:	74 02                	je     0x17
  15:	5d                   	pop    %rbp

[531414.066765] RSP: 0018:ffffc9000c9a77d8 EFLAGS: 00000206
[531414.077943] RIP: 0010:smp_call_function_many_cond (kernel/smp.c:443 kernel/smp.c:988)
[531416.987351] on_each_cpu_cond_mask (kernel/smp.c:1155)
[531416.205862] ? text_poke_memset (arch/x86/kernel/alternative.c:1296)
[531416.681294] ? text_poke_memset (arch/x86/kernel/alternative.c:1296)
[531417.468443] text_poke_bp_batch (arch/x86/kernel/alternative.c:1553)
[531418.939923] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:500)
[531419.882055] ? ftrace_no_pid_write (kernel/trace/ftrace.c:7864)
[531420.510376] ftrace_update_pid_func (kernel/trace/ftrace.c:374 (discriminator 1))
[531420.784703] ftrace_pid_open (kernel/trace/ftrace.c:2918 kernel/trace/ftrace.c:2932 kernel/trace/ftrace.c:7725 kernel/trace/ftrace.c:7835 kernel/trace/ftrace.c:7865)
[531421.851294] vfs_open (fs/open.c:1017)
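
Going back to the annotation question: to make the idea concrete, below is
roughly the shape of change I have in mind. This is only a sketch, not a
patch - the helper name is made up, and a real patch would have to audit
everything actually reachable from function_stack_trace_call():

/*
 * Sketch only (hypothetical helper): annotate functions in the stack
 * tracer's path with notrace explicitly, instead of relying on them
 * being inlined. On x86, notrace typically expands to
 * __attribute__((__no_instrument_function__)) (see
 * include/linux/compiler_types.h), so the compiler emits no __fentry__
 * call site here and there is nothing for ftrace to patch.
 */
static notrace bool reachable_from_stack_tracer(unsigned long addr)
{
	/* Placeholder body; the point is the annotation above. */
	return addr != 0;
}

Unlike the current “inline”->”notrace” heuristic, the property is then
explicit in the source, so it survives someone later removing the “inline”.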