Re: a question about how to debug this case in ftrace

Hi Andrew,

If I understood it correctly, it's similar to the RCU stall case (rcu_cpu_stall_ftrace_dump).

So far I don't need much detail from the vmcore, so just merging dmesg and ftrace is more practical for my case. I will give it a try, many thanks.

BR

Song

On 2024/6/25 21:20, Andrew Halaney wrote:
Hey Song,

I'll answer that bit as those were my hacks :P

I _thought_ that setting ftrace_dump_on_oops + softlockup_panic on the
command line (or the equivalent at runtime) would dump the logs, but that
wasn't working for me and I was in a bit of a rush (and already hacking
the kernel up anyway).

So I ended up doing a sequence like:

     tracing_off();
     ftrace_dump(DUMP_ALL);

in the softlockup code when it was detected. Ideally I wanted to look at
the ftrace data in the vmcore (since debugging printk by using printk is
a little confusing), but I hit a makedumpfile bug... so I went with the
hacky route to prove to myself what was going on. I think that's been
resolved since then. Hope that helps!
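
For context, a minimal sketch of what that hack could look like, assuming it is
dropped in next to where the soft-lockup splat is emitted in kernel/watchdog.c
(the exact function and line vary by kernel version, and this is not Andrew's
actual patch -- only tracing_off() and ftrace_dump() are taken from his
description):

```c
/*
 * Hypothetical placement: kernel/watchdog.c, in the soft-lockup report
 * path (watchdog_timer_fn() in recent kernels), just before the
 * "BUG: soft lockup" message is printed.
 */
	tracing_off();		/* freeze the ftrace ring buffer so the dump isn't overwritten */
	ftrace_dump(DUMP_ALL);	/* dump every CPU's trace buffer to the console, i.e. into dmesg */
```

With something like that in place, the trace_printk() history ends up
interleaved with the watchdog output in the console log, which is what makes a
single timeline possible.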

Thanks,
Andrew

On Tue, Jun 25, 2024 at 09:36:54AM GMT, Song Chen wrote:
Hi Derek,

I am working on a hung task case. I saw your trace_printk logs go to the same
output as the softlockup warning; it's a smart way to easily build a timeline
for analysis.

As far as I know, trace_printk goes to the ftrace ring buffer and softlockup
info goes to dmesg. Could you please let me know how you did that? I marked
the part I'm interested in below, thanks a lot.

BR

Song


On 2024/6/18 23:20, Derek Barbosa wrote:
lgoncalv@xxxxxxxxxx, jwyatt@xxxxxxxxxx, aubaker@xxxxxxxxxx
Bcc:
Subject: watchdog BUG: soft lockup - CPU#x stuck for 78s
Reply-To:

Hi,

The realtime team at Red Hat has recently backported the latest printk changes
present in 6.6-rt stable (HEAD at 20fd4439f644 printk: nbcon: move locked_port flag to
struct uart_port) to CentOS Stream 9 for performance improvements and
printk-related bugfixes.

Since merging said code, we've hit an interesting bug during testing:
specifically, on larger systems, a softlockup may be reported by the watchdog
when there is a heavy amount of printing to tty devices (whether through
printk, /dev/kmsg, etc.).

We have reasonable suspicion that nbcon_reacquire, or some other nbcon
mechanism, *may* be causing this behavior.

Since we've successfully reproduced this in the Kernel-ARK/Fedora-ELN
(osbuild-rt) and linux-rt-devel 6.10-rc4-rt6, we are reporting this bug
upstream.

Anyway, here is a more in-depth description, along with some call traces.

Description:

On x86 systems with a large number of logical cores (nproc ~> 60), a softlockup
with an accompanying call trace can be observed when a large amount of "printing"
activity is taking place.

As shown in the call traces appended below, during some kind of NUMA
balancing/numa_migration after a task_numa_fault -- where a set of processes is
being migrated/swapped between two CPUs -- there is a busy thread that is being
waited on (on the order of seconds), causing a livelock. Additional investigation
of vmcores collected by toggling panic on softlockup shows that the waiting
thread may be waiting for a thread looping in nbcon_reacquire.

I suspect that some logic within nbcon_context_try_acquire may be a good place
to start. My understanding of the code becomes a bit fuzzy here, so apologies
in advance for any erroneous statements. As I see it, there may be something
happening during migration (or under heavy load) in which nbcon_reacquire() is in
a non-migratable or non-preemptible state as it is attempting to regain access to
a lost/taken console. It could very well be a situation in which context
was forcefully taken from the printing thread.

Alternatively, Andrew Halaney <ahalaney@xxxxxxxxxx> suspects that it is the loop within
nbcon_kthread_func() -- since there is nothing that would yield the task in said
loop (cant_migrate()), the migrate code would essentially be waiting forever for
the aforementioned loop to "finish". I believe that on PREEMPT_RT there would be a
preemption point here. Furthermore, in his investigation there were signs that the
loop was still iterating right up until the crash, giving reason to believe that
task is the culprit.
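
To make that suspicion concrete, here is a deliberately simplified sketch of the
kind of pattern being described. It is not the actual nbcon code;
try_acquire_console(), emit_next_record(), and release_console() are
hypothetical stand-ins:

```c
/*
 * Hypothetical illustration of a printing kthread that never reaches a
 * scheduling point: anything waiting for this task to get off the CPU
 * (e.g. the task migration/swap path mentioned above) can be held off
 * for seconds, which is enough to trip the soft-lockup watchdog.
 */
static int printer_kthread(void *arg)
{
	while (!kthread_should_stop()) {
		if (!try_acquire_console())	/* hypothetical: ownership was taken/lost */
			continue;		/* busy retry, no sleep and no yield */

		emit_next_record();		/* hypothetical: flush one record */
		release_console();
		/*
		 * Nothing in this loop blocks or yields (no wait, no
		 * explicit scheduling point), so it can spin indefinitely
		 * while another task waits for it.
		 */
	}
	return 0;
}
```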

In fact, with the following diff, we noticed this output:

```
ahalaney@x1gen2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % git diff | cat
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index bb9689f94d30..d716b72bf2f8 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1075,6 +1075,7 @@ static int nbcon_kthread_func(void *__console)
        *
        * This pairs with rcuwait_has_sleeper:A and nbcon_kthread_wake:A






