Hi Derek,

On 2024-06-18, Derek Barbosa <debarbos@xxxxxxxxxx> wrote:
> The realtime team at Red Hat has recently backported the latest printk
> changes present in 6.6-rt stable (HEAD at 20fd4439f644 printk: nbcon:
> move locked_port flag to struct uart_port) to CentOS Stream 9 for
> performance improvements and printk-related bugfixes.
>
> Since merging this code, we've hit an interesting bug during testing.
> Specifically, on larger systems, a softlockup may be reported by the
> watchdog when there is a heavy amount of printing to tty devices
> (whether it be through printk, /dev/kmsg, etc).
>
> We have reasonable suspicion that nbcon_reacquire, or some other nbcon
> mechanism, *may* be causing such behavior.
>
> Since we've successfully reproduced this in the Kernel-ARK/Fedora-ELN
> (osbuild-rt) and linux-rt-devel 6.10.rc4-rt6, we are reporting this
> bug upstream.
>
> Anyway, here is a more in-depth description, along with some call
> traces.
>
> Description:
>
> On x86 systems with a large number of logical cores (nproc ~> 60), a
> softlockup can be observed, with an accompanying call trace, when a
> large amount of "printing" activity is taking place.
>
> As shown in the call traces appended below, during some kind of NUMA
> balancing/numa_migration after a task_numa_fault -- where a set of
> processes is being migrated/swapped between two CPUs -- there is a
> busy thread that is being waited on (in the order of seconds), causing
> a livelock. Additional investigation of collected vmcores by toggling
> panic on softlockup shows that the waiting thread may be waiting for a
> thread looping in nbcon_reacquire.
>
> I suspect that some logic within nbcon_context_try_acquire may be a
> good place to start. My understanding of the code becomes a bit fuzzy
> here, so apologies in advance for any erroneous statements. As I see
> it, there may be something happening during migration (or under heavy
> load) in which nbcon_reacquire() is in a non-migratable or
> non-preemptible state as it is attempting to regain access to a
> lost/taken console. It could very well be a situation in which context
> was forcefully taken from the printing thread.
>
> Alternatively, Andrew Halaney <ahalaney@xxxxxxxxxx> suspects that it
> is the loop within nbcon_kthread_func() -- since there is nothing that
> would yield the task in said loop (cant_migrate()), the migrate code
> would essentially be waiting forever for the aforementioned loop to
> "finish". I believe that under PREEMPT_RT there would be a preemption
> point here. Furthermore, in his investigation, there were signs that
> the loop was just iterating up until the crash, leaving reason to
> believe that task would be the culprit.
>
> In fact, with the following diff, we noticed this output:
>
> ```
> ahalaney@x1gen2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % git diff | cat
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index bb9689f94d30..d716b72bf2f8 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1075,6 +1075,7 @@ static int nbcon_kthread_func(void *__console)
>           *
>           * This pairs with rcuwait_has_sleeper:A and nbcon_kthread_wake:A.
>           */
> +        trace_printk("Before rcuwait_wait_event()\n");
>          ret = rcuwait_wait_event(&con->rcuwait,
>                          nbcon_kthread_should_wakeup(con, ctxt),
>                          TASK_INTERRUPTIBLE); /* LMM(nbcon_kthread_func:A) */
> @@ -1086,7 +1087,10 @@ static int nbcon_kthread_func(void *__console)
>          if (ret)
>                  goto wait_for_event;
>
> +        trace_printk("Starting do while(backlog) loop \n");
> +        unsigned long int loop = 0;
>          do {
> +                trace_printk("On loop %lu\n", ++loop);
>                  backlog = false;
>
>                  cookie = console_srcu_read_lock();
> @@ -1121,6 +1125,7 @@ static int nbcon_kthread_func(void *__console)
>                  console_srcu_read_unlock(cookie);
>
>          } while (backlog);
> +        trace_printk("End of while(backlog) loop, looped %lu times \n", loop);
>
>          goto wait_for_event;
> ```
>
> The output:
>
> ```
> [ 1681.309720] pr/ttyS0-18  21.N... 893365994us : nbcon_kthread_func: On loop 2117
> [ 1681.309727] pr/ttyS0-18  21.N... 893374418us : nbcon_kthread_func: On loop 2118
> [ 1681.309734] pr/ttyS0-18  21.N... 893382860us : nbcon_kthread_func: On loop 2119
> (...)
> [ 1681.396193] pr/ttyS0-18  21.N... 954571399us : nbcon_kthread_func: On loop 14025
> [ 1681.396200] pr/ttyS0-18  21.N... 954576525us : nbcon_kthread_func: On loop 14026
> [ 1681.396207] pr/ttyS0-18  21.N... 954581561us : nbcon_kthread_func: On loop 14027
> [ 1681.396213] pr/ttyS0-18  21.N... 954584954us : nbcon_kthread_func: On loop 14028
> [ 1681.396220] pr/ttyS0-18  21.N... 954590111us : nbcon_kthread_func: On loop 14029
> [ 1681.396223] ---------------------------------
> [ 1681.396230] Kernel panic - not syncing: softlockup: hung tasks
> (...)
> ```
>
> This demonstrates that the nbcon kthread function is invoked
> continuously up until the point of panic. I do believe that this
> approach is more sound than my initial hints (or it could be more than
> a handful of threads). Some traces around
> serial8250_console_write_thread() also show continuous calls without
> any holdups.
>
> As a side note, we are seeing the softlockup with !CONFIG_PREEMPT_RT.

This trace shows that the thread is successfully printing lines. So I
doubt nbcon_reacquire() is involved here.

Assuming the ringbuffer is getting filled as fast as or faster than the
thread can print, then we effectively have this:

DEFINE_STATIC_SRCU(test_srcu);
static DEFINE_SPINLOCK(test_lock);

static int kthread_func(void *arg)
{
	unsigned long flags;
	int idx;

	do {
		idx = srcu_read_lock_nmisafe(&test_srcu);
		spin_lock_irqsave(&test_lock, flags);
		udelay(5000); /* print a line to serial */
		spin_unlock_irqrestore(&test_lock, flags);
		srcu_read_unlock_nmisafe(&test_srcu, idx);
	} while (true);

	return 0;
}

And since the thread has a nice value of -20, it will get a lot of CPU
time allocated to it. Is that a problem? Shouldn't the scheduler
eventually kick the task off the CPU after its timeslice is up?
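For completeness, here is a minimal sketch of how such a test thread
could be started with the same nice value as the nbcon printer threads.
kthread_func() is the function above; test_kthread_start and the thread
name are made up for illustration:

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/err.h>

static struct task_struct *test_task;

/* Hypothetical helper: run kthread_func() in its own kthread and
 * mimic the printer threads' nice value of -20. */
static int test_kthread_start(void)
{
	test_task = kthread_run(kthread_func, NULL, "nbcon_test");
	if (IS_ERR(test_task))
		return PTR_ERR(test_task);

	set_user_nice(test_task, -20);
	return 0;
}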
> Some questions arise from this, as we've never encountered this in our
> testing with John Ogness' console_blast (kudos to <jwyatt@xxxxxxxxxx>)
> and other printk torture tests that have been compiled
> [here](https://gitlab.com/debarbos/printktorture).

Yes, that is odd since those tests will ensure that the printing thread
never exits its printing loop because it will never catch up. So it
should be the same situation.

> We are curious to understand why the printing thread is chosen by the
> NUMA balancer for migration/swap, and how that interaction is handled
> within the code (in other words, how else would nbcon handle a
> migrated printing thread?).

The nbcon console can only be owned when migration is disabled. In the
case of the printing thread for serial, this is under the
spin_lock_irqsave(). The NUMA balancer would only be able to migrate
the thread outside of the spin_lock critical section. And outside of
the spin_lock critical section, the thread does not own/hold any
resources at all. So it should be no problem to migrate it.
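To illustrate the point, a rough sketch of that pattern (a hypothetical
helper, not the actual serial8250 callback): everything the thread owns
is taken and released inside the locked section, and on PREEMPT_RT the
spin_lock also implies migrate_disable():

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(port_lock);	/* stand-in for the port lock */

/* Hypothetical helper showing the ownership window described above:
 * nbcon ownership exists only inside the locked section, so migration
 * can only happen between calls, where the thread holds nothing. */
static bool emit_one_record(void)
{
	unsigned long flags;
	bool more;

	spin_lock_irqsave(&port_lock, flags);	/* migrate_disable() on RT */
	/* ... acquire nbcon ownership, emit one record, release it ... */
	more = true;				/* backlog still pending? */
	spin_unlock_irqrestore(&port_lock, flags);

	return more;
}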
> Our next round of tests aims to disable NUMA balancing on large
> multi-core systems to determine whether the NUMA mechanisms + nbcon
> interactions are at fault here.

I am curious if starting a kthread using the code I wrote above (with
nice=-20) would cause similar issues.

> How to reproduce:

I am currently traveling, so my time/resources are limited. But next
week I will be able to look into this in detail.

Thanks for this work!

John Ogness