On Thu 2024-08-22 12:32:15, Derek Barbosa wrote:
> Hi,
>
> TLDR: plain, vanilla 6.11.0-0.rc3 is slower on flush and
> does not print traces in panic/crash context consistently.
>
>
> The purpose of this email is to share some findings with regards to the latest
> available printk changes, in comparison to what is currently available in the
> "mainline" upstream torvalds tree.
>
> Specifically, there was concern regarding flushing, flushing speed, and ensuring
> that viable information can be displayed to the user in critical context. This
> email also assumes that [0] (and the rest of the thread) has been previously read.
>
> Moving on, I've been testing the printk code present in the linux-rt-devel tree
> for some time, and have been honing in on comparing behaviors/interactions
> between a stock, regular kernel and the linux-rt-devel tree.
>
> The kernels in question are the following:
>
> 1. a stock torvalds kernel, 6.11.0-0.rc3
> 2. a linux-rt-devel kernel, 6.11.0-0.rc3-rt2, which has the "newer" printk code
>
> As a note, 6.11.0-0.rc3-rt2 DOES NOT HAVE CONFIG_PREEMPT_RT ENABLED.
>
> I will refer to these kernels as "new printk" vs "stock printk".
>
> I've also attached the configs for these kernels.

Could you please also share the kernel command line? I can't find it
anywhere. I am especially interested in whether it:

  + wanted to show backtraces on all CPUs via the "panic_print" parameter.
  + did a crashdump or a reboot.
  + also used another console (graphics).

> --- Test 1: John Ogness' Console Blast. ---
>
> This test uses a script which calls itself to create a pinned process for each CPU. Those
> child processes will run in infinite loops of show-task-states via
> /proc/sysrq-trigger. This generates lots of contention on the console. After
> some time, we use the sysrq-trigger to crash the machine.
>
> The success condition would be to be able to view the full crash backtrace via
> the serial console.
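
For anyone following along, the test described above could be sketched
roughly as below. This is only an illustration in Python and not the
actual script: it forks instead of calling itself, and the function
names, the dry_run switch, and the settle_seconds delay are made up for
the example. The only real pieces are /proc/sysrq-trigger and the SysRq
keys 't' (show task states) and 'c' (crash).

```python
import os
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # standard procfs file; writing needs root
SHOW_TASK_STATES = "t"                 # SysRq 't': dump the state of all tasks
CRASH = "c"                            # SysRq 'c': crash the machine


def sysrq(key):
    """Write one SysRq command key to the trigger file."""
    with open(SYSRQ_TRIGGER, "w") as trigger:
        trigger.write(key)


def console_blast(cpus, settle_seconds=30, dry_run=True):
    """One pinned writer per CPU, then a crash.

    Hypothetical helper. With dry_run=True (the default) nothing is
    written; the planned steps are returned as strings instead.
    """
    if dry_run:
        plan = [f"cpu{cpu}: loop sysrq '{SHOW_TASK_STATES}'" for cpu in cpus]
        plan.append(f"after {settle_seconds}s: sysrq '{CRASH}'")
        return plan

    for cpu in cpus:
        if os.fork() == 0:
            # Child: pin to a single CPU and flood the console forever.
            os.sched_setaffinity(0, {cpu})
            while True:
                sysrq(SHOW_TASK_STATES)

    # Parent: let the contention build up, then crash the machine.
    time.sleep(settle_seconds)
    sysrq(CRASH)
    return None
```

Calling console_blast([0, 1]) only returns the plan. Passing
dry_run=False as root really crashes the machine, so that belongs on
a test box with a serial console attached.
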
>
> For each of the kernels, 10 back-to-back trials were performed.
>
> In the 6.11.0-0.rc3 stock kernel, we did *not* observe a trace on crash. There were various
> other traces scattered/nested throughout the show-task-state noise, but no full
> crash backtrace. At times, there were upwards of 13k dropped messages.

Do you miss the backtrace from the panic-CPU or non-panic-CPUs or both?

The dump of the backtraces on non-panic-CPUs might have been affected
by the regression fixed earlier this week via
https://lore.kernel.org/r/20240812072703.339690-1-takakura@xxxxxxxxxxxxx

Did the system reboot in the end? Or did it get stuck somewhere?

> In the 6.11.0-0.rc3-rt2 "new printk" kernel, we observed the success condition on each run. At
> the "end" of the test (the crash), the full call trace was visible and presented
> to us via the serial console.

I guess that the problem is not with the non-panic CPUs because
v6.11-rc3-rt2 in rt/linux-rt-devel.git seems to have the same regression.

It is great to see that the serial console driver transformed into
the new nbcon console is so reliable.

Still, it is strange that the stock kernel does so badly in this test.
console_flush_on_panic() ignores both console_lock and port->lock.
There should be a good chance to see the messages.

It might break "only" when the console driver has been stopped on
a non-panic CPU in a state which would prevent the panic CPU from using
the driver even when locks are ignored.

Well, the chance of a breakage is likely bigger when the messages
are also flushed on the graphics console.

Anyway, thanks a lot for the testing and sharing the results.

Best Regards,
Petr

PS: I still have to think about the other results. But they seem to be
    less surprising. I am most curious about the very bad behavior
    of the stock kernel in the first test. I hope that we did not
    break something in the patch handling the legacy consoles.