On Wed, Feb 05, 2025 at 12:10:53PM -0800, Paul E. McKenney wrote: > On Wed, Feb 05, 2025 at 09:00:07PM +0106, John Ogness wrote: > > On 2025-02-05, "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote: > > >> This is caused by RCU falling behind a callback-flooding kthread that > > >> invokes call_rcu() in a semi-tight loop. Setting rcutree.kthread_prio=40 > > >> avoids the splat, but still gets the shutdown-time hang. Retrying with > > >> the default rcutree.kthread_prio=2 failed to reproduce the splat, but > > >> it did reproduce the shutdown-time hang. > > >> > > >> OK, maybe printk buffers are not being flushed? A 100-millisecond sleep > > >> at the end of of rcu_torture_cleanup() got all of rcutorture's output > > >> flushed, but lost the subsequent shutdown-time console traffic. The > > >> pr_flush(HZ/10,1) seems more sensible, but this is private to printk(). > > >> > > >> I would like to log the shutdown-time console traffic because RCU can > > >> sometimes break things on that path. > > > > pr_flush() was changed to private because there were no users. It would > > not be a problem to make it available. Adding a pr_flush() to > > rcu_torture_cleanup() would be an appropriate workaround for now (more > > on this at the end). > > Ah, got it. > > > > There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off() > > > that appears to be intended to dump out the printk() buffers, > > > > It only dumps the buffers to the registered kmsg_dumpers. It is not > > responsible for flushing console backlogs. > > That would explain its not doing much for me. ;-) > > > > but it > > > does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y. > > > Does there need to be a pr_flush() call prior to the call to > > > migrate_to_reboot_cpu()? Or maybe even to do_kernel_power_off_prepare() > > > or kernel_shutdown_prepare()? > > > > With CONFIG_PREEMPT_RT=y, legacy consoles only print via a dedicated > > kthread. Without a pr_flush() somewhere, there is basically no chance > > that they will get backlogs flushed because noone is waitig for them. > > > > The new console API (NBCON) provides support for "atomic consoles", > > which _do_ flush by transitioning to synchronous printing during > > shutdown/reboot. Unfortunately we still don't have any NBCON atomic > > console implemented in the kernel. The 8250 UART will be our first > > driver, most likely available in 6.15. (With the current PREEMPT_RT > > patch applied, the 8250 NBCON atomic driver is used.) > > > > Since only CONFIG_PREEMPT_RT=y has this issue, I am not sure if we want > > to sprinkle pr_flush() calls on all sleepable shutdown/reboot paths, > > although that is certainly one way to handle it. For your case, adding a > > pr_flush() to rcu_torture_cleanup() and making pr_flush() non-private > > would be an easy solution to avoid your problem. > > Would it make sense to put a call to pr_flush() at the beginning of the > kernel_power_off() function? I suspect that would take care of much > more than just rcutorture. > > Or maybe after the pr_emerg("Power down\n"), since that is normally > the last thing that shows up on rcutorture's console logs. > > I will give that a shot and see what happens. ;-) And that does the trick! I specified 500ms, but maybe -1 would be better? You tell me! ;-) Thanx, Paul ------------------------------------------------------------------------ commit 172b692a79ec5801eab57cb55d2f9c398f244c62 Author: Paul E. McKenney <paulmck@xxxxxxxxxx> Date: Wed Feb 5 12:27:23 2025 -0800 printk: Flush console log from kernel_power_off() Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output and shutdown time, which hides shutdown-time RCU issues from rcutorture. Therefore, make pr_flush() public and invoke it after then last print in kernel_power_off(). Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> Cc: Petr Mladek <pmladek@xxxxxxxx> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx> Cc: John Ogness <john.ogness@xxxxxxxxxxxxx> Cc: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> diff --git a/include/linux/printk.h b/include/linux/printk.h index 4217a9f412b2..b3e88ff1ecc3 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -806,4 +806,6 @@ static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type, #define print_hex_dump_bytes(prefix_str, prefix_type, buf, len) \ print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true) + +bool pr_flush(int timeout_ms, bool reset_on_progress); #endif diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 80910bc3470c..e0c6e5a99897 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2461,7 +2461,7 @@ asmlinkage __visible int _printk(const char *fmt, ...) } EXPORT_SYMBOL(_printk); -static bool pr_flush(int timeout_ms, bool reset_on_progress); +bool pr_flush(int timeout_ms, bool reset_on_progress); static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress); #else /* CONFIG_PRINTK */ @@ -4466,7 +4466,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre * Context: Process context. May sleep while acquiring console lock. * Return: true if all usable printers are caught up. */ -static bool pr_flush(int timeout_ms, bool reset_on_progress) +bool pr_flush(int timeout_ms, bool reset_on_progress) { return __pr_flush(NULL, timeout_ms, reset_on_progress); } diff --git a/kernel/reboot.c b/kernel/reboot.c index a701000bab34..3448e6ae3556 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -704,6 +704,7 @@ void kernel_power_off(void) migrate_to_reboot_cpu(); syscore_shutdown(); pr_emerg("Power down\n"); + pr_flush(500, 1); kmsg_dump(KMSG_DUMP_SHUTDOWN); machine_power_off(); }