On Wed 2024-10-16 20:17:48, John Ogness wrote: > Hi Marcus, > > On 2024-10-16, Marcos Paulo de Souza <mpdesouza@xxxxxxxx> wrote: > > Introduce LOUD_CON flag to printk. > > Generally speaking, I do not like the name "LOUD_CON". The flag is > related to records, not consoles. Something like "NO_SUPPRESS" or > "FORCE_PRINT" might be more appropriate. Note that naming is not my > strength. I see. I would prefer to avoid names with double negatives. So I prefer the "FORCE_PRINT" variant. That said, I think that "FORCE_CON" might also be good. Aka, the message forces consoles to show it. > > The new flag will make it possible to > > create a context where printk messages will never be suppressed. This > > new context information will be stored in the already existing > > printk_context per-CPU variable. This variable was changed from 'int' to > > 'unsigned int' to avoid issues with automatic casting. > > > > This mechanism will be used in the next patch to create a loud_console > > context on sysrq handling, removing an existing workaround on the > > loglevel global variable. The workaround existed to make sure that sysrq > > header messages were sent to all consoles. > > IMO the more interesting aspect is that the "loud" flag is stored in the > ringbuffer so that the message is not suppressed, even if printed later > (for example because it was deferred). This actually even fixes a bug > since the current workaround will not perform as expected if the sysrq > records are deferred (for example due to threaded printing or consoles > that are registered later). Yeah. I think that the fix of this race is the main motivation for the change. We should make it more clear in the commit message. > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index beb808f4c367..b893825fe21d 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -1321,6 +1321,7 @@ static void boot_delay_msec(int level) > > unsigned long timeout; > > > > if ((boot_delay == 0 || system_state >= SYSTEM_RUNNING) > > + || is_printk_console_loud() > > || suppress_message_printing(level)) { > > I do not think "loud" should be a reason to skip the delays. The delays > are there to slow down printing. I would think that for "loud" messages, > this is even more important. I suppose this function (as well as > printk_delay()) would need a new boolean parameter whether it is a > "loud" message. Then: > > || (!loud_con && suppress_message_printing(level)) Yes, this makes more sense. > > @@ -2273,6 +2274,9 @@ int vprintk_store(int facility, int level, > > if (dev_info) > > flags |= LOG_NEWLINE; > > > > + if (is_printk_console_loud()) > > + flags |= LOG_LOUD_CON; > > + > > if (flags & LOG_CONT) { > > prb_rec_init_wr(&r, reserve_size); > > if (prb_reserve_in_last(&e, prb, &r, caller_id, PRINTKRB_RECORD_MAX)) { > > I guess LOG_LOUD_CON should also be set in the LOG_CONT case (like > LOG_NEWLINE does). Great catch! Yes, it would make sense to set it even in the re-opened record. > > diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c > > index 2b35a9d3919d..4618988baeea 100644 > > --- a/kernel/printk/printk_safe.c > > +++ b/kernel/printk/printk_safe.c > > @@ -12,7 +12,30 @@ > > > > #include "internal.h" > > > > -static DEFINE_PER_CPU(int, printk_context); > > +static DEFINE_PER_CPU(unsigned int, printk_context); > > + > > +#define PRINTK_SAFE_CONTEXT_MASK 0x0000ffffU > > +#define PRINTK_LOUD_CONSOLE_CONTEXT_MASK 0xffff0000U > > +#define PRINTK_LOUD_CONSOLE_CONTEXT_OFFSET 0x00010000U > > + > > +void noinstr printk_loud_console_enter(void) > > +{ > > + cant_migrate(); > > + this_cpu_add(printk_context, PRINTK_LOUD_CONSOLE_CONTEXT_OFFSET); > > +} > > Have you tested this with lockdep? AFAICT, the write_sysrq_trigger() > path can migrate since it is only using rcu_read_lock() in > __handle_sysrq(). Interesting. I have always had connected rcu_read_lock() with preempt_disable(). But I see that there actually exists preemptive variant with CONFIG_PREEMPT_RCU. And I guess that preemptive RCU read lock is a must-to-have on RT. And indeed: # echo h >/proc/sysrq-trigger produced: [ 53.669907] BUG: assuming non migratable context at kernel/printk/printk_safe.c:23 [ 53.669920] in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 1637, name: bash [ 53.669931] 2 locks held by bash/1637: [ 53.669936] #0: ffff8ae680a384a8 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x6e/0xf0 [ 53.669968] #1: ffffffff83f226e0 (rcu_read_lock){....}-{1:3}, at: __handle_sysrq+0x3d/0x120 [ 53.670002] CPU: 2 UID: 0 PID: 1637 Comm: bash Not tainted 6.12.0-rc3-default+ #67 [ 53.670011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 [ 53.670020] Call Trace: [ 53.670026] <TASK> [ 53.670045] dump_stack_lvl+0x6c/0xa0 [ 53.670064] __cant_migrate.cold+0x7c/0x89 [ 53.670080] printk_loud_console_enter+0x15/0x30 [ 53.670088] __handle_sysrq+0x60/0x120 [ 53.670104] write_sysrq_trigger+0x6a/0xa0 [ 53.670120] proc_reg_write+0x5f/0xb0 [ 53.670132] vfs_write+0xf9/0x540 [ 53.670147] ? __lock_release.isra.0+0x1a6/0x2c0 [ 53.670172] ? do_user_addr_fault+0x38c/0x720 [ 53.670197] ksys_write+0x6e/0xf0 [ 53.670220] do_syscall_64+0x79/0x190 [ 53.670238] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 53.670252] RIP: 0033:0x7fa4247d9b13 [ 53.670269] Code: 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 f3 c3 0f 1f 00 41 54 55 49 89 d4 53 48 89 [ 53.670275] RSP: 002b:00007ffe62690f38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 53.670282] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa4247d9b13 [ 53.670288] RDX: 0000000000000002 RSI: 00005557ac516ec0 RDI: 0000000000000001 [ 53.670291] RBP: 00005557ac516ec0 R08: 000000000000000a R09: 0000000000000000 [ 53.670295] R10: 00007fa4246da468 R11: 0000000000000246 R12: 00007fa424abd500 [ 53.670298] R13: 0000000000000002 R14: 00007fa424ac2c00 R15: 0000000000000002 [ 53.670350] </TASK> [ 53.670358] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z) replay-kernel-logs(R) IMHO, the best solution would be to call migrate_disable()/enable() in printk_loud_console_enter()/exit(). We should not disable preemption because the output could be long. Best Regards, Petr