Hi All, On 27.04.2022 09:08, Marek Szyprowski wrote: > On 26.04.2022 15:16, Petr Mladek wrote: >> On Tue 2022-04-26 14:07:42, Petr Mladek wrote: >>> On Mon 2022-04-25 23:04:28, John Ogness wrote: >>>> Currently threaded console printers synchronize against each >>>> other using console_lock(). However, different console drivers >>>> are unrelated and do not require any synchronization between >>>> each other. Removing the synchronization between the threaded >>>> console printers will allow each console to print at its own >>>> speed. >>>> >>>> But the threaded consoles printers do still need to synchronize >>>> against console_lock() callers. Introduce a per-console mutex >>>> and a new console boolean field @blocked to provide this >>>> synchronization. >>>> >>>> console_lock() is modified so that it must acquire the mutex >>>> of each console in order to set the @blocked field. Console >>>> printing threads will acquire their mutex while printing a >>>> record. If @blocked was set, the thread will go back to sleep >>>> instead of printing. >>>> >>>> The reason for the @blocked boolean field is so that >>>> console_lock() callers do not need to acquire multiple console >>>> mutexes simultaneously, which would introduce unnecessary >>>> complexity due to nested mutex locking. Also, a new field >>>> was chosen instead of adding a new @flags value so that the >>>> blocked status could be checked without concern of reading >>>> inconsistent values due to @flags updates from other contexts. >>>> >>>> Threaded console printers also need to synchronize against >>>> console_trylock() callers. Since console_trylock() may be >>>> called from any context, the per-console mutex cannot be used >>>> for this synchronization. (mutex_trylock() cannot be called >>>> from atomic contexts.) Introduce a global atomic counter to >>>> identify if any threaded printers are active. The threaded >>>> printers will also check the atomic counter to identify if the >>>> console has been locked by another task via console_trylock(). >>>> >>>> Note that @console_sem is still used to provide synchronization >>>> between console_lock() and console_trylock() callers. >>>> >>>> A locking overview for console_lock(), console_trylock(), and the >>>> threaded printers is as follows (pseudo code): >>>> >>>> console_lock() >>>> { >>>> down(&console_sem); >>>> for_each_console(con) { >>>> mutex_lock(&con->lock); >>>> con->blocked = true; >>>> mutex_unlock(&con->lock); >>>> } >>>> /* console_lock acquired */ >>>> } >>>> >>>> console_trylock() >>>> { >>>> if (down_trylock(&console_sem) == 0) { >>>> if (atomic_cmpxchg(&console_kthreads_active, 0, >>>> -1) == 0) { >>>> /* console_lock acquired */ >>>> } >>>> } >>>> } >>>> >>>> threaded_printer() >>>> { >>>> mutex_lock(&con->lock); >>>> if (!con->blocked) { >>>> /* console_lock() callers blocked */ >>>> >>>> if >>>> (atomic_inc_unless_negative(&console_kthreads_active)) { >>>> /* console_trylock() callers blocked */ >>>> >>>> con->write(); >>>> >>>> atomic_dec(&console_lock_count); >>>> } >>>> } >>>> mutex_unlock(&con->lock); >>>> } >>>> >>>> The console owner and waiter logic now only applies between contexts >>>> that have taken the console_lock via console_trylock(). Threaded >>>> printers never take the console_lock, so they do not have a >>>> console_lock to handover. Tasks that have used console_lock() will >>>> block the threaded printers using a mutex and if the console_lock >>>> is handed over to an atomic context, it would be unable to unblock >>>> the threaded printers. However, the console_trylock() case is >>>> really the only scenario that is interesting for handovers anyway. >>>> >>>> @panic_console_dropped must change to atomic_t since it is no longer >>>> protected exclusively by the console_lock. >>>> >>>> Since threaded printers remain asleep if they see that the console >>>> is locked, they now must be explicitly woken in __console_unlock(). >>>> This means wake_up_klogd() calls following a console_unlock() are >>>> no longer necessary and are removed. >>>> >>>> Also note that threaded printers no longer need to check >>>> @console_suspended. The check for the @blocked field implicitly >>>> covers the suspended console case. >>>> >>>> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx> >>> Nice, it it better than v4. I am going to push this for linux-next. >>> >>> Reviewed-by: Petr Mladek <pmladek@xxxxxxxx> >> JFYI, I have just pushed this patch instead of the one >> from v4 into printk/linux.git, branch rework/kthreads. >> >> It means that this branch has been rebased. It will be >> used in the next refresh of linux-next. > > This patchset landed in linux next-20220426. In my tests I've found > that it causes deadlock on all my Amlogic Meson G12B/SM1 based boards: > Odroid C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when system > boots to userspace and getty (with automated login) is executed. I > even see the bash prompt, but then the console is freezed. Reverting > this patch (e00cc0e1cbf4) on top of linux-next (together with > 6b3d71e87892 to make revert clean) fixes the issue. > The Amlogic Meson related issue has been investigated and fixed: https://lore.kernel.org/all/b7c81f02-039e-e877-d7c3-6834728d2117@xxxxxxxxxxx/ but I just found that there is one more issue. It appears on QCom-based DragonBoard 410c SBC (arch/arm64/boot/dts/qcom/apq8016-sbc.dts). To see it on today's linux next-20220506, one has to revert 42cd402b8fd4672b692400fe5f9eecd55d2794ac, otherwise lockdep triggers other warning and it is turned off too early: ================================ WARNING: inconsistent lock state 5.18.0-rc5-next-20220506+ #11869 Not tainted -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff80000aaa8478 (&port_lock_key){?.+.}-{2:2}, at: msm_uart_irq+0x38/0x750 {HARDIRQ-ON-W} state was registered at: lock_acquire.part.0+0xe0/0x230 lock_acquire+0x68/0x84 _raw_spin_lock+0x5c/0x80 __msm_console_write+0x1ac/0x220 msm_console_write+0x48/0x60 __console_emit_next_record+0x188/0x420 printk_kthread_func+0x3a0/0x3bc kthread+0x118/0x11c ret_from_fork+0x10/0x20 irq event stamp: 12182 hardirqs last enabled at (12181): [<ffff800008e3d2a8>] cpuidle_enter_state+0xc4/0x30c stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.0-rc5-next-20220506+ #11869 Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) Call trace: dump_backtrace.part.0+0xd0/0xe0 show_stack+0x18/0x6c dump_stack_lvl+0x8c/0xb8 dump_stack+0x18/0x34 print_usage_bug.part.0+0x208/0x22c mark_lock+0x710/0x954 __lock_acquire+0x9fc/0x20cc lock_acquire.part.0+0xe0/0x230 lock_acquire+0x68/0x84 _raw_spin_lock_irqsave+0x80/0xcc msm_uart_irq+0x38/0x750 __handle_irq_event_percpu+0xac/0x3d0 handle_irq_event+0x4c/0x120 handle_fasteoi_irq+0xa4/0x1a0 generic_handle_domain_irq+0x3c/0x60 gic_handle_irq+0x44/0xc4 call_on_irq_stack+0x2c/0x54 do_interrupt_handler+0x80/0x84 el1_interrupt+0x34/0x64 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x64/0x68 cpuidle_enter_state+0xcc/0x30c cpuidle_enter+0x38/0x50 do_idle+0x22c/0x2bc cpu_startup_entry+0x28/0x30 rest_init+0x110/0x190 arch_post_acpi_subsys_init+0x0/0x18 start_kernel+0x6c4/0x704 __primary_switched+0xc0/0xc8 INIT: version 2.88 booting [info] Using makefile-style concurrent boot in runlevel S. Reverting the following patches on top of linux next-20220506 fixes/hides this lockdep warning: 6b3d71e87892 ("printk: remove @console_locked") 8e274732115f ("printk: extend console_lock for per-console locking") 09c5ba0aa2fc ("printk: add kthread console printers") Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland