[BUG] printk/nbcon can use RCU illegally prior to CPU online

Hi,

Recently I was made aware of an issue when running 6.10.0-rc6-rt11+
(with a !PREEMPT_RT config, although I'm not sure it matters here).

It's easy to reproduce: just printk() on a CPU that's coming online (I was
pointed to a real splat, but this suffices to reproduce it).
For example:

    diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
    index 0c35207320cb..eb75a7cffe31 100644
    --- a/arch/x86/kernel/smpboot.c
    +++ b/arch/x86/kernel/smpboot.c
    @@ -274,6 +274,10 @@ static void notrace start_secondary(void *unused)
            cpuhp_ap_sync_alive();

            cpu_init();
    +
    +       /* Let's printk() and see if RCU whines */
    +       printk(KERN_ERR "RCU, what do you think?!\n");
    +
            fpu__init_cpu();
            rcutree_report_cpu_starting(raw_smp_processor_id());
            x86_cpuinit.early_percpu_clock_init();

This results in the following on boot:

    [    2.376218] .... node  #0, CPUs:   #46
    [    2.377276] .... node  #1, CPUs:   #47
    [    1.890157] RCU, what do you think?!

    [    1.890157] =============================
    [    1.890157] WARNING: suspicious RCU usage
    [    1.890157] 6.10.0-rc6-rt11+ #12 Not tainted
    [    1.890157] -----------------------------
    [    1.890157] kernel/printk/nbcon.c:1157 suspicious rcu_dereference_check() usage!
    [    1.890157] 
                   other info that might help us debug this:

    [    1.890157] 
                   RCU used illegally from offline CPU!
                   rcu_scheduler_active = 1, debug_locks = 1
    [    1.890157] 2 locks held by swapper/1/0:
    [    1.890157]  #0: ffffffff8b87efd0 (console_srcu){....}-{0:0}, at: console_srcu_read_lock+0x30/0x60
    [    1.890157]  #1: ffffffff8bc04050 (rcu_read_lock){....}-{1:3}, at: nbcon_wake_threads+0x4d/0x190
    [    1.890157] 
                   stack backtrace:
    [    1.890157] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.10.0-rc6-rt11+ #12
    [    1.890157] Hardware name: Dell Inc. PowerEdge R660/0NN3RP, BIOS 2.1.5 03/14/2024
    [    1.890157] Call Trace:
    [    1.890157]  <TASK>
    [    1.890157]  dump_stack_lvl+0x86/0xc0
    [    1.890157]  lockdep_rcu_suspicious+0x154/0x1a0
    [    1.890157]  ? nbcon_wake_threads+0x4d/0x190
    [    1.890157]  nbcon_wake_threads+0x176/0x190
    [    1.890157]  vprintk_emit+0x170/0x450
    [    1.890157]  _printk+0x5d/0x80
    [    1.890157]  ? initialize_tlbstate_and_flush+0xb5/0x1d0
    [    1.890157]  start_secondary+0x29/0xb0
    [    1.890157]  common_startup_64+0x13e/0x140
    [    1.890157]  </TASK>

Basically, printk() uses RCU (rcuwait_has_sleeper() in nbcon_wake_threads())
to figure out whether it needs to wake an rcuwait'ing thread with an
irq_work + rcuwait_wake_up() dance, and that RCU-based check is not allowed
before the CPU has come online (in the example above, the printk() runs before
rcutree_report_cpu_starting()).

I scratched my head and read up on RCU some more, but I haven't come up with
a solution that seems proper yet (we can't just blindly use SRCU with rcuwait,
and I'm not sure whether we should always schedule an irq_work until we're
"RCU online", checked with something akin to rcu_cpu_online(), etc.)...
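
To make the "always schedule an irq_work until we're RCU online" idea a bit
more concrete, here is a toy userspace C model of the decision only. It is
not kernel code and not a proposed patch: every name in it (model_rcu_online,
model_peek_sleeper(), model_wake_guarded(), and so on) is made up for
illustration, the "RCU online" flag stands in for whatever check would
actually be appropriate, and WAKE_DEFERRED merely stands in for queueing the
irq_work.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

enum wake_action {
    WAKE_NONE,      /* no sleeper seen, nothing to do */
    WAKE_DEFERRED,  /* stands in for queueing the irq_work */
};

/* Hypothetical model state, not kernel data structures. */
static bool model_rcu_online[MODEL_NR_CPUS];
static bool model_has_sleeper;
static int model_rcu_reads_while_offline;

/*
 * Stands in for the RCU-protected peek at the waiter (the
 * rcuwait_has_sleeper() call in the report). Counts every peek done
 * from a CPU that RCU does not yet consider online, which is the
 * situation lockdep complains about.
 */
static bool model_peek_sleeper(int cpu)
{
    if (!model_rcu_online[cpu])
        model_rcu_reads_while_offline++;
    return model_has_sleeper;
}

/* Behavior as described in the report: always peek first. */
static enum wake_action model_wake_always_peek(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_peek_sleeper(cpu) ? WAKE_DEFERRED : WAKE_NONE;
}

/*
 * The idea being floated: while the CPU is not RCU-online, skip the
 * peek and defer unconditionally. Once online, peek as before.
 */
static enum wake_action model_wake_guarded(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (!model_rcu_online[cpu])
        return WAKE_DEFERRED;
    return model_peek_sleeper(cpu) ? WAKE_DEFERRED : WAKE_NONE;
}
```

In this model the guarded variant never touches the RCU-protected state from
an offline CPU, at the cost of a possibly unnecessary deferred wakeup during
bringup. Whether that trade-off (and queueing an irq_work that early at all)
is acceptable in the real code is exactly the open question.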

Anyone more familiar with this neck of the woods have a good tip on how
to handle a printk()/nbcon prior to a CPU coming online?

Thanks,
Andrew




