On 2024-11-25 09:59:09 [-0800], Guenter Roeck wrote:
> On 11/25/24 09:43, Sebastian Andrzej Siewior wrote:
> > On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
> > > Unfortunately it doesn't make a difference.
> >
> > stunning. It looks like the exact same error message.
> >
>
> I think it uses
>
> #define spin_lock_irqsave(lock, flags)				\
> do {								\
> 	raw_spin_lock_irqsave(spinlock_check(lock), flags);	\
> } while (0)
>
> from include/linux/spinlock.h, meaning your patch doesn't really make a difference.

The difference comes from DEFINE_SPINLOCK vs DEFINE_RAW_SPINLOCK. There
is the .lock_type init which goes from LD_WAIT_CONFIG to LD_WAIT_SPIN.
And this is all that matters.

> > > [ 1.050499] =============================
> > > [ 1.050801] [ BUG: Invalid wait context ]
> > > [ 1.051200] 6.12.0+ #1 Not tainted
> > > [ 1.051571] -----------------------------
> > > [ 1.051875] swapper/0/1 is trying to lock:
> > > [ 1.052201] 0000000001b694c8 (pci_poke_lock){....}-{3:3}, at: pci_config_read16+0x8/0x80
> > > [ 1.052994] other info that might help us debug this:
> > > [ 1.053331] context-{5:5}
> > > [ 1.053641] 2 locks held by swapper/0/1:
> > > [ 1.053959] #0: fffff800042b50f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
> > > [ 1.054388] #1: 0000000001d29078 (pci_lock){....}-{2:2}, at: pci_bus_read_config_word+0x18/0x80
> > > [ 1.054793] stack backtrace:
> > > [ 1.055171] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
> > > [ 1.055632] Call Trace:
> > > [ 1.055985] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
> > > [ 1.056329] [<00000000004e63e8>] lock_acquire+0xe8/0x340
> > > [ 1.056645] [<00000000010f0dfc>] _raw_spin_lock_irqsave+0x3c/0x80
> > > [ 1.056966] [<0000000000443828>] pci_config_read16+0x8/0x80
> > > [ 1.057278] [<000000000044442c>] sun4u_read_pci_cfg+0x12c/0x1a0
> > > [ 1.057593] [<0000000000b7657c>] pci_bus_read_config_word+0x3c/0x80
> > > [ 1.057913] [<0000000000b7fa78>] pci_find_capability+0x18/0xa0
> > > [ 1.058228] [<0000000000b794b0>] set_pcie_port_type+0x10/0x160
> > > [ 1.058543] [<0000000000442a98>] pci_of_scan_bus+0x158/0xb00
> > > [ 1.058854] [<00000000010c74a0>] pci_scan_one_pbm+0xd0/0xf8
> > > [ 1.059167] [<0000000000446174>] sabre_probe+0x1f4/0x5c0
> > > [ 1.059476] [<0000000000c13a48>] platform_probe+0x28/0x80
> > > [ 1.059785] [<0000000000c11158>] really_probe+0xb8/0x340
> > > [ 1.060098] [<0000000000c11584>] driver_probe_device+0x24/0xe0
> > > [ 1.060413] [<0000000000c117ac>] __driver_attach+0x8c/0x160
> > > [ 1.060728] [<0000000000c0ef54>] bus_for_each_dev+0x54/0xc0
> > >
> > > The original call trace also included _raw_spin_lock_irqsave(), and
> > > I don't have CONFIG_PREEMPT_RT enabled in my sparc64 builds to start with.
> >
> > You don't have to. "CONFIG_PROVE_RAW_LOCK_NESTING" looks if you try to
> > acquire raw_spinlock_t -> spinlock_t. Which it did before I made the
> > patch.
> > The pci_lock is from drivers/pci/access.c and is defined as
> > raw_spinlock_t. And I made pci_poke_lock of the same time. But debug
> > says 3:3 which suggests LD_WAIT_CONFIG. (No patch applied).
> >
> > > FWIW, I don't understand the value of
> > > pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);
> > > Why print curr_inner twice ?
> >
> > The syntax was once (or is) inner:outer. If you look from the top, you
> > have 4 (mutex_t) followed pci_lock (the raw_spinlock_t) 2. You are at
> > level 2 now and try to acquire spin_lock_t (3).
> >
>
> How does that explain the
> context-{5:5}

This is the max value based on the context. Your context is a simple
process, not handling an interrupt or anything of this kind. The culprit
is
| swapper/0/1 is trying to lock:
| 0000000001b694c8 (pci_poke_lock){....}-{3:3}, at: pci_config_read16+0x8/0x80

where you have pci_poke_lock classified as a 3. The context allows a 5,
so based on the context alone, the 3 would fly. But since the held
pci_lock is a 2, we have the splat here.

> which is created from the following ?
> pr_warn("context-{%d:%d}\n", curr_inner, curr_inner);
>
> Again, why print curr_inner twice ?

It is the same syntax as in print_lock_name(). Except here, we don't
have an outer type. The difference is RCU because it has a lower type
than a spinlock_t and you can acquire a spinlock_t within an RCU section
and lockdep is fine with it. It starts yelling once you try this with a
mutex_t.

> Thanks,
> Guenter

Sebastian
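
For illustration, the numbers in the splat can be replayed with a small
userspace model. This is only a sketch of the rule described above, not
the kernel's code: the enum mirrors the wait-type values as they appear
in the report (assuming CONFIG_PROVE_RAW_LOCK_NESTING=y, so that
LD_WAIT_CONFIG and LD_WAIT_SPIN differ), and current_inner() and
wait_context_ok() are hypothetical helper names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wait types, numbered as they show up in the splat ({2:2}, {3:3}, ...). */
enum wait_type {
	LD_WAIT_INV = 0,	/* not checked */
	LD_WAIT_FREE,		/* 1: wait free, RCU etc. */
	LD_WAIT_SPIN,		/* 2: raw_spinlock_t */
	LD_WAIT_CONFIG,		/* 3: spinlock_t */
	LD_WAIT_SLEEP,		/* 4: sleeping locks such as a mutex */
	LD_WAIT_MAX,		/* 5: what plain process context allows */
};

/*
 * Hypothetical helper: start from what the context allows and tighten
 * the limit to the smallest wait type among the locks already held.
 */
static enum wait_type current_inner(enum wait_type context,
				    const enum wait_type *held, size_t n)
{
	enum wait_type curr = context;

	assert(held != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (held[i] != LD_WAIT_INV && held[i] < curr)
			curr = held[i];
	}
	return curr;
}

/* Hypothetical helper: the next lock may not exceed the current limit. */
static bool wait_context_ok(enum wait_type context,
			    const enum wait_type *held, size_t n,
			    enum wait_type next)
{
	return next <= current_inner(context, held, n);
}
```

With the held locks from the report, { LD_WAIT_SLEEP, LD_WAIT_SPIN }
(&dev->mutex, then pci_lock) and a context of LD_WAIT_MAX, the limit
drops to 2. Asking for LD_WAIT_CONFIG (pci_poke_lock as a spinlock_t, 3)
fails the check, while LD_WAIT_SPIN (a raw_spinlock_t, 2) passes.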