On 11/25/24 2:23 PM, Guenter Roeck wrote:
On 11/25/24 10:12, Sebastian Andrzej Siewior wrote:
On 2024-11-25 09:59:09 [-0800], Guenter Roeck wrote:
On 11/25/24 09:43, Sebastian Andrzej Siewior wrote:
On 2024-11-25 09:01:33 [-0800], Guenter Roeck wrote:
Unfortunately it doesn't make a difference.
Stunning. It looks like the exact same error message.
I think it uses
#define spin_lock_irqsave(lock, flags)				\
do {								\
	raw_spin_lock_irqsave(spinlock_check(lock), flags);	\
} while (0)
from include/linux/spinlock.h, meaning your patch doesn't really
make a difference.
The difference comes from DEFINE_SPINLOCK vs DEFINE_RAW_SPINLOCK. There
is the .lock_type init which goes from LD_WAIT_CONFIG to LD_WAIT_SPIN.
And this is all that matters.
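Roughly, with CONFIG_DEBUG_LOCK_ALLOC enabled, the two initializers differ
only in the lockdep wait-type annotation. Paraphrasing
include/linux/spinlock_types_raw.h (simplified sketch, not the verbatim
header):

#define RAW_SPIN_DEP_MAP_INIT(lockname)			\
	.dep_map = {					\
		.name = #lockname,			\
		.wait_type_inner = LD_WAIT_SPIN,	\
	}

#define SPIN_DEP_MAP_INIT(lockname)			\
	.dep_map = {					\
		.name = #lockname,			\
		.wait_type_inner = LD_WAIT_CONFIG,	\
	}

LD_WAIT_CONFIG marks locks that become sleeping locks with PREEMPT_RT
(spinlock_t), LD_WAIT_SPIN marks locks that always spin (raw_spinlock_t),
and that annotation is what the invalid-wait-context check compares against
the surrounding context.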
Ah, now I get it. Thanks for the explanation. And it turns out my log
was wrong.
I must have taken it from the old image. Sorry for that.
That specific backtrace isn't seen anymore. But there is another one.
[ 1.779653] =============================
[ 1.779860] [ BUG: Invalid wait context ]
[ 1.780139] 6.12.0+ #1 Not tainted
[ 1.780394] -----------------------------
[ 1.780600] swapper/0/1 is trying to lock:
[ 1.780824] 0000000001b68888 (cpu_map_lock){....}-{3:3}, at: map_to_cpu+0x10/0x80
[ 1.781393] other info that might help us debug this:
[ 1.781624] context-{5:5}
[ 1.781838] 3 locks held by swapper/0/1:
[ 1.782055] #0: fffff800042b90f8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0x80/0x160
[ 1.782345] #1: fffff800040f2c18 (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq+0xa0/0x6e0
[ 1.782632] #2: fffff800040f2ab0 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xc8/0x6e0
[ 1.782912] stack backtrace:
[ 1.783172] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
[ 1.783498] Call Trace:
[ 1.783734] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
[ 1.783971] [<00000000004e63e8>] lock_acquire+0xe8/0x340
[ 1.784191] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
[ 1.784417] [<000000000043ed90>] map_to_cpu+0x10/0x80
[ 1.784633] [<000000000042b2b8>] sun4u_irq_enable+0x18/0x80
[ 1.784854] [<00000000004fb6b4>] irq_enable+0x34/0xc0
[ 1.785069] [<00000000004fb7b8>] __irq_startup+0x78/0xe0
[ 1.785287] [<00000000004fb8f0>] irq_startup+0xd0/0x1a0
[ 1.785503] [<00000000004f85b4>] __setup_irq+0x5f4/0x6e0
[ 1.785726] [<00000000004f8754>] request_threaded_irq+0xb4/0x1a0
[ 1.785950] [<0000000000439930>] power_probe+0x70/0xe0
[ 1.786165] [<0000000000c13a68>] platform_probe+0x28/0x80
[ 1.786382] [<0000000000c11178>] really_probe+0xb8/0x340
[ 1.786599] [<0000000000c115a4>] driver_probe_device+0x24/0xe0
[ 1.786820] [<0000000000c117cc>] __driver_attach+0x8c/0x160
[ 1.787039] [<0000000000c0ef74>] bus_for_each_dev+0x54/0xc0
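The conversion itself is the usual spinlock_t -> raw_spinlock_t pattern; a
minimal sketch, assuming cpu_map_lock is the static DEFINE_SPINLOCK() next
to map_to_cpu() (arch/sparc/kernel/cpumap.c, if I read the backtrace right),
with the actual lookup under the lock replaced by a placeholder:

static DEFINE_RAW_SPINLOCK(cpu_map_lock);	/* was: DEFINE_SPINLOCK(cpu_map_lock) */

int map_to_cpu(unsigned int index)
{
	unsigned long flags;
	int cpu;

	raw_spin_lock_irqsave(&cpu_map_lock, flags);	/* was: spin_lock_irqsave() */
	/* placeholder for the real distribution-map lookup done under the lock */
	cpu = index % num_online_cpus();
	raw_spin_unlock_irqrestore(&cpu_map_lock, flags);

	return cpu;
}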
After replacing cpu_map_lock with a raw spinlock, I get:
[ 2.015140] =============================
[ 2.015247] [ BUG: Invalid wait context ]
[ 2.015419] 6.12.0+ #1 Not tainted
[ 2.015564] -----------------------------
[ 2.015668] swapper/0/1 is trying to lock:
[ 2.015791] fffff80004870610 (&mm->context.lock){....}-{3:3}, at: __schedule+0x410/0x5b0
[ 2.016306] other info that might help us debug this:
[ 2.016451] context-{5:5}
[ 2.016539] 3 locks held by swapper/0/1:
[ 2.016652] #0: 0000000001d11f38 (key_types_sem){++++}-{4:4}, at: __key_create_or_update+0x5c/0x4c0
[ 2.016934] #1: 0000000001d1b850 (asymmetric_key_parsers_sem){++++}-{4:4}, at: asymmetric_key_preparse+0x18/0xa0
[ 2.017197] #2: fffff8001f811a98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xdc/0x5b0
[ 2.017412] stack backtrace:
[ 2.017551] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0+ #1
[ 2.017800] Call Trace:
[ 2.017910] [<00000000004e31d0>] __lock_acquire+0xa50/0x3160
[ 2.018062] [<00000000004e63e8>] lock_acquire+0xe8/0x340
[ 2.018192] [<00000000010f0dbc>] _raw_spin_lock_irqsave+0x3c/0x80
[ 2.018341] [<00000000010e5050>] __schedule+0x410/0x5b0
[ 2.018469] [<00000000010e5ae4>] schedule+0x44/0x1c0
[ 2.018591] [<00000000010f0684>] schedule_timeout+0xa4/0x100
[ 2.018730] [<00000000010e668c>] __wait_for_common+0xac/0x1a0
[ 2.018869] [<00000000010e6878>] wait_for_completion_state+0x18/0x40
[ 2.019022] [<000000000048ad18>] call_usermodehelper_exec+0x138/0x1c0
[ 2.019177] [<000000000052eb40>] __request_module+0x160/0x2e0
[ 2.019316] [<00000000009ba6dc>] crypto_alg_mod_lookup+0x17c/0x280
[ 2.019466] [<00000000009ba990>] crypto_alloc_tfm_node+0x30/0x100
[ 2.019614] [<00000000009dcc5c>] public_key_verify_signature+0xbc/0x260
[ 2.019772] [<00000000009ded8c>] x509_check_for_self_signed+0xac/0x280
[ 2.019928] [<00000000009dddec>] x509_cert_parse+0x14c/0x220
[ 2.020065] [<00000000009dea08>] x509_key_preparse+0x8/0x1e0
The problem here is
typedef struct {
	spinlock_t		lock;		<--
	unsigned long		sparc64_ctx_val;
	unsigned long		hugetlb_pte_count;
	unsigned long		thp_pte_count;
	struct tsb_config	tsb_block[MM_NUM_TSBS];
	struct hv_tsb_descr	tsb_descr[MM_NUM_TSBS];
	void			*vdso;
	bool			adi;
	tag_storage_desc_t	*tag_store;
	spinlock_t		tag_lock;
} mm_context_t;
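The conversion sketch here is the same pattern, just applied to the struct
member; the users are spread over arch/sparc/mm/ and arch/sparc/kernel/, so
this only shows the shape of the change (init_new_context() as the init
site is my guess):

typedef struct {
	raw_spinlock_t lock;			/* was: spinlock_t lock */
	/* ... remaining members unchanged ... */
} mm_context_t;

	/* at the init site, e.g. init_new_context(): */
	raw_spin_lock_init(&mm->context.lock);	/* was: spin_lock_init() */

	/* and at each user: */
	raw_spin_lock_irqsave(&mm->context.lock, flags);
	/* ... */
	raw_spin_unlock_irqrestore(&mm->context.lock, flags);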
Replacing that with a raw spinlock just triggers the next one.
[ 2.035384] =============================
[ 2.035490] [ BUG: Invalid wait context ]
[ 2.035660] 6.12.0+ #3 Not tainted
[ 2.035802] -----------------------------
[ 2.035906] kworker/u4:3/48 is trying to lock:
[ 2.036036] 0000000001b6a790 (ctx_alloc_lock){....}-{3:3}, at: get_new_mmu_context+0x14/0x280
[ 2.036558] other info that might help us debug this:
[ 2.036697] context-{5:5}
[ 2.036784] 4 locks held by kworker/u4:3/48:
[ 2.036906] #0: fffff80004838a70 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xc/0x8e0
[ 2.037169] #1: fffff80004838b08 (&sig->exec_update_lock){+.+.}-{4:4}, at: begin_new_exec+0x344/0xbe0
[ 2.037411] #2: fffff800047fc940 (&p->alloc_lock){+.+.}-{3:3}, at: begin_new_exec+0x3a0/0xbe0
[ 2.037639] #3: fffff80004848610 (&mm->context.lock){....}-{2:2}, at: begin_new_exec+0x41c/0xbe0
Fixing that finally gives me a clean run. Nevertheless, that makes me
wonder:
Should I just disable CONFIG_PROVE_RAW_LOCK_NESTING for sparc runtime tests?
If no one is ever going to enable PREEMPT_RT on SPARC, I suppose you could
disable CONFIG_PROVE_RAW_LOCK_NESTING to avoid the trouble.
Cheers,
Longman