On Tue, 12 Jul 2022 14:29:14 -0400 Paul Gortmaker wrote: > On 10/07/2022 (Sun 09:01) Hillf Danton wrote: > > > Looks like a ABBA deadlock that should be fed to lockdep, given what was > > blocking flusher (pid 213). > > [,,,] > > > Make lock_buffer have mutex_lock() semantics in attempt to catch the > > deadlock above. > > > > Only for thoughts now. > > Thanks for the test patch - I'm keeping it as a reference for future > use. > > So, preempt-rt issues an early lockdep splat at partition probe bootup, > both with and without the original v3 patch from this thread. > > Of course then I figured I'd try the debug patch on vanilla v5.19-rc3 > and we get pretty much the same lockdep complaint. > > > sd 1:0:0:0: Attached scsi generic sg1 type 0 > sd 1:0:0:0: [sdb] Preferred minimum I/O size 512 bytes > scsi 2:0:0:0: CD-ROM HL-DT-ST DVD-ROM DH30N A101 PQ: 0 ANSI: 5 > > ===================================== > WARNING: bad unlock balance detected! > 5.19.0-rc3-dirty #2 Not tainted > ------------------------------------- > swapper/2/0 is trying to release lock (buffer_head_lock) at: > [<ffffffffbc494a6b>] end_buffer_async_read+0x5b/0x180 > but there are no more locks to release! The lock releaser different from the lock acquirer was caught. > > other info that might help us debug this: > 1 lock held by swapper/2/0: > #0: ffff8bee27744080 (&ret->b_uptodate_lock){....}-{2:2}, at: end_buffer_async_read+0x47/0x180 > > stack backtrace: > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.19.0-rc3-dirty #2 > Hardware name: Dell Inc. Precision WorkStation T5500 /0D883F, BIOS A16 05/28/2013 > Call Trace: > <IRQ> > dump_stack_lvl+0x40/0x5c > lock_release+0x245/0x3f0 > unlock_buffer+0x15/0x30 > end_buffer_async_read+0x5b/0x180 > end_bio_bh_io_sync+0x1e/0x40 > blk_update_request+0x9a/0x470 > scsi_end_request+0x27/0x190 > scsi_io_completion+0x3c/0x580 > blk_complete_reqs+0x39/0x50 > __do_softirq+0x11d/0x344 > irq_exit_rcu+0xa9/0xc0 > common_interrupt+0xa5/0xc0 > </IRQ> > <TASK> > asm_common_interrupt+0x27/0x40 > RIP: 0010:cpuidle_enter_state+0x12d/0x3f0 > Code: 49 43 0f 84 b7 02 00 00 31 ff e8 fe 1c 74 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 8e 02 00 00 31 ff e8 97 95 7a ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 > RSP: 0000:ffffa2890013fe90 EFLAGS: 00000206 > RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000002 > RDX: 0000000080000002 RSI: ffffffffbd78d911 RDI: 00000000ffffffff > RBP: ffff8bee2739a400 R08: 0000000000000000 R09: 00000000001faf40 > R10: ffffa2890013fdc8 R11: 0000000000000000 R12: 00000000a7b149d6 > R13: ffffffffbdbee3a0 R14: 0000000000000003 R15: 0000000000000001 > cpuidle_enter+0x24/0x40 > do_idle+0x1e3/0x230 > cpu_startup_entry+0x14/0x20 > start_secondary+0xe8/0xf0 > secondary_startup_64_no_verify+0xe0/0xeb > </TASK> > sda: sda1 sda2 sda3 > sd 0:0:0:0: [sda] Attached SCSI disk > sdb: sdb1 sdb2 > sd 1:0:0:0: [sdb] Attached SCSI disk > > Not quite sure what to make of that. You provide the answer to the question why it has been quite a long while that lock_buffer has no annotation to help lockdep. And perhaps lock_page as well despite the relevant long running deadlock does not sit deep under the water. Thanks Hillf