On Wed, Feb 19, 2020 at 09:15:28AM +0000, Damien Le Moal wrote: > Jean, Guenter, > > I am getting the following lockdep splat on boot since 5.5-rc1 (my > apologies for not signaling this earlier). I do not see this with 5.4 > and as of 5.6-rc2, it still shows up 100% of the time on boot. > I think this is a real problem. I am not sure though what makes it appear now; there have been no interesting changes in the acpi code or in the hwmon driver. There _have_ been some locking related changes in kernfs, though. I don't know if those changes cause the problem, or if they expose it. Either case, the acpi_power_meter driver locks its resources, then removes and re-creates all its attributes (which requires a kernfs lock). At the same time, if an attribute is read, the same lock is acquired by kernfs, then the driver locks its resources. Perfect deadlock. I may be missing something, but I think this entire code is broken. Not only deletes and re-creates it all attributes while the hwmon driver is active, it also releases and re-creates various strings while potentially accessing those very same strings ... without any locking. This is just asking for trouble. We can (try to) work around the lockdep splat, but I don't think that will solve all the problems. Jean, any thoughts ? Guenter > If you are not the right persons to look into this, please let me know. > Thanks ! > > [ 64.145761] ====================================================== > [ 64.152295] WARNING: possible circular locking dependency detected > [ 64.158850] 5.6.0-rc2+ #629 Not tainted > [ 64.158851] ------------------------------------------------------ > [ 64.158854] python/1397 is trying to acquire lock: > [ 64.177039] ffff888619080070 (&resource->lock){+.+.}, at: > show_power+0x3c/0xa0 [acpi_power_meter] > [ 64.177050] > but task is already holding lock: > [ 64.177050] ffff88881643f188 (kn->count#119){++++}, at: > kernfs_seq_start+0x6a/0x160 > [ 64.177058] > which lock already depends on the new lock. > > [ 64.177058] > the existing dependency chain (in reverse order) is: > [ 64.177059] > -> #1 (kn->count#119){++++}: > [ 64.177066] __kernfs_remove+0x626/0x7e0 > [ 64.177068] kernfs_remove_by_name_ns+0x41/0x80 > [ 64.177070] remove_attrs+0xcb/0x3c0 [acpi_power_meter] > [ 64.177073] acpi_power_meter_notify+0x1f7/0x310 > [acpi_power_meter] > [ 64.177078] acpi_ev_notify_dispatch+0x198/0x1f3 > [ 64.177081] acpi_os_execute_deferred+0x4d/0x70 > [ 64.177085] process_one_work+0x7c8/0x1340 > [ 64.177087] worker_thread+0x94/0xc70 > [ 64.177089] kthread+0x2ed/0x3f0 > [ 64.177094] ret_from_fork+0x24/0x30 > [ 64.177094] > -> #0 (&resource->lock){+.+.}: > [ 64.177100] __lock_acquire+0x20be/0x49b0 > [ 64.177101] lock_acquire+0x127/0x340 > [ 64.177104] __mutex_lock+0x15b/0x1350 > [ 64.177106] show_power+0x3c/0xa0 [acpi_power_meter] > [ 64.177110] dev_attr_show+0x3f/0x80 > [ 64.177112] sysfs_kf_seq_show+0x216/0x410 > [ 64.177114] seq_read+0x407/0xf90 > [ 64.177119] vfs_read+0x152/0x2c0 > [ 64.177121] ksys_read+0xf3/0x1d0 > [ 64.177125] do_syscall_64+0x95/0x1010 > [ 64.177127] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [ 64.177128] > other info that might help us debug this: > > [ 64.177129] Possible unsafe locking scenario: > > [ 64.177130] CPU0 CPU1 > [ 64.177130] ---- ---- > [ 64.177131] lock(kn->count#119); > [ 64.177132] lock(&resource->lock); > [ 64.177134] lock(kn->count#119); > [ 64.177135] lock(&resource->lock); > [ 64.177137] > *** DEADLOCK *** > [ 64.177138] 4 locks held by python/1397: > [ 64.177139] #0: ffff8890242d64e0 (&f->f_pos_lock){+.+.}, at: > __fdget_pos+0x9b/0xb0 > [ 64.177145] #1: ffff889040be74e0 (&p->lock){+.+.}, at: > seq_read+0x6b/0xf90 > [ 64.177148] #2: ffff8890448eb880 (&of->mutex){+.+.}, at: > kernfs_seq_start+0x47/0x160 > [ 64.177151] #3: ffff88881643f188 (kn->count#119){++++}, at: > kernfs_seq_start+0x6a/0x160 > [ 64.177155] > stack backtrace: > [ 64.177158] CPU: 10 PID: 1397 Comm: python Not tainted 5.6.0-rc2+ > #629 > [ 64.177159] Hardware name: Supermicro Super Server/X11DPL-i, BIOS > 3.1 05/21/2019 > [ 64.177160] Call Trace: > [ 64.177167] dump_stack+0x97/0xe0 > [ 64.177170] check_noncircular+0x32e/0x3e0 > [ 64.177172] ? print_circular_bug.isra.0+0x1e0/0x1e0 > [ 64.177177] ? unwind_next_frame+0xb9a/0x1890 > [ 64.177180] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe > [ 64.177182] ? graph_lock+0x79/0x170 > [ 64.177183] ? __lockdep_reset_lock+0x3c0/0x3c0 > [ 64.177185] ? mark_lock+0xbc/0x1150 > [ 64.177187] __lock_acquire+0x20be/0x49b0 > [ 64.177190] ? mark_held_locks+0xe0/0xe0 > [ 64.177194] ? stack_trace_save+0x91/0xc0 > [ 64.177196] lock_acquire+0x127/0x340 > [ 64.177199] ? show_power+0x3c/0xa0 [acpi_power_meter] > [ 64.177201] ? device_remove_bin_file+0x10/0x10 > [ 64.177203] ? device_remove_bin_file+0x10/0x10 > [ 64.177205] __mutex_lock+0x15b/0x1350 > [ 64.177208] ? show_power+0x3c/0xa0 [acpi_power_meter] > [ 64.177210] ? show_power+0x3c/0xa0 [acpi_power_meter] > [ 64.177212] ? mutex_lock_io_nested+0x11f0/0x11f0 > [ 64.177214] ? lock_downgrade+0x6a0/0x6a0 > [ 64.177216] ? kernfs_seq_start+0x47/0x160 > [ 64.177218] ? lock_acquire+0x127/0x340 > [ 64.177219] ? kernfs_seq_start+0x6a/0x160 > [ 64.177222] ? device_remove_bin_file+0x10/0x10 > [ 64.177226] ? show_power+0x3c/0xa0 [acpi_power_meter] > [ 64.572676] show_power+0x3c/0xa0 [acpi_power_meter] > [ 64.572681] dev_attr_show+0x3f/0x80 > [ 64.581931] ? memset+0x20/0x40 > [ 64.581935] sysfs_kf_seq_show+0x216/0x410 > [ 64.581938] seq_read+0x407/0xf90 > [ 64.581943] ? security_file_permission+0x16f/0x2c0 > [ 64.581948] vfs_read+0x152/0x2c0 > [ 64.581951] ksys_read+0xf3/0x1d0 > [ 64.581954] ? kernel_write+0x120/0x120 > [ 64.581957] ? filp_open+0x50/0x50 > [ 64.581962] do_syscall_64+0x95/0x1010 > [ 64.581967] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [ 64.581971] RIP: 0033:0x7fbb4e2c644c > [ 64.581974] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 > d9 49 f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f > 05 <48> 3d 00 f0 ff ff 77 30 44 89 c7 48 89 44 24 08 e8 0f 4a f9 ff 48 > [ 64.581976] RSP: 002b:00007ffe30d33400 EFLAGS: 00000246 ORIG_RAX: > 0000000000000000 > [ 64.581979] RAX: ffffffffffffffda RBX: 00005569d016e1e0 RCX: > 00007fbb4e2c644c > [ 64.581980] RDX: 0000000000001000 RSI: 00005569d0222970 RDI: > 0000000000000003 > [ 64.581982] RBP: 00007fbb4e398300 R08: 0000000000000000 R09: > 00007fbb4e397190 > [ 64.581984] R10: 0000000000000000 R11: 0000000000000246 R12: > 00005569d016e1e0 > [ 64.581985] R13: 00007fbb4e397700 R14: 0000000000000d68 R15: > 0000000000000d68 > > > -- > Damien Le Moal > Western Digital Research