[adding the LED folks and the regressions list to the list of recipients]

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Lee, Pavel, could you please look into the regression report below? The
thread starts here:
https://lore.kernel.org/all/9d189ec329cfe68ed68699f314e191a10d4b5eda.camel@xxxxxxxxxxxx/

Another report with somewhat similar symptoms can be found here:
https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@xxxxxxxxx/

See also Russell's analysis of that report below (many many thx for
that, much appreciated Russell!).

To my untrained eyes all of this sounds a lot like we still have a 6.9
regression related to the LED code somewhere. Reminder, we had earlier
trouble, but that was avoided through other measures:

* 3d913719df14c2 ("wifi: iwlwifi: Use request_module_nowait") /
https://lore.kernel.org/lkml/30f757e3-73c5-5473-c1f8-328bab98fd7d@xxxxxxxxxxxxxxx/

* c04d1b9ecce565 ("igc: Fix LED-related deadlock on driver unbind") /
https://lore.kernel.org/all/ZhRD3cOtz5i-61PB@mail-itl/

* 19fa4f2a85d777 ("r8169: fix LED-related deadlock on module removal")

That iwlwifi commit even calls itself a "work around". The developer who
submitted it bisected the problem to an LED merge, but sadly that was
the end of it. :-/

Ciao, Thorsten

On 30.05.24 16:04, Russell King (Oracle) wrote:
> On Thu, May 30, 2024 at 09:36:45AM -0400, Genes Lists wrote:
>> On Thu, 2024-05-30 at 08:53 -0400, Genes Lists wrote:
>> This report for 6.9.1 could well be the same issue:
>> https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@xxxxxxxxx/
>
> The reg_check_chans_work() thing in pid 285 is likely stuck on the
> rtnl lock. The same is true of pid 287.
>
> That will be because of the thread (pid 663) that's stuck in
> __dev_open()...led_trigger_register(), where the rtnl lock will have
> been taken in that path. It looks to me like led_trigger_register()
> is stuck waiting for read access with the leds_list_lock rwsem.
>
> There are only two places that take that rwsem in write mode, which
> are led_classdev_register_ext() and led_classdev_unregister(). Neither
> of these paths is blocking in v6.9.
>
> Pid 641 doesn't look significant (it's probably waiting for either
> pid 285 or 287 to complete its work.)
>
> Pid 666 looks like it is blocked waiting for exclusive write-access
> on the leds_list_lock - but it isn't holding that lock. This means
> there must already be some other reader or writer holding this lock.
>
> Pid 722 doesn't look significant (same as pid 641).
>
> Pid 760 is also waiting for the rtnl lock.
>
> Pids 854 and 855 also don't look significant (as pid 641).
>
> And then we get to pid 858. This is in set_device_name(), which
> was called from led_trigger_set() and led_trigger_register().
> We know from pid 663 that led_trigger_register() can take a read
> on leds_list_lock, and indeed it does and then calls
> led_match_default_trigger(), which then goes on to call
> led_trigger_set(). Bingo, this is why pid 666 is blocked, which
> then blocks pid 663. pid 663 takes the rtnl lock, which blocks
> everything else _and_ also blocks pid 858 in set_device_name().
>
> Lockdep would've found this... this is a classic AB-BA deadlock
> between the leds_list_lock rwsem and the rtnl mutex.
>
> I haven't checked to see how that deadlock got introduced, that's
> for someone else to do.

P.S.:

#regzbot report: /
#regzbot introduced: f5c31bcf604d
#regzbot duplicate: https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@xxxxxxxxx/
#regzbot summary: leds: Hung tasks due to an AB-BA deadlock between the leds_list_lock rwsem and the rtnl mutex
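
For anyone following the quoted analysis, here is a minimal sketch of the dependency chain it describes, written as a wait-for graph with a small cycle check. It is only an illustration: the edges are one reading of Russell's description of the hung-task traces (including his point that the queued writer in pid 666 blocks the reader in pid 663), not output from lockdep, and the names WAITS_FOR and find_cycle are made up for this example.

```python
# Wait-for graph: an edge A -> B means "task A cannot proceed until B does".
# The edges are an interpretation of the quoted analysis, not kernel output.
WAITS_FOR = {
    663: [666],  # holds rtnl; its read of leds_list_lock is blocked by pid 666
    666: [858],  # wants leds_list_lock for write; pid 858 holds it for read
    858: [663],  # holds leds_list_lock for read; wants rtnl, held by pid 663
    285: [663],  # likely waiting for rtnl
    287: [663],  # likely waiting for rtnl
    760: [663],  # waiting for rtnl
}


def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    state = {}  # node -> "active" (on the current DFS path) or "done"
    path = []   # nodes on the current DFS path, in visiting order

    def visit(node):
        state[node] = "active"
        path.append(node)
        for nxt in graph.get(node, []):
            if state.get(nxt) == "active":
                # Back edge: the cycle is the path from nxt to here, closed.
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        state[node] = "done"
        path.pop()
        return None

    for start in graph:
        if start not in state:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR)
    print(" -> ".join(f"pid {p}" for p in cycle) if cycle else "no cycle")
```

The three tasks in the cycle are the deadlock proper; pids 285, 287 and 760 merely pile up behind the rtnl mutex and would proceed once the cycle is broken.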