Hung tasks due to an AB-BA deadlock between the leds_list_lock rwsem and the rtnl mutex (was: 6.9.3 Hung tasks)

[adding the LED folks and the regressions list to the list of recipients]

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Lee, Pavel, could you please look into the regression report below? The
thread starts here:
https://lore.kernel.org/all/9d189ec329cfe68ed68699f314e191a10d4b5eda.camel@xxxxxxxxxxxx/

Another report with somewhat similar symptoms can be found here:
https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@xxxxxxxxx/

See also Russell's analysis of that report below (many many thx for
that, much appreciated Russell!).

To my untrained eyes all of this sounds a lot like we still have a 6.9
regression related to the LED code somewhere. Reminder, we had earlier
trouble, but that was avoided through other measures:

* 3d913719df14c2 ("wifi: iwlwifi: Use request_module_nowait") /
https://lore.kernel.org/lkml/30f757e3-73c5-5473-c1f8-328bab98fd7d@xxxxxxxxxxxxxxx/

* c04d1b9ecce565 ("igc: Fix LED-related deadlock on driver unbind") /
https://lore.kernel.org/all/ZhRD3cOtz5i-61PB@mail-itl/

* 19fa4f2a85d777 ("r8169: fix LED-related deadlock on module removal")

That iwlwifi commit even calls itself a "work around". The developer who
submitted it bisected the problem to an LED merge, but sadly that was the
end of it. :-/

Ciao, Thorsten

On 30.05.24 16:04, Russell King (Oracle) wrote:
> On Thu, May 30, 2024 at 09:36:45AM -0400, Genes Lists wrote:
>> On Thu, 2024-05-30 at 08:53 -0400, Genes Lists wrote:
>> This report for 6.9.1 could well be the same issue:
>> https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@xxxxxxxxx/
> 
> The reg_check_chans_work() thing in pid 285 is likely stuck on the
> rtnl lock. The same is true of pid 287.
> 
> That will be because of the thread (pid 663) that's stuck in
> __dev_open()...led_trigger_register(), where the rtnl lock will have
> been taken in that path. It looks to me like led_trigger_register()
> is stuck waiting for read access with the leds_list_lock rwsem.
> 
> There are only two places that take that rwsem in write mode, which
> are led_classdev_register_ext() and led_classdev_unregister(). Neither
> of these paths is blocking in v6.9.
> 
> Pid 641 doesn't look significant (it's probably waiting for either
> pid 285 or 287 to complete its work).
> 
> Pid 666 looks like it is blocked waiting for exclusive write-access
> on the leds_list_lock - but it isn't holding that lock. This means
> there must already be some other reader or writer holding this lock.
> 
> Pid 722 doesn't look significant (same as pid 641).
> 
> Pid 760 is also waiting for the rtnl lock.
> 
> Pids 854 and 855 also don't look significant (same as pid 641).
> 
> And then we get to pid 858. This is in set_device_name(), which
> was called from led_trigger_set() and led_trigger_register().
> We know from pid 663 that led_trigger_register() can take a read
> on leds_list_lock, and indeed it does and then calls
> led_match_default_trigger(), which then goes on to call
> led_trigger_set(). Bingo, this is why pid 666 is blocked, which
> then blocks pid 663. pid 663 takes the rtnl lock, which blocks
> everything else _and_ also blocks pid 858 in set_device_name().
> 
> Lockdep would've found this... this is a classic AB-BA deadlock
> between the leds_list_lock rwsem and the rtnl mutex.
> 
> I haven't checked to see how that deadlock got introduced, that's
> for someone else to do.
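
For readers less familiar with the term, the lock-order inversion Russell
describes can be sketched in a few lines of user-space C. This is only an
illustration in the spirit of a lock-order check, not lockdep's actual
implementation and not kernel code. The names order_graph, record_path,
LOCK_RTNL and LOCK_LEDS_LIST are hypothetical and exist only for this
example. It records which lock was taken while another was held and flags
a path that contradicts an earlier one. It only catches direct two-lock
inversions, which is all that is needed for the case above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock ids standing in for the two locks in the report. */
enum { LOCK_RTNL, LOCK_LEDS_LIST, NLOCKS };

/* after[a][b] is true once some path has taken b while holding a. */
struct order_graph {
	bool after[NLOCKS][NLOCKS];
};

static void order_graph_init(struct order_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/*
 * Record one code path that acquires the locks in seq[] in order, each
 * nested inside the previous ones. Returns true if the path takes two
 * locks in the opposite order to a previously recorded path, i.e. an
 * AB-BA inversion.
 */
static bool record_path(struct order_graph *g, const int *seq, int n)
{
	bool inversion = false;

	for (int i = 0; i < n; i++) {
		for (int j = i + 1; j < n; j++) {
			int held = seq[i], want = seq[j];

			assert(held >= 0 && held < NLOCKS);
			assert(want >= 0 && want < NLOCKS);
			if (held == want)
				continue;	/* recursion is out of scope here */
			if (g->after[want][held])
				inversion = true;	/* opposite order seen before */
			g->after[held][want] = true;
		}
	}
	return inversion;
}
```

Feeding it the two orders from the analysis (rtnl then leds_list_lock, as
in the pid 663 path, followed by leds_list_lock then rtnl, as in the pid
858 path) makes the second call report an inversion.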

P.S.:

#regzbot report: /
#regzbot introduced: f5c31bcf604d
#regzbot duplicate:
https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@xxxxxxxxx/
#regzbot summary: leds: Hung tasks due to a AB-BA deadlock between the
leds_list_lock rwsem and the rtnl mutex



