Re: [bug report] lockdep WARN at PCI device rescan

Lukas Wunner <lukas@xxxxxxxxx> · Wed, 29 Nov 2023 12:30:22 +0100

On Tue, Nov 28, 2023 at 10:16:28AM +0000, Shinichiro Kawasaki wrote:
> Here are my three observations:
> 
> A) pci drivers should be able to call p2sb_bar() in probe() without failure.
> B) when /sys/bus/pci/rescan is written, pci_rescan_remove_lock is locked then
>    probe() is called.
> C) p2sb_bar() locks pci_rescan_remove_lock.
> 
> These result in a deadlock. To avoid the deadlock, one of the three needs
> to change. A) should not change. I guess changing B) will be too much. So, I
> would like to question if we can change C).

It is possible to allow recursive acquisition of a mutex by doing two
things:

* You need to store a pointer to the task_struct which is holding
  the lock.  This allows you to identify upon a recursive acquisition
  that you're already holding the lock.  The acquire operation becomes
  a no-op in this case.

* You need a counter of how many times you've acquired the lock
  recursively.  This allows you to determine upon lock release
  whether the release operation should be a no-op (due to previous
  recursive acquisition) or whether it should result in actual
  lock release (no previous recursive locking, or recursive locking
  has ended).
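
The two points above can be sketched in userspace C roughly as follows.
This is only an illustration under assumptions: rec_lock,
rec_lock_acquire(), rec_lock_release() and self_tag are made-up names,
a pthread mutex stands in for struct mutex, and the address of a
thread-local variable stands in for the task_struct pointer.  It is
not kernel code and not a proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Address of this variable identifies the calling thread (like "current"). */
static _Thread_local char self_tag;

/* Hypothetical wrapper around a non-recursive mutex. */
struct rec_lock {
	pthread_mutex_t mutex;   /* the underlying lock */
	_Atomic(void *) owner;   /* thread holding the lock, NULL if none */
	unsigned int depth;      /* recursive acquisitions by the owner */
};

#define REC_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, NULL, 0 }

void rec_lock_acquire(struct rec_lock *l)
{
	void *self = &self_tag;

	/* Only the owner can observe its own tag here: recursive case. */
	if (atomic_load(&l->owner) == self) {
		l->depth++;      /* acquire is a no-op apart from counting */
		return;
	}

	pthread_mutex_lock(&l->mutex);
	atomic_store(&l->owner, self);
}

void rec_lock_release(struct rec_lock *l)
{
	void *self = &self_tag;

	assert(atomic_load(&l->owner) == self);

	/* Undo one recursive acquisition: release is a no-op. */
	if (l->depth > 0) {
		l->depth--;
		return;
	}

	/* Outermost release: clear the owner, then really unlock. */
	atomic_store(&l->owner, NULL);
	pthread_mutex_unlock(&l->mutex);
}
```

With this, a thread that calls rec_lock_acquire() twice has to call
rec_lock_release() twice before another thread can take the lock.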

Actually struct mutex already records the owning task in its owner
field, but that is only accessible to the locking core internally,
not to users of the mutex.

While it would be possible to allow recursive acquisition of
pci_rescan_remove_lock in this way, doing so merely because of
a vendor-specific platform quirk will likely be considered dodgy
by the upstream community.  So Andy's proposal to stash the
struct resource on affected platforms seems more viable from
an upstream acceptability point of view.
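
As I understand that proposal, the idea is to read the resource once
at a point where taking the lock is harmless and to hand out the
saved copy afterwards, so that the lookup from probe() no longer
needs pci_rescan_remove_lock at all.  A rough userspace sketch, in
which res_range, bar_cache_fill() and bar_cache_get() are
hypothetical names (res_range is merely a stand-in for struct
resource), and which assumes the cache is filled once during init
before anything reads it:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct resource: just an address range. */
struct res_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical cache, filled once at init time. */
static struct {
	bool valid;
	struct res_range res;
} bar_cache;

/* Called once during init, where taking the rescan lock is safe. */
int bar_cache_fill(uint64_t start, uint64_t end)
{
	if (start > end)
		return -EINVAL;

	bar_cache.res.start = start;
	bar_cache.res.end = end;
	bar_cache.valid = true;
	return 0;
}

/* Called later, e.g. from probe(): copies the range, takes no lock. */
int bar_cache_get(struct res_range *out)
{
	assert(out != NULL);

	if (!bar_cache.valid)
		return -ENOENT;

	*out = bar_cache.res;
	return 0;
}
```

Since the lookup path no longer touches the lock, the chain of
B) and C) above cannot recurse on it.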

Thanks,

Lukas