On Wed, 2023-11-08 at 07:55 -0800, Ben Greear wrote:
> On 11/8/23 7:44 AM, Johannes Berg wrote:
> > On Wed, 2023-11-08 at 07:07 -0800, Ben Greear wrote:
> > > On 11/8/23 2:31 AM, Johannes Berg wrote:
> > > > On Tue, 2023-11-07 at 14:08 -0800, Ben Greear wrote:
> > > > > Hello,
> > > > >
> > > > > I think this lockup is because iw is holding rtnl and wiphy mutex,
> > > > > and is blocked waiting for debugfs to be closed. Another 'cat'
> > > > > program has debugfs file open, and is blocking on trying to acquire
> > > > > wiphy mutex.
> > > > >
> > > > > I think we must not acquire wiphy mutex in debugfs methods, somehow,
> > > > > to resolve this deadlock. I do not know a safe way to do that.
> > > >
> > > > Hmm. I almost want to say "don't do that then", but I guess you're just
> > > > randomly accessing debugfs files.
> > > >
> > > > I guess we can at least make the mutex acquisition in debugfs killable
> > > > (or interruptible), so you can recover from this.
> > >
> > > If we can detect that the phy is going away in debugfs, then we could
> > > return early before attempting the lock? That would catch most things,
> > > I guess,
> > >
> >
> > I don't think it would, it would still get locked on the mutex first.
> >
> > > but still a potential race since I guess we'd have to do that check
> > > w/out locks. Can we do a try-mutex-lock, if not acquired, return if wiphy-going-away,
> > > else sleep a bit, try again?
> >
> > That's kind of awful though? And it's not just the wiphy going away, a
> > lot of the debugfs files can go away individually (per station, per
> > link, per key even!).
>
> From the backtrace in the removal logic, it seems that something waits
> for a debugfs file to be closed.

Yes, debugfs remove waits for it to no longer have active users, but
that cannot succeed because the users are blocked on acquiring the
mutex.

> Maybe the logic attempting to get the
> mutex in debugfs can check if file is waiting to be deleted,
> combined with a try-mutex-lock logic, and bail out that way?

I don't know if there's a way to check that, but I'm also not sure how
you'd even implement that?

johannes
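
For reference, the try-lock-and-bail idea from the quoted text could look
roughly like the sketch below. This is only a userspace analogy using
pthreads, not kernel code: struct fake_phy, its going_away flag, and
fake_phy_lock_or_bail() are hypothetical names made up for illustration,
and nothing here is taken from cfg80211 or debugfs. It also keeps the
objections raised above: the flag is checked without holding the lock, and
a single per-phy flag says nothing about individual files (per station,
per link, per key) going away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a wiphy: one mutex plus a "removal started" flag. */
struct fake_phy {
	pthread_mutex_t mtx;
	atomic_bool going_away;
};

/*
 * Try-lock loop as proposed in the thread: take the mutex if it is free,
 * otherwise bail out once removal has started, otherwise back off and retry.
 * Returns 0 with the mutex held, or -ENODEV without it.
 */
static int fake_phy_lock_or_bail(struct fake_phy *phy)
{
	const struct timespec backoff = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

	assert(phy != NULL);

	for (;;) {
		if (pthread_mutex_trylock(&phy->mtx) == 0)
			return 0;
		/* Checked without the lock held: this is the race mentioned above. */
		if (atomic_load(&phy->going_away))
			return -ENODEV;
		nanosleep(&backoff, NULL);
	}
}
```

A reader would call fake_phy_lock_or_bail() at the top of its handler and
return the error if it fails; the remover would set going_away before
waiting for readers to drain. The simpler option mentioned earlier in the
thread, making the acquisition killable, would presumably map to
mutex_lock_killable() in the kernel, which does not need the polling loop
at all.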