Re: 6.7.0-rc1 + hacks deadlock bug, wifi netdev delete + cat of debugfs file.

Ben Greear <greearb@xxxxxxxxxxxxxxx> · Wed, 8 Nov 2023 09:46:12 -0800

On 11/8/23 09:39, Benjamin Berg wrote:
> On Wed, 2023-11-08 at 17:07 +0100, Johannes Berg wrote:
> > > From the backtrace in the removal logic, it seems that something waits
> > > for a debugfs file to be closed.
> >
> > Yes, debugfs remove waits for it to no longer have active users, but
> > that cannot succeed because the users are blocked on acquiring the
> > mutex.
> >
> > > Maybe the logic attempting to get the
> > > mutex in debugfs can check if file is waiting to be deleted,
> > > combined with a try-mutex-lock logic, and bail out that way?
> >
> > I don't know if there's a way to check that, but I'm also not sure how
> > you'd even implement that?
>
> Is it likely that we have lock contention for debugfs operations?
>
> If it is relatively unlikely, then maybe just doing a mutex_trylock()
> and immediately failing the operation with -EAGAIN could be a solution?
> Obviously userspace would need some retry logic, but that is simple and
> it could solve the delete problem.
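As a rough illustration of the trylock-and-fail idea, here is a userspace C analog, not kernel code. `struct fake_dev`, `fake_debugfs_read()` and `read_with_retry()` are hypothetical names, and a pthread mutex stands in for the driver mutex that the teardown path holds. The return conventions differ: `pthread_mutex_trylock()` returns 0 on success and `EBUSY` when the lock is held, whereas the kernel's `mutex_trylock()` returns 1 on success and 0 on contention.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for driver state shared by the debugfs
 * handler and the netdev teardown path. */
struct fake_dev {
	pthread_mutex_t lock;
	int counter;
};

/* Analog of a debugfs read handler that never blocks on dev->lock:
 * if the lock is held (e.g. by teardown), fail with -EAGAIN at once. */
static ssize_t fake_debugfs_read(struct fake_dev *dev, char *buf, size_t len)
{
	int n;

	/* pthread_mutex_trylock() returns 0 on success, EBUSY if locked. */
	if (pthread_mutex_trylock(&dev->lock) != 0)
		return -EAGAIN;

	n = snprintf(buf, len, "counter=%d\n", dev->counter);
	pthread_mutex_unlock(&dev->lock);

	if (n < 0)
		return -EIO;
	if ((size_t)n >= len)
		n = len ? (int)(len - 1) : 0; /* snprintf truncated the output */
	return n;
}

/* The retry logic that each userspace reader would have to add. */
static ssize_t read_with_retry(struct fake_dev *dev, char *buf, size_t len,
			       int max_tries)
{
	ssize_t ret = -EAGAIN;

	for (int i = 0; i < max_tries; i++) {
		ret = fake_debugfs_read(dev, buf, len);
		if (ret != -EAGAIN)
			break;
		sched_yield(); /* give the lock holder a chance to run */
	}
	return ret;
}
```

The `read_with_retry()` loop is the burden this approach pushes onto every userspace reader: without it, a plain `cat` would just see an error whenever the lock happens to be contended.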

It is pretty nasty to expect each and every user-space app anywhere to suddenly
know that file operations are randomly unreliable...

I think we can do this in a way that returns no useful file data only when we are
actually in the teardown phase...
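A sketch of this teardown-only variant, under the same assumptions (userspace analog, hypothetical names such as `struct fake_vif`, `tearing_down`, `vif_debugfs_read()` and `vif_teardown_begin()`). The reader keeps retrying the lock with a short backoff and gives up only once the removal path has set a flag, returning an empty read (EOF) rather than an error. How the real code would signal teardown, and whether it would return 0 bytes or an error code, are assumptions here, not something settled in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical per-interface state; tearing_down is an assumed flag. */
struct fake_vif {
	pthread_mutex_t lock;
	atomic_bool tearing_down;
	int counter;
};

static ssize_t vif_debugfs_read(struct fake_vif *vif, char *buf, size_t len)
{
	const struct timespec backoff = { .tv_sec = 0, .tv_nsec = 1000000 };
	int n;

	while (pthread_mutex_trylock(&vif->lock) != 0) {
		/* Only bail out when the interface is really going away. */
		if (atomic_load(&vif->tearing_down))
			return 0; /* empty read (EOF) instead of an error */
		nanosleep(&backoff, NULL); /* ordinary contention: wait 1 ms */
	}

	n = snprintf(buf, len, "counter=%d\n", vif->counter);
	pthread_mutex_unlock(&vif->lock);
	if (n < 0)
		return -EIO;
	if ((size_t)n >= len)
		n = len ? (int)(len - 1) : 0; /* snprintf truncated the output */
	return n;
}

/* Removal path: set the flag no later than the point where it starts
 * waiting for debugfs readers (here, before even taking the lock), so
 * a reader spinning in vif_debugfs_read() sees it and returns. */
static void vif_teardown_begin(struct fake_vif *vif)
{
	atomic_store(&vif->tearing_down, true);
	pthread_mutex_lock(&vif->lock);
	/* The real code would now remove the debugfs files, which waits
	 * for active readers to leave. */
}
```

With this shape, ordinary lock contention just delays a reader, so existing tools keep working unchanged; only a read that races with interface removal comes back empty.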

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com