Search Linux Wireless

Re: 6.7.0-rc1 + hacks deadlock bug, wifi netdev delete + cat of debugfs file.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/8/23 08:07, Johannes Berg wrote:
On Wed, 2023-11-08 at 07:55 -0800, Ben Greear wrote:
On 11/8/23 7:44 AM, Johannes Berg wrote:
On Wed, 2023-11-08 at 07:07 -0800, Ben Greear wrote:
On 11/8/23 2:31 AM, Johannes Berg wrote:
On Tue, 2023-11-07 at 14:08 -0800, Ben Greear wrote:
Hello,

I think this lockup is because iw is holding rtnl and wiphy mutex,
and is blocked waiting for debugfs to be closed.  Another 'cat'
program has debugfs file open, and is blocking on trying to acquire
wiphy mutex.

I think we must not acquire wiphy mutex in debugfs methods, somehow,
to resolve this deadlock.  I do not know a safe way to do that.

Hmm. I almost want to say "don't do that then", but I guess you're just
randomly accessing debugfs files.

I guess we can at least make the mutex acquisition in debugfs killable
(or interruptible), so you can recover from this.

If we can detect that the phy is going away in debugfs, then we could
return early before attempting the lock?  That would catch most things,
I guess,


I don't think it would, it would still get locked on the mutex first.

   but still a potential race since I guess we'd have to do that check
w/out locks.  Can we do a try-mutex-lock, if not acquired, return if wiphy-going-away,
else sleep a bit, try again?

That's kind of awful though? And it's not just the wiphy going away, a
lot of the debugfs files can go away individually (per station, per
link, per key even!).

  From the backtrace in the removal logic, it seems that something waits
for a debugfs file to be closed.

Yes, debugfs remove waits for it to no longer have active users, but
that cannot succeed because the users are blocked on acquiring the
mutex.

Maybe the logic attempting to get the
mutex in debugfs can check if file is waiting to be deleted,
combined with a try-mutex-lock logic, and bail out that way?

I don't know if there's a way to check that, but I'm also not sure how
you'd even implement that?

This code is new to me, but we have a file object in debugfs callbacks,
and this method appears to get dentry from a file object:

static inline struct dentry *file_dentry(const struct file *file)
{
	return d_real(file->f_path.dentry, file_inode(file));
}


In the delete path, we have a dentry object:

void debugfs_remove(struct dentry *dentry)
{
	if (IS_ERR_OR_NULL(dentry))
		return;

	simple_pin_fs(&debug_fs_type, &debugfs_mount, &debugfs_mount_count);
	simple_recursive_removal(dentry, remove_one);
	simple_release_fs(&debugfs_mount, &debugfs_mount_count);
}
EXPORT_SYMBOL_GPL(debugfs_remove);


We could add a wiphy_trylock method, similar to this:

static inline void wiphy_lock(struct wiphy *wiphy)
        __acquires(&wiphy->mtx)
{
        mutex_lock(&wiphy->mtx);
        __acquire(&wiphy->mtx);
}


But using mutex_trylock instead.


This method appears to actually set a S_DEAD flag in the dentry, so maybe
we just check that flag in the mutex_trylock failed to acquire path
in the debugfs read?

void simple_recursive_removal(struct dentry *dentry,
                              void (*callback)(struct dentry *))
{
	struct dentry *this = dget(dentry);
	while (true) {
		struct dentry *victim = NULL, *child;
		struct inode *inode = this->d_inode;

		inode_lock(inode);
		if (d_is_dir(this))
			inode->i_flags |= S_DEAD;


Let me know what you think on that...

Thanks,
Ben


johannes


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com





[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux