Search Linux Wireless

Re: Kernel deadlock in 6.7.5 + hacks, maybe debugfs related.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/27/24 08:13, Johannes Berg wrote:
On Tue, 2024-02-27 at 15:58 +0100, Johannes Berg wrote:

Which, btw, ignoring comments, braces, whitespace - then really just
removes the line you're getting stuck on.

So actually no ... invert the test?

  if (refcount_dec_and_test(...))
    return;

If it hit zero here, there's guaranteed to be no user, so we can return.

If it's not zero yet, we might yet go into a new cancellation, so we
need the rest of the function.


This is what I wrote now:


Subject: [PATCH] debugfs: fix wait/cancellation handling during remove

We are no longer able to reproduce a lockup, so please feel free to add:

Tested-by: Madhan Sai <madhan.singaraju@xxxxxxxxxxxxxxx>

Thanks,
Ben


Ben Greear further reports deadlocks during concurrent debugfs
remove while files are being accessed, even though the code in
question now uses debugfs cancellations. Turns out that despite
all the review on the locking, we missed completely that the
logic is wrong: if the refcount hits zero we can finish (and
need not wait for the completion), but if it doesn't we have
to trigger all the cancellations. As written, we can _never_
get into the loop triggering the cancellations. Fix this, and
explain it better while at it.

Cc: stable@xxxxxxxxxxxxxxx
Fixes: 8c88a474357e ("debugfs: add API to allow debugfs operations cancellation")
Reported-by: Ben Greear <greearb@xxxxxxxxxxxxxxx>
Closes: https://lore.kernel.org/r/1c9fa9e5-09f1-0522-fdbc-dbcef4d255ca@xxxxxxxxxxxxxxx
Change-Id: I6c7aeff8c9d6628a8bc1ddcf332205a49d801f17
Signed-off-by: Johannes Berg <johannes.berg@xxxxxxxxx>
---
  fs/debugfs/inode.c | 25 ++++++++++++++++++++-----
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 034a617cb1a5..a40da0065433 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -751,13 +751,28 @@ static void __debugfs_file_removed(struct dentry *dentry)
  	if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
  		return;
- /* if we hit zero, just wait for all to finish */
-	if (!refcount_dec_and_test(&fsd->active_users)) {
-		wait_for_completion(&fsd->active_users_drained);
+	/* if this was the last reference, we're done */
+	if (refcount_dec_and_test(&fsd->active_users))
  		return;
-	}
- /* if we didn't hit zero, try to cancel any we can */
+	/*
+	 * If there's still a reference, the code that obtained it can
+	 * be in different states:
+	 *  - The common case of not using cancellations, or already
+	 *    after debugfs_leave_cancellation(), where we just need
+	 *    to wait for debugfs_file_put() which signals the completion;
+	 *  - inside a cancellation section, i.e. between
+	 *    debugfs_enter_cancellation() and debugfs_leave_cancellation(),
+	 *    in which case we need to trigger the ->cancel() function,
+	 *    and then wait for debugfs_file_put() just like in the
+	 *    previous case;
+	 *  - before debugfs_enter_cancellation() (but obviously after
+	 *    debugfs_file_get()), in which case we may not see the
+	 *    cancellation in the list on the first round of the loop,
+	 *    but debugfs_enter_cancellation() signals the completion
+	 *    after adding it, so this code gets woken up to call the
+	 *    ->cancel() function.
+	 */
  	while (refcount_read(&fsd->active_users)) {
  		struct debugfs_cancellation *c;

Can you test it and let me know if that works?

Same as what we discussed, FWIW, other than the comments.

johannes


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com





[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux