Re: Kernel deadlock in 6.7.5 + hacks, maybe debugfs related.

Johannes Berg <johannes@xxxxxxxxxxxxxxxx> · Tue, 27 Feb 2024 14:47:57 +0100

> Feb 26 06:01:45 ct523c-0b0b kernel: task:ip              state:D stack:0     pid:28125 tgid:28125 ppid:3604   flags:0x00004002
> Feb 26 06:01:45 ct523c-0b0b kernel: Call Trace:
> Feb 26 06:01:45 ct523c-0b0b kernel:  <TASK>
> Feb 26 06:01:45 ct523c-0b0b kernel:  __schedule+0x42c/0xde0
> Feb 26 06:01:45 ct523c-0b0b kernel:  schedule+0x3c/0x120
> Feb 26 06:01:45 ct523c-0b0b kernel:  schedule_timeout+0x19c/0x1b0
> Feb 26 06:01:45 ct523c-0b0b kernel:  ? mark_held_locks+0x49/0x70
> Feb 26 06:01:45 ct523c-0b0b kernel:  __wait_for_common+0xba/0x1d0
> Feb 26 06:01:45 ct523c-0b0b kernel:  ? usleep_range_state+0xb0/0xb0
> Feb 26 06:01:45 ct523c-0b0b kernel:  remove_one+0x6b/0x100

Can you say where this remove_one+0x6b is?

I feel it's probably this:

       if (!refcount_dec_and_test(&fsd->active_users)) {
               wait_for_completion(&fsd->active_users_drained);

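which ... looking at it, seems wrong? It waits when the count did
_not_ hit zero, so the cancel loop after it is never reached while
there are still users.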
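
To make the concern concrete, here is a minimal userspace sketch of just
that branch, with a plain int standing in for the refcount.
model_dec_and_test() and old_path() are hypothetical names made up for
illustration, not kernel functions, and nothing here models the real
completion or locking; it only shows which way the branch goes for a
given starting count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for refcount_dec_and_test(): decrement and
 * report whether the count reached zero. Not the kernel implementation.
 */
static bool model_dec_and_test(int *count)
{
	assert(*count > 0);	/* the initial reference is still held */
	return --(*count) == 0;
}

enum path { PATH_WAIT, PATH_CANCEL_LOOP };

/* Models the quoted branch: where does removal go for a given count? */
static enum path old_path(int active_users)
{
	if (!model_dec_and_test(&active_users))
		return PATH_WAIT;	/* wait_for_completion() in the real code */
	return PATH_CANCEL_LOOP;	/* falls through to the cancel loop */
}
```

In this model, old_path(1) (only the initial reference) falls through
to the cancel loop, where there is nothing left to cancel, while
old_path(2) (one user still active) goes straight to the wait without
trying to cancel anything.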

_Completely_ untested:

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 034a617cb1a5..fb636478c54d 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -751,13 +751,19 @@ static void __debugfs_file_removed(struct dentry *dentry)
 	if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
 		return;
 
-	/* if we hit zero, just wait for all to finish */
-	if (!refcount_dec_and_test(&fsd->active_users)) {
-		wait_for_completion(&fsd->active_users_drained);
-		return;
-	}
+	/*
+	 * Now that debugfs_file_get() no longer sees a valid entry,
+	 * decrement the refcount to remove the initial reference.
+	 */
+	refcount_dec(&fsd->active_users);
 
-	/* if we didn't hit zero, try to cancel any we can */
+	/*
+	 * As long as it's not zero, try to cancel any cancellations,
+	 * new incoming ones will wake up the completion as we might
+	 * have raced: debugfs_file_get() had already been done, but
+	 * debugfs_enter_cancellation() hadn't, by the time we got
+	 * to this point here.
+	 */
 	while (refcount_read(&fsd->active_users)) {
 		struct debugfs_cancellation *c;
 
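For comparison, a rough userspace model of how the patched code enters
that loop, again with a plain int instead of the refcount. struct
model_result and model_patched_remove() are hypothetical names for
illustration only, and the loop body (not shown in the hunk above) is
not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of the patched entry sequence; no kernel APIs. */
struct model_result {
	bool dec_reached_zero;		/* the unconditional decrement hit 0 */
	bool enters_cancel_loop;	/* while (refcount_read(...)) is true */
};

static struct model_result model_patched_remove(int active_users)
{
	struct model_result r;

	assert(active_users > 0);	/* the initial reference is still held */
	active_users--;			/* stands in for refcount_dec() */
	r.dec_reached_zero = (active_users == 0);
	r.enters_cancel_loop = (active_users != 0);
	return r;
}
```

With one user still active (count 2) the model enters the loop, so
cancellations get a chance to run before any waiting. With only the
initial reference (count 1) the decrement lands on zero and the loop is
skipped; that case is worth watching when testing, since refcount_dec()
warns when it hits zero, and refcount_dec_and_test() with an early
return might be the safer form there.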



johannes