Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 08 Nov 2016 18:57:20 -0600

James Bottomley <jejb@xxxxxxxxxxxxxxxxxx> writes:

> On Tue, 2016-11-08 at 13:13 -0600, Eric W. Biederman wrote:
>> James Bottomley <jejb@xxxxxxxxxxxxxxxxxx> writes:
>> 
>> > On Tue, 2016-11-08 at 08:52 -0800, Bart Van Assche wrote:
>> > > On 11/08/2016 07:28 AM, James Bottomley wrote:
>> > > > On Mon, 2016-11-07 at 16:32 -0800, Bart Van Assche wrote:
>> > > > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>> > > > > index cf4c636..44ec536 100644
>> > > > > --- a/fs/kernfs/dir.c
>> > > > > +++ b/fs/kernfs/dir.c
>> > > > > @@ -1410,7 +1410,7 @@ int kernfs_remove_by_name_ns(struct
>> > > > > kernfs_node
>> > > > > *parent, const char *name,
>> > > > >  	mutex_lock(&kernfs_mutex);
>> > > > > 
>> > > > >  	kn = kernfs_find_ns(parent, name, ns);
>> > > > > -	if (kn)
>> > > > > +	if (kn && !(kn->flags & KERNFS_SUICIDED))
>> > > > 
>> > > > Actually, wrong flag, you need KERNFS_SUICIDAL.  The reason is
>> > > > that
>> > > > kernfs_mutex is actually dropped half way through
>> > > > __kernfs_remove, 
>> > > > so KERNFS_SUICIDED is not set atomically with this mutex.
>> > > 
>> > > Hello James,
>> > > 
>> > > Sorry but what you wrote is not correct.
>> > 
>> > I think you agree it is dropped.  I don't need to add the bit about 
>> > the reacquisition because the race is mediated by the first 
>> > acquisition not the second one, if you mediate on KERNFS_SUICIDAL, 
>> > you only need to worry about this because the mediation is in the 
>> > first acquisition.   If you mediate on KERNFS_SUICIDED, you need to 
>> > explain that the final thing that means the race can't happen is 
>> > the unbreak in the sysfs delete path re-acquiring s_active ... the 
>> > explanation of what's going on and why gets about 2x more complex.
>> 
>> Is it really the dropping of the lock that is causing this?
>> I don't see that when I read those traces.
>
> No, it's an ABBA lock inversion that causes this.  The traces are
> somewhat dense, but they say it here:
>
>  Possible unsafe locking scenario:
>        CPU0                    CPU1
>        ----                    ----
>   lock(s_active#336);
>                                lock(&shost->scan_mutex);
>                                lock(s_active#336);
>   lock(&shost->scan_mutex);
>
>  *** DEADLOCK ***
>
> The detailed explanation of this is here:
>
> http://marc.info/?l=linux-scsi&m=147855187425596
>
> The fix is ensuring that the CPU1 thread doesn't get into taking
> s_active if CPU0 already has it using the KERNFS_SUICIDED/AL flag as an
> indicator.

So.  The kernfs code does not look safe to have kernfs_remove_self
and kernfs_remove_by_name_ns racing against each other I agree.

The kernfs_remove_self path turns KERNFS_SUICIDAL into another blocking
lock by another name, and without lockdep annotations so I don't know
that it is safe.

The relevant bit from kernfs_remove_self is:

	if (!(kn->flags & KERNFS_SUICIDAL)) {
		kn->flags |= KERNFS_SUICIDAL;
		__kernfs_remove(kn);
		kn->flags |= KERNFS_SUICIDED;
		ret = true;
	} else {
		wait_queue_head_t *waitq = &kernfs_root(kn)->deactivate_waitq;
		DEFINE_WAIT(wait);

		while (true) {
			prepare_to_wait(waitq, &wait, TASK_UNINTERRUPTIBLE);

			if ((kn->flags & KERNFS_SUICIDED) &&
			    atomic_read(&kn->active) == KN_DEACTIVATED_BIAS)
				break;

			mutex_unlock(&kernfs_mutex);
			schedule();
			mutex_lock(&kernfs_mutex);
		}
		finish_wait(waitq, &wait);
		WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb));
		ret = false;
	}

I am pretty certain that if you are going to make kernfs_remove_self and
kernfs_remove_by_name_ns to be safe to race against each other, not just
the KERNFS_SUICIDAL check, but the wait when KERNFS_SUICIDAL is set
needs to be added ot kernfs_remove_by_name_ns.

And I suspect if you add the appropriate lockdep annotations to that mess
you will find yourself in a similar but related ABBA deadlock.

Which is why I would like a simpler easier to understand mechanism if we can.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html