On Tue, Jan 16, 2018 at 7:13 AM, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> On Mon, Jan 15 2018 at 6:10P -0500,
> Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>
>> On Mon, Jan 15 2018 at 5:51pm -0500,
>> Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote:
>>
>> > On Mon, 2018-01-15 at 17:15 -0500, Mike Snitzer wrote:
>> > > sysfs write op calls kernfs_fop_write which takes:
>> > > of->mutex then kn->count#213 (no idea what that is)
>> > > then q->sysfs_lock (via queue_attr_store)
>> > >
>> > > vs
>> > >
>> > > blk_unregister_queue takes:
>> > > q->sysfs_lock then
>> > > kernfs_mutex (via kernfs_remove)
>> > > seems lockdep thinks "kernfs_mutex" is "kn->count#213"?
>> > >
>> > > Feels like lockdep code in fs/kernfs/file.c and fs/kernfs/dir.c is
>> > > triggering false positives.. because these seem like different kernfs
>> > > locks yet they are reported as "kn->count#213".
>> > >
>> > > Certainly feeling out of my depth with kernfs's locking though.
>> >
>> > Hello Mike,
>> >
>> > I don't think that this is a false positive but rather the following traditional
>> > pattern of a potential deadlock involving sysfs attributes:
>> > * One context obtains a mutex from inside a sysfs attribute method:
>> >   queue_attr_store() obtains q->sysfs_lock.
>> > * Another context removes a sysfs attribute while holding a mutex:
>> >   blk_unregister_queue() removes the queue sysfs attributes while holding
>> >   q->sysfs_lock.
>> >
>> > This can result in a real deadlock because the code that removes sysfs objects
>> > waits until all ongoing attribute callbacks have finished.
>> >
>> > Since commit 667257e8b298 ("block: properly protect the 'queue' kobj in
>> > blk_unregister_queue") modified blk_unregister_queue() such that q->sysfs_lock
>> > is held around the kobject_del(&q->kobj) call I think this is a regression
>> > introduced by that commit.
>>
>> Sure, of course it is a regression.
>>
>> Aside from moving the mutex_unlock(&q->sysfs_lock) above the
>> kobject_del(&q->kobj) I don't know how to fix it.
>>
>> Though, realistically that'd be an adequate fix (given the way the code
>> was before).
>
> Any chance you apply this and re-run your srp_test that triggered the
> lockdep splat?
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 4a6a40ffd78e..c50e08e9bf17 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -952,10 +952,10 @@ void blk_unregister_queue(struct gendisk *disk)
>  	if (q->request_fn || (q->mq_ops && q->elevator))
>  		elv_unregister_queue(q);
>
> +	mutex_unlock(&q->sysfs_lock);
> +
>  	kobject_uevent(&q->kobj, KOBJ_REMOVE);
>  	kobject_del(&q->kobj);
>  	blk_trace_remove_sysfs(disk_to_dev(disk));
>  	kobject_put(&disk_to_dev(disk)->kobj);
> -
> -	mutex_unlock(&q->sysfs_lock);
> }

If this patch is required, a similar change would be needed in the failure
path of blk_register_queue() too.

--
Ming Lei

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
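For reference, the deadlock pattern Bart describes can be sketched as the
following lock-ordering inversion. This is pseudocode only, not runnable
code: kn->count is kernfs's active-reference count rather than a plain
mutex, but lockdep models waiting for it to drain as a lock acquisition,
which is why it shows up in the splat as "kn->count#213".

```c
/*
 * Context A: a sysfs write to a queue attribute,
 * e.g. writing to /sys/block/<dev>/queue/<attr>
 *
 * kernfs_fop_write()
 *   takes of->mutex, then an active reference on the
 *   attribute node (tracked by lockdep as kn->count)
 *   -> queue_attr_store()
 *        mutex_lock(&q->sysfs_lock);   // A: kn->count, then sysfs_lock
 *
 * Context B: queue teardown, with commit 667257e8b298 applied
 *
 * blk_unregister_queue()
 *   mutex_lock(&q->sysfs_lock);        // B: sysfs_lock first
 *   -> kobject_del(&q->kobj)
 *        -> kernfs_remove()
 *             waits for all active references (kn->count) to drain
 *
 * B cannot drain kn->count while A holds an active reference, and A
 * cannot drop that reference until it acquires q->sysfs_lock, which B
 * holds: a classic ABBA deadlock. Moving mutex_unlock(&q->sysfs_lock)
 * above kobject_del(), as in the patch above, breaks the B ordering.
 */
```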