On Fri, Aug 16, 2019 at 08:14:13AM -0700, Bart Van Assche wrote:
> On 8/16/19 6:55 AM, Ming Lei wrote:
> > The kernfs built-in lock of 'kn->count' is held in the sysfs .show/.store
> > path. Meanwhile, inside block's .show/.store callbacks, q->sysfs_lock is
> > required.
> >
> > However, when the mq & iosched kobjects are removed via
> > blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
> > too. This causes an AB-BA deadlock because the kernfs built-in lock of
> > 'kn->count' is required inside kobject_del() too, see the lockdep
> > warning[1].
> >
> > On the other hand, it isn't necessary to acquire q->sysfs_lock for
> > both blk_mq_unregister_dev() & elv_unregister_queue(), because
> > clearing the REGISTERED flag prevents stores to 'queue/scheduler'
> > from happening. Also, sysfs writes (stores) are exclusive, so there is
> > no need to hold the lock for elv_unregister_queue() when it is
> > called on the elevator-switching path.
> >
> > Fix the issue by not holding q->sysfs_lock for blk_mq_unregister_dev()
> > & elv_unregister_queue().
>
> Have you considered to split sysfs_lock into multiple mutexes? Today it is

So far I haven't considered it. At least for now, I just don't see the need
to hold sysfs_lock for either blk_mq_unregister_dev() or
elv_unregister_queue(). That lets us fix the deadlock, which can be
triggered quite easily, and the fix should go to -stable.

Yeah, I agree that sysfs_lock has been used too widely.

> very hard to verify the correctness of block layer code that uses
> sysfs_lock because it has not been documented anywhere what that mutex
> protects. I think that mutex should be split into at least two mutexes:
> one that protects switching I/O schedulers and another one that protects
> hctx->tags and hctx->sched_tags.

sysfs_lock is required in every .show & .store callback, and switching the
I/O scheduler is done in .store(), so hctx->sched_tags is protected by
sysfs_lock too.
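To make the AB-BA ordering concrete, the two paths can be sketched roughly
like this (a simplified picture based on the lockdep report above; the
exact call chains in the kernel may differ slightly):

    CPU0 (sysfs .store path)            CPU1 (queue removal path)
    ------------------------            -------------------------
    kernfs_fop_write()
      /* takes kn->count (active ref) */
    queue_attr_store()                  mutex_lock(&q->sysfs_lock)
      mutex_lock(&q->sysfs_lock)        kobject_del()
        /* blocks, held by CPU1 */        /* drains & waits for kn->count,
                                             held by CPU0 -> deadlock */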
hctx->tags is tagset-wide or host-wide, and is protected by
set->tag_list_lock.

Thanks,
Ming