On 8/16/19 6:55 AM, Ming Lei wrote:
> The kernfs built-in lock of 'kn->count' is held in the sysfs .show/.store path. Meanwhile, inside block's .show/.store callbacks, q->sysfs_lock is required. However, when the mq & iosched kobjects are removed via blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held too. This causes an AB-BA deadlock, because the kernfs built-in lock of 'kn->count' is also required inside kobject_del(); see the lockdep warning [1].
>
> On the other hand, it isn't necessary to acquire q->sysfs_lock for either blk_mq_unregister_dev() or elv_unregister_queue(), because clearing the REGISTERED flag prevents stores to 'queue/scheduler' from happening. Also, sysfs writes (stores) are exclusive, so there is no need to hold the lock for elv_unregister_queue() when it is called on the elevator-switch path.
>
> Fix the issue by not holding q->sysfs_lock for blk_mq_unregister_dev() & elv_unregister_queue().
Have you considered splitting sysfs_lock into multiple mutexes? Today it is very hard to verify the correctness of block layer code that uses sysfs_lock because it has not been documented anywhere what that mutex protects. I think that mutex should be split into at least two mutexes: one that protects switching I/O schedulers and another that protects hctx->tags and hctx->sched_tags.
Thanks,

Bart.