On 2/7/25 5:29 PM, Ming Lei wrote:
> On Thu, Feb 06, 2025 at 06:52:36PM +0530, Nilay Shroff wrote:
>>
>>
>> On 2/5/25 9:29 PM, Christoph Hellwig wrote:
>>> On Wed, Feb 05, 2025 at 08:14:47PM +0530, Nilay Shroff wrote:
>>>>
>>>>  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>>>> @@ -5006,8 +5008,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>>>>  		return;
>>>>
>>>>  	memflags = memalloc_noio_save();
>>>> -	list_for_each_entry(q, &set->tag_list, tag_set_list)
>>>> +	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>>>> +		mutex_lock(&q->sysfs_lock);
>>>
>>> This now means we hold up to number of request queues sysfs_lock
>>> at the same time. I doubt lockdep will be happy about this.
>>> Did you test this patch with a multi-namespace nvme device or
>>> a multi-LU per host SCSI setup?
>>>
>> Yeah I tested with a multi namespace NVMe disk and lockdep didn't
>> complain. Agreed we need to hold up q->sysfs_lock for multiple
>> request queues at the same time and that may not be elegant, but
>> looking at the mess in __blk_mq_update_nr_hw_queues we may not
>> have other choice which could help correct the lock order.
>
> All q->sysfs_lock instance actually shares same lock class, so this way
> should have triggered double lock warning, please see mutex_init().
>
Well, my understanding of lockdep is that even though all q->sysfs_lock
instances share the same lock class key, lockdep differentiates locks
based on their memory address. Since each instance of &q->sysfs_lock has
a different memory address, lockdep treats each of them as a distinct
lock and, IMO, that avoids triggering the double lock warning.

> The ->sysfs_lock involved in this patch looks only for sync elevator
> switch with reallocating hctxs, so I am wondering why not add new
> dedicated lock for this purpose only?
>
> Then we needn't to worry about its dependency with q->q_usage_counter(io)?
>
Yes, that should be possible, but as Christoph suggested,
__blk_mq_update_nr_hw_queues already runs holding tag_list_lock, so why
shouldn't we use the same tag_list_lock for sched/elevator updates as
well? That way we may avoid adding another new lock.

Thanks,
--Nilay
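For reference on the lock-class point above: mutex_init() allocates one
static lock_class_key per invocation site, so every q->sysfs_lock that is
initialized from the same call site ends up in the same lockdep class,
even though each instance lives at a different address. Roughly (an
illustrative copy of the definition in include/linux/mutex.h):

/*
 * Illustrative only: one static lock_class_key per mutex_init() call
 * site, shared by every mutex initialized from that site.
 */
#define mutex_init(mutex)					\
do {								\
	static struct lock_class_key __key;			\
								\
	__mutex_init((mutex), #mutex, &__key);			\
} while (0)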
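To make the last point a bit more concrete, here is a minimal sketch
(not a real patch) of the idea of reusing the tag set's existing
tag_list_lock to serialize the elevator switch against
__blk_mq_update_nr_hw_queues(), which already walks set->tag_list under
that mutex. The wrapper name elv_switch_serialized() is made up purely
for illustration, and elevator_switch() is static in block/elevator.c,
so an actual change would have to take the lock inside that path rather
than wrap it like this:

/*
 * Hypothetical sketch: order the elevator switch against
 * __blk_mq_update_nr_hw_queues() via set->tag_list_lock instead of
 * adding a new dedicated lock.
 */
static int elv_switch_serialized(struct request_queue *q,
				 struct elevator_type *new_e)
{
	struct blk_mq_tag_set *set = q->tag_set;
	int ret;

	mutex_lock(&set->tag_list_lock);   /* excludes nr_hw_queues updates */
	ret = elevator_switch(q, new_e);
	mutex_unlock(&set->tag_list_lock);

	return ret;
}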