On 2/8/25 2:00 PM, Ming Lei wrote:
> On Fri, Feb 07, 2025 at 11:32:37PM +0530, Nilay Shroff wrote:
>>
>>
>> On 2/7/25 5:29 PM, Ming Lei wrote:
>>> On Thu, Feb 06, 2025 at 06:52:36PM +0530, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 2/5/25 9:29 PM, Christoph Hellwig wrote:
>>>>> On Wed, Feb 05, 2025 at 08:14:47PM +0530, Nilay Shroff wrote:
>>>>>>
>>>>>>  static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>>>>>> @@ -5006,8 +5008,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>>>>>>  		return;
>>>>>>
>>>>>>  	memflags = memalloc_noio_save();
>>>>>> -	list_for_each_entry(q, &set->tag_list, tag_set_list)
>>>>>> +	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>>>>>> +		mutex_lock(&q->sysfs_lock);
>>>>>
>>>>> This now means we hold up to number of request queues sysfs_lock
>>>>> at the same time. I doubt lockdep will be happy about this.
>>>>> Did you test this patch with a multi-namespace nvme device or
>>>>> a multi-LU per host SCSI setup?
>>>>>
>>>> Yeah I tested with a multi namespace NVMe disk and lockdep didn't
>>>> complain. Agreed we need to hold up q->sysfs_lock for multiple
>>>> request queues at the same time and that may not be elegant, but
>>>> looking at the mess in __blk_mq_update_nr_hw_queues we may not
>>>> have other choice which could help correct the lock order.
>>>
>>> All q->sysfs_lock instance actually shares same lock class, so this way
>>> should have triggered double lock warning, please see mutex_init().
>>>
>> Well, my understanding about lockdep is that even though all q->sysfs_lock
>> instances share the same lock class key, lockdep differentiates locks
>> based on their memory address. Since each instance of &q->sysfs_lock has
>> got different memory address, lockdep treat each of them as distinct locks
>> and IMO, that avoids triggering double lock warning.
>
> That isn't correct, think about how lockdep can deal with millions of
> lock instances.
>
> Please take a look at the beginning of Documentation/locking/lockdep-design.rst
>
> ```
> The validator tracks the 'usage state' of lock-classes, and it tracks
> the dependencies between different lock-classes.
> ```
>
> Please verify it by the following code:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4e76651e786d..a4ffc6198e7b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -5150,10 +5150,37 @@ void blk_mq_cancel_work_sync(struct request_queue *q)
>  		cancel_delayed_work_sync(&hctx->run_work);
>  }
>
> +struct lock_test {
> +	struct mutex lock;
> +};
> +
> +void init_lock_test(struct lock_test *lt)
> +{
> +	mutex_init(&lt->lock);
> +	printk("init lock: %p\n", lt);
> +}
> +
> +static void test_lockdep(void)
> +{
> +	struct lock_test A, B;
> +
> +	init_lock_test(&A);
> +	init_lock_test(&B);
> +
> +	printk("start lock test\n");
> +	mutex_lock(&A.lock);
> +	mutex_lock(&B.lock);
> +	mutex_unlock(&B.lock);
> +	mutex_unlock(&A.lock);
> +	printk("end lock test\n");
> +}
> +
>  static int __init blk_mq_init(void)
>  {
>  	int i;
>
> +	test_lockdep();
> +
>  	for_each_possible_cpu(i)
>  		init_llist_head(&per_cpu(blk_cpu_done, i));
>  	for_each_possible_cpu(i)
>

Thank you Ming for providing the patch to test lockdep! You and Christoph
were correct: all q->sysfs_lock instances share a single lock class, so
lockdep should indeed complain about possible recursive locking of
q->sysfs_lock (a minimal sketch of why the class is shared follows below).
After a bit of debugging I also think I found out why lockdep stayed silent
on my system.
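For anyone following along, here is roughly why all the instances map to
one class (paraphrased from include/linux/mutex.h as I read it; treat it
as a sketch rather than verbatim kernel source, and the "single call site
in the queue allocation path" part is my reading of the code):

/*
 * mutex_init() expands to roughly this: the lock_class_key is a
 * *static* variable, i.e. there is one key per mutex_init() call site,
 * not one per mutex instance.
 */
#define mutex_init(mutex)					\
do {								\
	static struct lock_class_key __key;			\
								\
	__mutex_init((mutex), #mutex, &__key);			\
} while (0)

/*
 * Every request queue initializes q->sysfs_lock through the same
 * mutex_init() call site in the queue allocation path, so all of those
 * mutexes land in one lockdep class.  Lockdep validates dependencies
 * per class, which is why acquiring two q->sysfs_lock instances without
 * a nesting annotation such as mutex_lock_nested() should produce
 * exactly the recursive locking report that the test_lockdep() patch
 * above demonstrates with locks A and B.
 */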
As for why lockdep stayed silent: my test system has KASAN enabled, and
KASAN reported a potential use-after-free bug. That report tainted the
kernel and disabled further lock debugging, so any subsequent locking
issues, including this one, were no longer detected by lockdep. A rough
sketch of that path follows.
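As I understand it, the chain looks roughly like this (simplified from
mm/kasan/report.c, kernel/panic.c and lib/debug_locks.c; a sketch of the
mechanism, not verbatim kernel code):

/* KASAN, after printing a report, taints the kernel: */
static void end_report(/* ... */)
{
	/* ... */
	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
	/* ... */
}

/* add_taint() with LOCKDEP_NOW_UNRELIABLE switches lock debugging off: */
void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
{
	if (lockdep_ok == LOCKDEP_NOW_UNRELIABLE && __debug_locks_off())
		pr_warn("Disabling lock debugging due to kernel taint\n");

	set_bit(flag, &tainted_mask);
}

/*
 * __debug_locks_off() clears the global debug_locks flag, and lockdep
 * bails out of its checks once debug_locks is zero, so the recursive
 * q->sysfs_lock acquisition was never reported.  The telltale sign in
 * dmesg is the "Disabling lock debugging due to kernel taint" line
 * printed right after the KASAN splat.
 */

Thanks,
--Nilay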