Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 12/18/24 19:10, Ming Lei wrote:
> On Wed, Dec 18, 2024 at 05:03:00PM +0530, Nilay Shroff wrote:
>>
>>
>> On 12/18/24 07:39, Ming Lei wrote:
>>> On Tue, Dec 17, 2024 at 08:07:06AM -0800, Damien Le Moal wrote:
>>>> On 2024/12/16 23:30, Ming Lei wrote:
>>>>> On Tue, Dec 17, 2024 at 08:19:28AM +0100, Christoph Hellwig wrote:
>>>>>> On Tue, Dec 17, 2024 at 03:05:48PM +0800, Ming Lei wrote:
>>>>>>> On Tue, Dec 17, 2024 at 05:40:56AM +0100, Christoph Hellwig wrote:
>>>>>>>> On Tue, Dec 17, 2024 at 09:52:51AM +0800, Ming Lei wrote:
>>>>>>>>> The local copy can be updated in any way with any data, so does another
>>>>>>>>> concurrent update on q->limits really matter?
>>>>>>>>
>>>>>>>> Yes, because that means one of the updates get lost even if it is
>>>>>>>> for entirely separate fields.
>>>>>>>
>>>>>>> Right, but the limits are still valid anytime.
>>>>>>>
>>>>>>> Any suggestion for fixing this deadlock?
>>>>>>
>>>>>> What is "this deadlock"?
>>>>>
>>>>> The commit log provides two reports:
>>>>>
>>>>> - lockdep warning
>>>>>
>>>>> https://lore.kernel.org/linux-block/Z1A8fai9_fQFhs1s@xxxxxxxxxxxxxxxxxxxx/
>>>>>
>>>>> - real deadlock report
>>>>>
>>>>> https://lore.kernel.org/linux-scsi/ZxG38G9BuFdBpBHZ@fedora/
>>>>>
>>>>> It is actually one simple ABBA lock:
>>>>>
>>>>> 1) queue_attr_store()
>>>>>
>>>>>       blk_mq_freeze_queue(q);					//queue freeze lock
>>>>>       res = entry->store(disk, page, length);
>>>>> 	  			queue_limits_start_update		//->limits_lock
>>>>> 				...
>>>>> 				queue_limits_commit_update
>>>>>       blk_mq_unfreeze_queue(q);
>>>>
>>>> The locking + freeze pattern should be:
>>>>
>>>> 	lim = queue_limits_start_update(q);
>>>> 	...
>>>> 	blk_mq_freeze_queue(q);
>>>> 	ret = queue_limits_commit_update(q, &lim);
>>>> 	blk_mq_unfreeze_queue(q);
>>>>
>>>> This pattern is used in most places and anything that does not use it is likely
>>>> susceptible to a similar ABBA deadlock. We should probably look into trying to
>>>> integrate the freeze/unfreeze calls directly into queue_limits_commit_update().
>>>>
>>>> Fixing queue_attr_store() to use this pattern seems simpler than trying to fix
>>>> sd_revalidate_disk().
>>>
>>> This way looks good, just commit af2814149883 ("block: freeze the queue in
>>> queue_attr_store") needs to be reverted, and freeze/unfreeze has to be
>>> added to each queue attribute .store() handler.
>>>
>> Wouldn't it be feasible to add blk-mq freeze in queue_limits_start_update()
>> and blk-mq unfreeze in queue_limits_commit_update()? If we do this then 
>> the pattern would be, 
>>
>> queue_limits_start_update(): limit-lock + freeze
>> queue_limits_commit_update() : unfreeze + limit-unlock  
>>
>> Then in queue_attr_store() we shall just remove freeze/unfreeze.
>>
>> We also need to fix few call sites where we've code block,
>>
>> {
>>     blk_mq_freeze_queue()
>>     ...
>>     queue_limits_start_update()
>>     ...    
>>     queue_limits_commit_update()
>>     ...
>>     blk_mq_unfreeze_queue()
>>     
>> }
>>
>> In the above code block, we may then replace blk_mq_freeze_queue() with
>> queue_limits_commit_update() and similarly replace blk_mq_unfreeze_queue() 
>> with queue_limits_commit_update().
>>
>> {
>>     queue_limits_start_update()
>>     ...
>>     ...
>>     ...
>>     queue_limits_commit_update()
> 
> In sd_revalidate_disk(), blk-mq request is allocated under queue_limits_start_update(),
> then ABBA deadlock is triggered since blk_queue_enter() implies same lock(freeze lock)
> from blk_mq_freeze_queue().
> 
Yeah agreed but I see sd_revalidate_disk() is probably the only exception 
which allocates the blk-mq request. Can't we fix it? 

One simple way I could think of would be updating queue_limit_xxx() APIs and
then use it,

queue_limits_start_update(struct request_queue *q, bool freeze) 
{
    ...
    mutex_lock(&q->limits_lock);
    if(freeze)
        blk_mq_freeze_queue(q);
    ...
}

queue_limits_commit_update(struct request_queue *q,
		struct queue_limits *lim, bool unfreeze)
{
    ...
    ...
    if(unfreeze)
        blk_mq_unfreeze_queue(q);
    mutex_unlock(&q->limits_lock);
}

sd_revalidate_disk()
{
    ...
    ...
    queue_limits_start_update(q, false);

    ...
    ...

    blk_mq_freeze_queue(q);
    queue_limits_commit_update(q, lim, true);        
}

Thanks,
--Nilay





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux