Re: [PATCH 1/2] block: avoid to hold q->limits_lock across APIs for atomic update queue limits

Damien Le Moal <dlemoal@xxxxxxxxxx> · Wed, 18 Dec 2024 06:57:45 -0800

On 2024/12/18 6:05, Nilay Shroff wrote:
>>> In the above code block, we may then replace blk_mq_freeze_queue() with
>>> queue_limits_commit_update() and similarly replace blk_mq_unfreeze_queue() 
>>> with queue_limits_commit_update().
>>>
>>> {
>>>     queue_limits_start_update()
>>>     ...
>>>     ...
>>>     ...
>>>     queue_limits_commit_update()
>>
>> In sd_revalidate_disk(), blk-mq request is allocated under queue_limits_start_update(),
>> then ABBA deadlock is triggered since blk_queue_enter() implies same lock(freeze lock)
>> from blk_mq_freeze_queue().
>>
> Yeah agreed but I see sd_revalidate_disk() is probably the only exception 
> which allocates the blk-mq request. Can't we fix it? 

If we change where limits_lock is taken now, we will again introduce races
between user config and discovery/revalidation, which is what
queue_limits_start_update() and queue_limits_commit_update() intended to fix in
the first place.

So changing sd_revalidate_disk() is not the right approach.

> One simple way I could think of would be updating queue_limit_xxx() APIs and
> then use it,
> 
> queue_limits_start_update(struct request_queue *q, bool freeze) 
> {
>     ...
>     mutex_lock(&q->limits_lock);
>     if(freeze)
>         blk_mq_freeze_queue(q);
>     ...
> }
> 
> queue_limits_commit_update(struct request_queue *q,
> 		struct queue_limits *lim, bool unfreeze)
> {
>     ...
>     ...
>     if(unfreeze)
>         blk_mq_unfreeze_queue(q);
>     mutex_unlock(&q->limits_lock);
> }
> 
> sd_revalidate_disk()
> {
>     ...
>     ...
>     queue_limits_start_update(q, false);
> 
>     ...
>     ...
> 
>     blk_mq_freeze_queue(q);
>     queue_limits_commit_update(q, lim, true);    

This is overly complicated ... As I suggested, I think that a simpler approach
is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside
queue_limits_commit_update(). Doing so, no driver should need to directly call
freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances
that have the update/freeze order wrong. As mentioned, the pattern simply needs
to be:

	queue_limits_start_update();
	...
	blk_mq_freeze_queue();
	queue_limits_commit_update();
	blk_mq_unfreeze_queue();

I fixed blk-zoned code last week like this. But I had not noticed that other
places had a similar issue as well.

-- 
Damien Le Moal
Western Digital Research