On 2024/12/18 6:05, Nilay Shroff wrote: >>> In the above code block, we may then replace blk_mq_freeze_queue() with >>> queue_limits_commit_update() and similarly replace blk_mq_unfreeze_queue() >>> with queue_limits_commit_update(). >>> >>> { >>> queue_limits_start_update() >>> ... >>> ... >>> ... >>> queue_limits_commit_update() >> >> In sd_revalidate_disk(), blk-mq request is allocated under queue_limits_start_update(), >> then ABBA deadlock is triggered since blk_queue_enter() implies same lock(freeze lock) >> from blk_mq_freeze_queue(). >> > Yeah agreed but I see sd_revalidate_disk() is probably the only exception > which allocates the blk-mq request. Can't we fix it? If we change where limits_lock is taken now, we will again introduce races between user config and discovery/revalidation, which is what queue_limits_start_update() and queue_limits_commit_update() intended to fix in the first place. So changing sd_revalidate_disk() is not the right approach. > One simple way I could think of would be updating queue_limit_xxx() APIs and > then use it, > > queue_limits_start_update(struct request_queue *q, bool freeze) > { > ... > mutex_lock(&q->limits_lock); > if(freeze) > blk_mq_freeze_queue(q); > ... > } > > queue_limits_commit_update(struct request_queue *q, > struct queue_limits *lim, bool unfreeze) > { > ... > ... > if(unfreeze) > blk_mq_unfreeze_queue(q); > mutex_unlock(&q->limits_lock); > } > > sd_revalidate_disk() > { > ... > ... > queue_limits_start_update(q, false); > > ... > ... > > blk_mq_freeze_queue(q); > queue_limits_commit_update(q, lim, true); This is overly complicated ... As I suggested, I think that a simpler approach is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside queue_limits_commit_update(). Doing so, no driver should need to directly call freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances that have the update/freeze order wrong. As mentioned, the pattern simply needs to be: queue_limits_start_update(); ... blk_mq_freeze_queue(); queue_limits_commit_update(); blk_mq_unfreeze_queue(); I fixed blk-zoned code last week like this. But I had not noticed that other places had a similar issue as well. -- Damien Le Moal Western Digital Research