On Sat, Dec 21, 2024 at 06:33:13PM +0530, Nilay Shroff wrote: > > > On 12/19/24 11:50, Christoph Hellwig wrote: > > On Wed, Dec 18, 2024 at 06:57:45AM -0800, Damien Le Moal wrote: > >>> Yeah agreed but I see sd_revalidate_disk() is probably the only exception > >>> which allocates the blk-mq request. Can't we fix it? > >> > >> If we change where limits_lock is taken now, we will again introduce races > >> between user config and discovery/revalidation, which is what > >> queue_limits_start_update() and queue_limits_commit_update() intended to fix in > >> the first place. > >> > >> So changing sd_revalidate_disk() is not the right approach. > > > > Well, sd_revalidate_disk is a bit special in that it needs a command > > on the same queue to query the information. So it needs to be able > > to issue commands without the queue frozen. Freezing the queue inside > > the limits lock support that, sd just can't use the convenience helpers > > that lock and freeze. > > > >> This is overly complicated ... As I suggested, I think that a simpler approach > >> is to call blk_mq_freeze_queue() and blk_mq_unfreeze_queue() inside > >> queue_limits_commit_update(). Doing so, no driver should need to directly call > >> freeze/unfreeze. But that would be a cleanup. Let's first fix the few instances > >> that have the update/freeze order wrong. As mentioned, the pattern simply needs > > > > Yes, the queue only needs to be frozen for the actual update, > > which would remove the need for the locking. The big question for both > > variants is if we can get rid of all the callers that have the queue > > already frozen and then start an update. > > > After thinking for a while I found that broadly we've four categories of users > which need this pattern of limits-lock and/or queue-freeze: > > 1. Callers which need acquiring limits-lock while starting the update; and freezing > queue only when committing the update: > - sd_revalidate_disk sd_revalidate_disk() should be the most strange one, in which passthrough io command is required, so dependency on queue freeze lock can't be added, such as, q->limits_lock Actually the current queue limits structure aren't well-organized, otherwise limit lock isn't needed for reading queue limits from hardware, since sd_revalidate_disk() just overwrites partial limits. Or it can be done by refactoring sd_revalidate_disk(). However, the change might be a little big, and I guess that is the reason why Damien don't like it. > - nvme_init_identify > - loop_clear_limits > - few more... > > 2. Callers which need both freezing the queue and acquiring limits-lock while starting > the update: > - nvme_update_ns_info_block > - nvme_update_ns_info_generic > - few more... > > 3. Callers which neither need acquiring limits-lock nor require freezing queue as for > these set of callers in the call stack limits-lock is already acquired and queue is > already frozen: > - __blk_mq_update_nr_hw_queues > - queue_xxx_store and helpers I think it isn't correct. The queue limits are applied on fast IO path, in theory anywhere updating q->limits need to drain IOs in submission path at least after gendisk is added. Also Christoph adds limits-lock for avoiding to lose other concurrent update, which makes the problem more hard to solve. Thanks, Ming