On Fri, Sep 30, 2016 at 11:55 PM, Bart Van Assche <bart.vanassche@xxxxxxxxxxx> wrote: > On 09/29/16 14:51, Ming Lei wrote: >> >> On Thu, Sep 29, 2016 at 7:59 AM, Bart Van Assche >> <bart.vanassche@xxxxxxxxxxx> wrote: >>> >>> blk_quiesce_queue() prevents that new queue_rq() invocations >> >> >> blk_mq_quiesce_queue() > > > Thanks, I will update the patch title and patch description. > >>> +void blk_mq_quiesce_queue(struct request_queue *q) >>> +{ >>> + struct blk_mq_hw_ctx *hctx; >>> + unsigned int i; >>> + bool res, rcu = false; >>> + >>> + spin_lock_irq(q->queue_lock); >>> + WARN_ON_ONCE(blk_queue_quiescing(q)); >>> + queue_flag_set(QUEUE_FLAG_QUIESCING, q); >>> + spin_unlock_irq(q->queue_lock); >>> + >>> + res = __blk_mq_freeze_queue_start(q, false); >> >> >> Looks the implementation is a bit tricky and complicated, if the percpu >> usage counter isn't killed, it isn't necessary to touch .mq_freeze_depth >> since you use QUEUE_FLAG_QUIESCING to set/get this status of >> the queue. > >> >> Then using synchronize_rcu() and rcu_read_lock()/rcu_read_unlock() >> with the flag of QUIESCING may be enough to wait for completing >> of ongoing invocations of .queue_rq() and avoid to run new .queue_rq, >> right? > > That's an interesting thought. Can you have a look at the attached patch in > which blk_mq_quiesce_queue() no longer manipulates the mq_freeze_depth > counter? About this part of manipulating 'mq_freeze_depth', your new patch looks better and cleaner. > >>> + WARN_ON_ONCE(!res); >>> + queue_for_each_hw_ctx(q, hctx, i) { >>> + if (hctx->flags & BLK_MQ_F_BLOCKING) { >>> + mutex_lock(&hctx->queue_rq_mutex); >>> + mutex_unlock(&hctx->queue_rq_mutex); >> >> >> Could you explain a bit why all BLOCKING is treated so special? And >> that flag just means the hw queue always need to schedule asynchronously, >> and of course even though the flag isn't set, it still may be run >> asynchronously too. So looks it should be enough to just use RCU. > > > The mutex manipulations introduce atomic stores in the hot path. Hence the > use of synchronize_rcu() and rcu_read_lock()/rcu_read_unlock() when > possible. Since BLK_MQ_F_BLOCKING indicates that .queue_rq() may sleep and > since sleeping is not allowed while holding the RCU read lock, > queue_rq_mutex has been introduced for the BLK_MQ_F_BLOCKING case. OK, got it. But this part looks still not good enough: > +static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) > +{ > + if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state))) > + return; > + > + if (hctx->flags & BLK_MQ_F_BLOCKING) { > + mutex_lock(&hctx->queue_rq_mutex); > + blk_mq_process_rq_list(hctx); > + mutex_unlock(&hctx->queue_rq_mutex); With the new mutex, .queue_rq can't be run concurrently any more, even though the hw queue can be mapped to more than one CPUs. So maybe srcu for BLK_MQ_F_BLOCKING? > + } else { > + rcu_read_lock(); > + blk_mq_process_rq_list(hctx); > + rcu_read_unlock(); > } ... > - if (!blk_mq_direct_issue_request(old_rq, &cookie)) > - goto done; > - blk_mq_insert_request(old_rq, false, true, true); > + > + if (data.hctx->flags & BLK_MQ_F_BLOCKING) { > + mutex_lock(&data.hctx->queue_rq_mutex); > + if (blk_queue_quiescing(q) || > + blk_mq_direct_issue_request(old_rq, &cookie) != 0) > + blk_mq_insert_request(old_rq, false, true, > + true); > + mutex_unlock(&data.hctx->queue_rq_mutex); > + } else { > + rcu_read_lock(); > + if (blk_queue_quiescing(q) || > + blk_mq_direct_issue_request(old_rq, &cookie) != 0) > + blk_mq_insert_request(old_rq, false, true, > + true); > + rcu_read_unlock(); > + } > + If we just call the rcu/srcu read lock(or the mutex) around .queue_rq(), the above code needn't to be duplicated any more. Thanks, Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html