On Fri, Apr 24, 2020 at 03:55:35PM +0200, Hannes Reinecke wrote: > On 4/24/20 12:23 PM, Ming Lei wrote: > > When all CPUs in one hctx are offline and this hctx becomes inactive, we > > shouldn't run this hw queue for completing request any more. > > > > So allocate request from one live hctx, and clone & resubmit the request, > > either it is from sw queue or scheduler queue. > > > > Cc: John Garry <john.garry@xxxxxxxxxx> > > Cc: Bart Van Assche <bvanassche@xxxxxxx> > > Cc: Hannes Reinecke <hare@xxxxxxxx> > > Cc: Christoph Hellwig <hch@xxxxxx> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > > --- > > block/blk-mq.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-- > > 1 file changed, 98 insertions(+), 4 deletions(-) > > > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 0759e0d606b3..a4a26bb23533 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -2370,6 +2370,98 @@ static int blk_mq_hctx_notify_online(unsigned int cpu, struct hlist_node *node) > > return 0; > > } > > +static void blk_mq_resubmit_end_rq(struct request *rq, blk_status_t error) > > +{ > > + struct request *orig_rq = rq->end_io_data; > > + > > + blk_mq_cleanup_rq(orig_rq); > > + blk_mq_end_request(orig_rq, error); > > + > > + blk_put_request(rq); > > +} > > + > > +static void blk_mq_resubmit_rq(struct request *rq) > > +{ > > + struct request *nrq; > > + unsigned int flags = 0; > > + struct blk_mq_hw_ctx *hctx = rq->mq_hctx; > > + struct blk_mq_tags *tags = rq->q->elevator ? hctx->sched_tags : > > + hctx->tags; > > + bool reserved = blk_mq_tag_is_reserved(tags, rq->internal_tag); > > + > > + if (rq->rq_flags & RQF_PREEMPT) > > + flags |= BLK_MQ_REQ_PREEMPT; > > + if (reserved) > > + flags |= BLK_MQ_REQ_RESERVED; > > + > > + /* avoid allocation failure by clearing NOWAIT */ > > + nrq = blk_get_request(rq->q, rq->cmd_flags & ~REQ_NOWAIT, flags); > > + if (!nrq) > > + return; > > + > > Ah-ha. So what happens if we don't get a request here? So far it isn't possible if NOWAIT is cleared because the two requests belong to different hctx. > > > + blk_rq_copy_request(nrq, rq); > > + > > + nrq->timeout = rq->timeout; > > + nrq->rq_disk = rq->rq_disk; > > + nrq->part = rq->part; > > + > > + memcpy(blk_mq_rq_to_pdu(nrq), blk_mq_rq_to_pdu(rq), > > + rq->q->tag_set->cmd_size); > > + > > + nrq->end_io = blk_mq_resubmit_end_rq; > > + nrq->end_io_data = rq; > > + nrq->bio = rq->bio; > > + nrq->biotail = rq->biotail; > > + > > + if (blk_insert_cloned_request(nrq->q, nrq) != BLK_STS_OK) > > + blk_mq_request_bypass_insert(nrq, false, true); > > +} > > + > > Not sure if that is a good idea. > With the above code we would having to allocate an additional > tag per request; if we're running full throttle with all tags active where > should they be coming from? The two requests are from different hctx, and we don't have per-request-queue throttle in blk-mq, and scsi does have, however no requests from this inactive hctx exists in LLD. So no the throttle you worry about. > > And all the while we _have_ perfectly valid tags; the tag of the original > request _is_ perfectly valid, and we have made sure that it's not inflight No, we are talking request in sw queue and scheduler queue, which aren't assigned tag yet. > (because if it were we would have to wait for to be completed by the > hardware anyway). > > So why can't we re-use the existing tag here? No, the tag doesn't exist. Thanks, Ming