On Thu, Apr 06, 2017 at 12:31:18PM +0800, Ming Lei wrote: > On Wed, Apr 05, 2017 at 12:01:29PM -0700, Omar Sandoval wrote: > > From: Omar Sandoval <osandov@xxxxxx> > > > > While dispatching requests, if we fail to get a driver tag, we mark the > > hardware queue as waiting for a tag and put the requests on a > > hctx->dispatch list to be run later when a driver tag is freed. However, > > blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware > > queues if using a single-queue scheduler with a multiqueue device. If > > It can't perform well by using a SQ scheduler on a MQ device, so just be > curious why someone wants to do that in this way,:-) I don't know why anyone would want to, but it has to work :) The only reason we noticed this is because when the NBD device is created, it only has a single queue, so we automatically assign mq-deadline to it. Later, we update the number of queues, but it's still using mq-deadline. > I guess you mean that ops.mq.dispatch_request() may dispatch requests > from other hardware queues in blk_mq_sched_dispatch_requests() instead > of current hctx. Yup, that's right. It's weird, and I talked to Jens about just forcing the MQ device into an SQ mode when using an SQ scheduler, but this way works fine more or less. > If that is true, it looks like a issue in usage of I/O scheduler, since > the mq-deadline scheduler just queues requests in one per-request_queue > linked list, for MQ device, the scheduler queue should have been per-hctx. That's an option, but that's a different scheduling policy. Again, I agree that it's strange, but it's reasonable behavior. > > blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we > > are processing. This means we end up using the hardware queue of the > > previous request, which may or may not be the same as that of the > > current request. If it isn't, the wrong hardware queue will end up > > waiting for a tag, and the requests will be on the wrong dispatch list, > > leading to a hang. > > > > The fix is twofold: > > > > 1. Make sure we save which hardware queue we were trying to get a > > request for in blk_mq_get_driver_tag() regardless of whether it > > succeeds or not. > > It shouldn't have be needed, once rq->mq_ctx is set, the hctx should been > fixed already, that means we can just pass the hctx to blk_mq_get_driver_tag(). We still need to do the blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) step to get the hctx. We could do that at every callsite of blk_mq_get_driver_tag(), but this way it's just factored into blk_mq_get_driver_tag(). Thanks! > > 2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a > > blk_mq_hw_queue to make it clear that it must handle multiple > > hardware queues, since I've already messed this up on a couple of > > occasions. > > > > This didn't appear in testing with nvme and mq-deadline because nvme has > > more driver tags than the default number of scheduler tags. However, > > with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd. > > > > Signed-off-by: Omar Sandoval <osandov@xxxxxx>