Re: [PATCH] blk-mq: Fix cpu indexing error in blk_mq_alloc_request_hctx()

Ming Lei <ming.lei@xxxxxxxxxx> · Sun, 27 Oct 2019 15:23:15 +0800

On Fri, Oct 25, 2019 at 03:33:15PM -0700, Sagi Grimberg wrote:
> 
> > > > > hctx is specified specifically, it is the 1st command on a new nvme
> > > > > controller queue. The command *must* be issued on the queue it is to
> > > > > initialize (this is different from pci nvme).  The hctx is specified so the
> > > > > correct nvme queue is selected when the command comes down the request path.
> > > > > Saying "don't do that" means one of the following: a) snooping every rq on
> > > > > the request path to spot initialization ios and move them to the right
> > > > > queue; or b) creating a duplicate non-blk-mq request path for this 1
> > > > > initialization io. Both of those are ugly.
> > > > 
> > > > In nvmf_connect_io_queue(), 'qid' has been encoded into instance of 'struct
> > > > nvme_command', that means the 'nvme controller' should know the
> > > > specified queue by parsing the command. So still not understand why you
> > > > have to submit the command via the specified queue.
> > > 
> > > The connect command must be send on the queue that it is connecting, the
> > > qid is telling the controller the id of the queue, but the controller
> > > still expects the connect to be issued on the queue that it is designed
> > > to connect (or rather initialize).
> > > 
> > > in queue_rq we take queue from hctx->driver_data and use it to issue
> > > the command. The connect is different that it is invoked on a context
> > > that is not necessarily running from a cpu that maps to this specific
> > > hctx. So in essence what is needed is a tag from the specific queue tags
> > > without running cpu consideration.
> > 
> > OK, got it.
> > 
> > If nvmf_connect_io_queue() is only run before setting up IO queues, the
> > shared tag problem could be solved easily, such as, use a standalone
> > tagset?
> 
> Not sure what you mean exactly...
> 
> Also, keep in mind that this sequence also goes into reconnects, where
> we already have our tagset allocated (with pending requests
> potentially).

I just found that the connect command is always submitted via the unique
reserved tag 0, so it has nothing to do with IO requests any more.

You can keep using the reserved tag 0 for the connect command as before,
just by not sharing the tagset between the connect queue and the IO queues.

The detailed idea follows:

1) still reserve 1 tag for connect command in the IO tagset.

2) in each driver, create a 'conn_tagset' for .connect_q only, create
the .connect_q from this 'conn_tagset', set conn_tagset.nr_hw_queues
to 1, and set conn_tagset.queue_depth to 'ctrl.queue_count - 1'

3) in .queue_rq of conn_tagset.ops:

- parse the index of the queue to be connected from nvme_command.conn.qid
- set the connect command's tag to 0
- then do every other thing just like before
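
To make the three steps concrete, here is a rough userspace model of the
idea. It is not kernel code and has not been run against any driver: every
type and function name below (model_tagset, model_ctrl, model_ctrl_init,
model_conn_queue_rq, MODEL_MAX_QUEUES) is made up for illustration, and it
assumes that qid 0 is the admin queue, that the IO tagset has
'queue_count - 1' hw queues, and that the connect tagset reserves no tags
of its own.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_QUEUES 16	/* hypothetical limit, only for this model */

/* Hypothetical stand-in for the few tagset fields the idea touches. */
struct model_tagset {
	unsigned int nr_hw_queues;
	unsigned int queue_depth;
	unsigned int reserved_tags;
};

/* Hypothetical stand-in for the connect command. */
struct model_connect_cmd {
	uint16_t qid;		/* queue to be connected */
	uint16_t command_id;	/* tag carried by the command */
};

/* Per-queue bookkeeping so the routing can be observed. */
struct model_queue {
	unsigned int connects_seen;
	uint16_t last_command_id;
};

struct model_ctrl {
	unsigned int queue_count;	/* admin queue + IO queues */
	struct model_tagset io_tagset;
	struct model_tagset conn_tagset;
	struct model_queue queues[MODEL_MAX_QUEUES];
};

/* Steps 1) and 2): size both tagsets from the controller's queue count. */
int model_ctrl_init(struct model_ctrl *ctrl, unsigned int queue_count,
		    unsigned int io_depth)
{
	if (queue_count < 2 || queue_count > MODEL_MAX_QUEUES || io_depth < 2)
		return -1;

	memset(ctrl, 0, sizeof(*ctrl));
	ctrl->queue_count = queue_count;

	/* 1) the IO tagset still reserves one tag for the connect command */
	ctrl->io_tagset.nr_hw_queues = queue_count - 1;	/* assumption */
	ctrl->io_tagset.queue_depth = io_depth;
	ctrl->io_tagset.reserved_tags = 1;

	/* 2) a separate tagset used only by .connect_q */
	ctrl->conn_tagset.nr_hw_queues = 1;
	ctrl->conn_tagset.queue_depth = queue_count - 1;
	ctrl->conn_tagset.reserved_tags = 0;		/* assumption */

	return 0;
}

/* Step 3): what .queue_rq of the connect tagset would do. */
int model_conn_queue_rq(struct model_ctrl *ctrl, struct model_connect_cmd *cmd)
{
	struct model_queue *queue;

	/* qid 0 is assumed to be the admin queue, not served by .connect_q */
	if (cmd->qid == 0 || cmd->qid >= ctrl->queue_count)
		return -1;

	/* pick the target queue from the command, not from the hctx */
	queue = &ctrl->queues[cmd->qid];

	/* the connect command always goes out with tag 0 */
	cmd->command_id = 0;

	/* "do every other thing just like before": here, just record it */
	queue->connects_seen++;
	queue->last_command_id = cmd->command_id;

	return 0;
}
```

The point of the model is that the target queue comes from the qid in the
command rather than from the hctx the request was allocated on, so the
submitting CPU no longer matters for the connect.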

Thanks,
Ming