On 6/3/19 10:16 AM, Ming Lei wrote: > On Mon, Jun 03, 2019 at 09:46:39AM +0200, Hannes Reinecke wrote: >> On 6/3/19 9:37 AM, Ming Lei wrote: >>> On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote: >>>> On 6/1/19 1:06 AM, Ming Lei wrote: >>>>> On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote: >>>>>> On 5/31/19 10:46 AM, Ming Lei wrote: >>>> [ .. ] >>>>>> First we check for the 'slot_index_alloc()' callback to handle weird v2 >>>>>> allocation rules, _then_ we look for a tag, and only if we do _not_ have >>>>>> a tag we're using the bitmap. >>>>> >>>>> OK, looks I miss the above change. >>>>> >>>>>> And the bitmap is already correctly sized, as otherwise we'd have a >>>>>> clash between internal and tagged I/O commands even now. >>>>> >>>>> But now the big problem is in the following two line code: >>>>> >>>>> + else if (blk_tag != (u32)-1) >>>>> + rc = blk_mq_unique_tag_to_tag(blk_tag); >>>>> >>>>> Request from different blk-mq hw queue has same tag returned from >>>>> blk_mq_unique_tag_to_tag(). >>>>> >>>> Yes, but the sbitmap allocator will ensure that each command will get a >>>> unique tag. >>> >>> Each hw queue has independent sbitmap allocator, so commands with same >>> tag can come from different hw queue. >>> >> It does not for SCSI. >> See below. >> >>> So you meant this RFC patch depends on the host-wide tags patchset I >>> posted? >>> >>>> >>>>> Now the biggest question is that if V3 hw supports per-queue tags, >>>>> If yes, it should be real MQ hardware, otherwise I guess commands with >>>>> same tag at the same time may not work for host-wide tags. >>>>> >>>> >>>> Of course you can't have different commands with the same tag. But the >>>> sbitmap allocator prevents this from happening, as for host-wide tags >>>> the tagset is _shared_ between all devices, so the sbitmap allocator >>>> will only ever run on _one_ tagset for all commands. >>> >>> But blk-mq doesn't support host-wide tags yet, so how can this single >>> patch work? >>> >> Wrong. It does: >> >> struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev) >> { >> sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set); >> if (IS_ERR(sdev->request_queue)) >> return NULL; >> >> sdev->request_queue->queuedata = sdev; >> __scsi_init_queue(sdev->host, sdev->request_queue); >> blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue); >> return sdev->request_queue; >> } >> >> >> IE every scsi device is using the tagset from the host. > > Looks we are not in the same page, and you misunderstood two concepts: > scsi's host-wide tagset, and the new host-tags of BLK_MQ_F_HOST_TAGS. > > I admit that the new flag of BLK_MQ_F_HOST_TAGS is misleading. > > Now let me clarify it a bit: > > 1) the current SCSI hostwide tags means all LUNs share the host tagset, > but the tagset may include multiple hw queues, and each hw queue still > has independent tags, that is why blk-mq provides blk_mq_unique_tag(). > In short, each LUN's hw queue has independent tags. > Which is where I fundamentally disagree. Each hw queue does _not_ have independent tags. Each hw queue will use tags from the same (host-wide) tagset; the tags themselves will be allocated for each queue on an ad-hoc base, ie there is no fixed mapping between tag values and hardware queues. Which is why there is blk_mq_unique_tag(); this precisely returns a tag which is unique across all devices from this host by shifting the queue number on top of the actual tag number: u32 blk_mq_unique_tag(struct request *rq) { return (rq->mq_hctx->queue_num << BLK_MQ_UNIQUE_TAG_BITS) | (rq->tag & BLK_MQ_UNIQUE_TAG_MASK); } So the tagset itself is not split across devices, and all devices can (potentially) use all tags from the tagset. > > 2) some drivers(hisi_v3 hw, hpsa, megraid_sas, mpt3sas) only support > single hw queue, but has multiple reply queue which can avoid CPU > lockup, so we are working towards converting the private reply queue > into blk-mq hw queue. That is what BLK_MQ_F_HOST_TAGS covers. > I am aware. > Now you think 1) is enough for converting the private reply queue into > blk-mq hw queue, that is definitely not correct since blk-mq MQ doesn't > support shared tags among hw queues. > Currently blk-mq assumes a symmetric submission/completion model, ie it is assumed that a multiqueue device will have the same number of submissions and completion queues. For the HBAs listed above the submission works by writing an address into a single dedicated PCI register, whereas there are completion queues per interrupt. So we really have a 1:n mapping between submission and completion queues. Which is what your patchset is trying to address, and I'm perfectly fine with this. All I'm arguing is that hisi_sas v3 really maps onto the original model, seeing that is has per-queue submission paths, and as such maps more closely on the original model for blk-mq. Hence I do think that hisi_sas v3 should benefit more from moving to that 'full' blk-mq model, and not the simplified 1:n model you are proposing. We still should be going ahead with your patchset for the actual RAID HBAs like megaraid_sas etc. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxxx +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)