On Mon, Oct 21, 2019 at 03:02:56PM +0100, John Garry wrote:
> On 21/10/2019 13:53, Ming Lei wrote:
> > On Mon, Oct 21, 2019 at 12:49:53PM +0100, John Garry wrote:
> > > > >
> > > > > Yes, we share tags among all queues, but we generate the tag - known as IPTT
> > > > > - in the LLDD now, as we can no longer use the request tag (as it is not
> > > > > unique per all queues):
> > > > >
> > > > > https://github.com/hisilicon/kernel-dev/commit/087b95af374be6965583c1673032fb33bc8127e8#diff-f5d8fff19bc539a7387af5230d4e5771R188
> > > > >
> > > > > As I said, the branch is messy and I did have to fix 087b95af374.
> > > >
> > > > Firstly, this way may waste lots of memory, especially when the queue depth is
> > > > big; for example, hisilicon V3's queue depth is 4096.
> > > >
> > > > Secondly, you have to deal with queue busy conditions efficiently and correctly;
> > > > for example, your real hw tags (IPTT) can be used up easily, and how
> > > > will you handle these dispatched requests?
> > >
> > > I have not seen a scenario of exhausted IPTT. And the IPTT count is the same as
> > > SCSI host.can_queue, so the SCSI midlayer should ensure that this does not occur.
>
> Hi Ming,
>
> > That check isn't correct, and each hw queue should be allowed
> > .can_queue in-flight requests.
>
> There always seems to be some confusion or disagreement on this topic.
>
> I work according to the comment in scsi_host.h:
>
> "Note: it is assumed that each hardware queue has a queue depth of
> can_queue. In other words, the total queue depth per host
> is nr_hw_queues * can_queue."
>
> So I set Scsi_Host.can_queue = HISI_SAS_MAX_COMMANDS (=4096)

I believe all current drivers set .can_queue as a single hw queue's depth.
If you set .can_queue to HISI_SAS_MAX_COMMANDS, which is the HBA's queue
depth, the hisilicon sas driver will allow HISI_SAS_MAX_COMMANDS *
nr_hw_queues in-flight requests.

> > > > Finally, you have to evaluate the performance effect; this is highly
> > > > related to how you deal with out-of-IPTT.
> > >
> > > Some figures from our previous testing:
> > >
> > > Managed interrupt without exposing multiple queues: 3M IOPs
> > > Managed interrupt with exposing multiple queues: 2.6M IOPs
> >
> > Then you see the performance regression.
>
> Let's discuss this when I send the patches, so we don't get sidetracked on
> this blk-mq improvement topic.

OK, what I meant is that we should use a correct driver to test the patches;
otherwise it might be hard to investigate.

Thanks,
Ming
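
For illustration, below is a minimal user-space sketch (plain C, not kernel
code) of the tag accounting being debated above. It assumes per-hw-queue tag
sets, i.e. no shared tags; the nr_hw_queues value of 16 is only an example,
while the 4096 figure is the hisi_sas v3 queue depth / IPTT count mentioned in
the thread.

#include <stdio.h>

int main(void)
{
	unsigned int iptt_count   = 4096;  /* hardware IPTT pool shared by all queues */
	unsigned int can_queue    = 4096;  /* Scsi_Host.can_queue as set in the branch above */
	unsigned int nr_hw_queues = 16;    /* example only; the real value is driver-specific */

	/*
	 * With per-hw-queue tag sets, blk-mq may have up to
	 * nr_hw_queues * can_queue requests in flight at once.
	 */
	unsigned long max_in_flight = (unsigned long)nr_hw_queues * can_queue;

	printf("max in-flight requests seen by the driver: %lu\n", max_in_flight);
	printf("hardware IPTTs actually available:         %u\n", iptt_count);

	if (max_in_flight > iptt_count)
		printf("-> the driver must handle IPTT exhaustion itself\n");

	return 0;
}

With these numbers the block layer can dispatch up to 65536 requests while the
hardware only has 4096 IPTTs, which is the out-of-IPTT / queue-busy handling
problem referred to above.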