Re: Blk-mq/scsi-mq Tuning

On Fri, 30 Oct 2015, Hannes Reinecke wrote:

On 10/28/2015 09:11 PM, Chad Dupuis wrote:
Hi Folks,

We've begun to explore blk-mq and scsi-mq and wanted to know if there were
any best practices in terms of block layer settings.  We're looking
specifically at the FCoE and iSCSI protocols.

A little background on the queues in our hardware first: we have a per
connection transmit queue and multiple, global receive queues.  The
transmit queues are not pegged to a particular CPU.  The receive queues
are pegged to the first N CPUs where N is the number of receive queues.
We set the nr_hw_queues in the scsi_host_template to N as well.

Weelll ... I think you'll run into issues here.
The whole point of the multiqueue implementation is that you can tag the
submission _and_ completion queue to a single CPU, thereby eliminating
locking.
If you only peg the completion queue to a CPU you'll still have
contention on the submission queue, needing to take locks etc.

Plus you will _inevitably_ incur cache misses, as the completion will
basically never occur on the same CPU which did the submission.
Hence the context needs to be bounced to the CPU holding the completion
queue, or you'll need to do an IPI to inform the submitting CPU.
But if you do that you're essentially doing single-queue submission,
so I doubt we'll see any great improvement.
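
To put a rough number on the locality argument above, here is a minimal sketch. It is a toy model, not the kernel's code: it assumes submitting CPUs are spread over the hardware queues by plain modulo (the real blk-mq mapping may differ) and that receive queue q completes on CPU q, as in the hardware description. Both function names are made up for illustration.

```python
def map_cpu_to_queue(cpu, nr_hw_queues):
    """Hypothetical mapping: spread submitting CPUs over queues by modulo.

    Assumption for illustration only; the kernel's own CPU-to-queue
    mapping is not reproduced here.
    """
    return cpu % nr_hw_queues


def local_completion_fraction(nr_cpus, nr_hw_queues):
    """Fraction of CPUs whose submissions complete on the submitting CPU.

    Assumes receive queue q is pegged to CPU q (the first N CPUs).
    """
    local = 0
    for cpu in range(nr_cpus):
        queue = map_cpu_to_queue(cpu, nr_hw_queues)
        completion_cpu = queue  # receive queue q completes on CPU q
        if completion_cpu == cpu:
            local += 1
    return local / nr_cpus
```

Under these assumptions, local_completion_fraction(16, 4) gives 0.25: only the first four CPUs submit and complete in the same place, and every other CPU pays for a cross-CPU completion.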

This was why I was asking if there was a blk-mq API to be able to set CPU affinity for the hardware context queues so I could steer the submissions to the CPUs that my receive queues are on (even if they are allowed to float).

In our initial testing we're not seeing the performance scale as we would
expect, so we wanted to see if there were some 'knobs', if you will, that we
could try tuning to increase the performance.  Also, one question we did
have: is there an official API to be able to set the CPU affinity of the
hw_ctx_queues?
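
On the affinity question, the mapping that blk-mq ended up with can at least be inspected from user space. The sketch below reads the per-hardware-queue cpu_list files under /sys/block/<dev>/mq/. It only reads the mapping and does not set it. The helper names are made up for illustration, and the range handling ("2-3") is a defensive assumption about the file format rather than something the file is known to emit.

```python
import os


def parse_cpu_list(text):
    """Parse a comma-separated CPU list such as "0, 4" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Defensive assumption: accept "a-b" ranges as well.
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_hctx_cpu_map(dev, sysfs_root="/sys/block"):
    """Return {hw queue index: [cpus]} for a blk-mq block device."""
    mq_dir = os.path.join(sysfs_root, dev, "mq")
    mapping = {}
    for entry in os.listdir(mq_dir):
        path = os.path.join(mq_dir, entry, "cpu_list")
        # Skip anything that is not a numbered hardware-queue directory.
        if not entry.isdigit() or not os.path.isfile(path):
            continue
        with open(path) as f:
            mapping[int(entry)] = parse_cpu_list(f.read())
    return mapping
```

Calling read_hctx_cpu_map("sda") on a blk-mq device (the device name is just an example) returns one entry per hardware queue, which can be compared against the CPUs the receive queues are pegged to.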

As above, given the underlying design I'm not surprised.

But above you mentioned 'per-connection submission queues'; from which
one could infer that there are several _hardware_ submission queues?
If so, _maybe_ we should look into doing MC/S (in the iSCSI case),
which would allow us to keep the 1:1 submission/completion ratio
preferred by blk-mq and still use several queues ... Hmm?

Yes, each connection has a transmit queue.


Cheers,

Hannes
