On 01/29/18 07:41, Elliott, Robert (Persistent Memory) wrote:
-----Original Message-----
From: Linux-nvme [mailto:linux-nvme-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf
Of Hannes Reinecke
Sent: Monday, January 29, 2018 3:09 AM
To: lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx; Kashyap
Desai <kashyap.desai@xxxxxxxxxxxx>
Subject: [LSF/MM TOPIC] irq affinity handling for high CPU count machines
Hi all,
here's a topic which came up on the SCSI ML (cf thread '[RFC 0/2]
mpt3sas/megaraid_sas: irq poll and load balancing of reply queue').
When doing I/O tests on a machine with more CPUs than MSI-X vectors
provided by the HBA, we can easily set up a scenario where one CPU is
submitting I/O and another one is completing I/O. This results in
the latter CPU being stuck in the interrupt completion routine
essentially forever, which triggers the lockup detector.
How should these situations be handled?
Should it be made the responsibility of the drivers, ensuring that the
interrupt completion routine is terminated after a certain time?
Should it be made the responsibility of the upper layers?
Should it be the responsibility of the interrupt mapping code?
Can/should interrupt polling be used in these situations?
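
For illustration, here is a minimal userspace sketch of the "terminate
the completion routine after a certain amount of work" idea, in the
spirit of budgeted polling. It is not taken from any driver: struct
reply_queue, reap_completions() and drain_with_budget() are
hypothetical names, and the queue is reduced to a counter of pending
completions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reply queue, reduced to a count of pending completions. */
struct reply_queue {
    unsigned int pending;
};

/*
 * Reap at most `budget` completions and return how many were handled.
 * A return value equal to `budget` means more work may remain; a real
 * handler would then defer the rest instead of looping in hard-irq
 * context.
 */
unsigned int reap_completions(struct reply_queue *q, unsigned int budget)
{
    unsigned int done = 0;

    assert(q != NULL);
    while (done < budget && q->pending > 0) {
        q->pending--;   /* stands in for completing one command */
        done++;
    }
    return done;
}

/*
 * Drain the queue in budget-sized passes and return the number of
 * passes. Between passes a real system would give up the CPU, which
 * is what keeps the lockup detector quiet.
 */
unsigned int drain_with_budget(struct reply_queue *q, unsigned int budget)
{
    unsigned int passes = 0;

    assert(q != NULL && budget > 0);
    while (q->pending > 0) {
        reap_completions(q, budget);
        passes++;
    }
    return passes;
}
```

The budget bounds the time spent in any single pass. It does not by
itself limit the rate at which other CPUs can submit new work.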
Back when we introduced scsi-mq with hpsa, the best approach was to
route interrupts and completion handling so each CPU core handles its
own submissions; this way, they are self-throttling.
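
A small sketch of why the vector count matters for this approach. The
modulo mapping below is an assumption made for illustration only (real
affinity spreading is topology-aware), and cpu_to_vector() and
cpus_sharing_vector() are hypothetical helpers. With at least one
vector per CPU, every CPU completes its own submissions. With fewer
vectors, several CPUs share one, so the CPU servicing that vector also
handles completions for I/O it did not submit.

```c
#include <assert.h>

/*
 * Map a submitting CPU to a completion vector.
 * Assumption: plain modulo spreading, ignoring NUMA and core topology.
 */
unsigned int cpu_to_vector(unsigned int cpu, unsigned int nr_cpus,
                           unsigned int nr_vectors)
{
    assert(nr_cpus > 0 && nr_vectors > 0 && cpu < nr_cpus);
    if (nr_vectors >= nr_cpus)
        return cpu;             /* one vector per CPU: completions stay local */
    return cpu % nr_vectors;    /* fewer vectors than CPUs: vectors are shared */
}

/* Count how many CPUs have their submissions completed on `vector`. */
unsigned int cpus_sharing_vector(unsigned int vector, unsigned int nr_cpus,
                                 unsigned int nr_vectors)
{
    unsigned int cpu, n = 0;

    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (cpu_to_vector(cpu, nr_cpus, nr_vectors) == vector)
            n++;
    return n;
}
```

Whenever cpus_sharing_vector() returns more than 1, the self-throttling
property no longer holds for that vector: the submitting CPUs are not
slowed down by the completion work they generate.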
That approach may work for the hpsa adapter but I'm not sure whether it
works for all adapter types. It has already been observed with the SRP
initiator driver running inside a VM that a single core spent all its
time processing IB interrupts.
Additionally, only initiator workloads are self-throttling. Target-style
workloads are not.
In other words, I think this topic is worth discussing further.
Bart.