Hi Don,

Thanks for your test!

On Thu, Mar 01, 2018 at 04:18:17PM +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@xxxxxxxxxx]
> > Sent: Tuesday, February 27, 2018 4:08 AM
> > To: Jens Axboe <axboe@xxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx; Christoph
> > Hellwig <hch@xxxxxxxxxxxxx>; Mike Snitzer <snitzer@xxxxxxxxxx>
> > Cc: linux-scsi@xxxxxxxxxxxxxxx; Hannes Reinecke <hare@xxxxxxx>; Arun Easi
> > <arun.easi@xxxxxxxxxx>; Omar Sandoval <osandov@xxxxxx>; Martin K .
> > Petersen <martin.petersen@xxxxxxxxxx>; James Bottomley
> > <james.bottomley@xxxxxxxxxxxxxxxxxxxxx>; Christoph Hellwig <hch@xxxxxx>;
> > Don Brace <don.brace@xxxxxxxxxxxxx>; Kashyap Desai
> > <kashyap.desai@xxxxxxxxxxxx>; Peter Rivera <peter.rivera@xxxxxxxxxxxx>;
> > Laurence Oberman <loberman@xxxxxxxxxx>; Ming Lei
> > <ming.lei@xxxxxxxxxx>; Meelis Roos <mroos@xxxxxxxx>
> > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> >
> > From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> > one msix vector can be created without any online CPU mapped, then one
> > command's completion may not be notified.
> >
> > This patch setups mapping between cpu and reply queue according to irq
> > affinity info retrived by pci_irq_get_affinity(), and uses this mapping
> > table to choose reply queue for queuing one command.
> >
> > Then the chosen reply queue has to be active, and fixes IO hang caused
> > by using inactive reply queue which doesn't have any online CPU mapped.
> >
> > Cc: Hannes Reinecke <hare@xxxxxxx>
> > Cc: Arun Easi <arun.easi@xxxxxxxxxx>
> > Cc: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx>,
> > Cc: James Bottomley <james.bottomley@xxxxxxxxxxxxxxxxxxxxx>,
> > Cc: Christoph Hellwig <hch@xxxxxx>,
> > Cc: Don Brace <don.brace@xxxxxxxxxxxxx>
> > Cc: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
> > Cc: Peter Rivera <peter.rivera@xxxxxxxxxxxx>
> > Cc: Laurence Oberman <loberman@xxxxxxxxxx>
> > Cc: Meelis Roos <mroos@xxxxxxxx>
> > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
>
> I am getting some issues that need to be tracked down:

I checked the patch one more time and did not find anything odd. The only
thing I noticed is that inside hpsa_do_reset(),
wait_for_device_to_become_ready() always sends 'test unit ready' via
reply queue 0. Do you know whether something bad may happen if a non-zero
reply queue is used?

Could you share how you reproduce this issue?

It looks like you can boot successfully, so could you please provide the
following information?

1) What is your server type? We may find one in our lab, so that I can try
to reproduce it.

2) lscpu

3) irq affinity info. You need to pass the 1st column of the 'lspci' output
for your hpsa PCI device to this script:

#!/bin/sh
# Dump the CPU affinity of every MSI/MSI-X irq of one PCI device.
# $1: PCI address of the device, i.e. the 1st column of 'lspci'
if [ $# -ge 1 ]; then
	PCID=$1
else
	# no argument given: fall back to the first NVMe device
	PCID=`lspci | grep "Non-Volatile memory" | cut -c1-7`
fi
# quote the pattern so the shell does not expand it before find sees it
PCIP=`find /sys/devices -name "*$PCID" | grep pci`
IRQS=`ls $PCIP/msi_irqs`

echo "kernel version: "
uname -a

echo "PCI name is $PCID, dump its irq affinity:"
for IRQ in $IRQS; do
	CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
	# printf, because 'echo "\t..."' is not portable across shells
	printf '\tirq %s, cpu list %s\n' "$IRQ" "$CPUS"
done

Thanks,
Ming
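
The affinity dump requested above only lists the CPUs assigned to each irq. A possible follow-up, since the patch is about msix vectors that have no online CPU mapped, is to flag those irqs directly. The sketch below is not part of the original mail. The helper names expand_cpu_list, has_online_cpu and report_irqs are hypothetical, and it assumes that /sys/devices/system/cpu/online and /proc/irq/<irq>/smp_affinity_list are readable and use the usual "0-3,8" list format.

```sh
#!/bin/sh
# Hypothetical helpers (not from the original mail): flag irqs whose
# affinity list contains no online CPU.

# expand_cpu_list "0-3,8" prints "0 1 2 3 8"
expand_cpu_list() {
	out=""
	for range in $(echo "$1" | tr ',' ' '); do
		lo=${range%-*}		# "0-3" -> 0, "8" -> 8
		hi=${range#*-}		# "0-3" -> 3, "8" -> 8
		i=$lo
		while [ "$i" -le "$hi" ]; do
			out="$out $i"
			i=$((i + 1))
		done
	done
	echo $out	# unquoted on purpose: drops the leading space
}

# has_online_cpu AFFINITY_LIST ONLINE_LIST
# returns 0 if at least one CPU appears in both lists, 1 otherwise
has_online_cpu() {
	for a in $(expand_cpu_list "$1"); do
		for o in $(expand_cpu_list "$2"); do
			[ "$a" -eq "$o" ] && return 0
		done
	done
	return 1
}

# report_irqs PCI_SYSFS_PATH
# PCI_SYSFS_PATH is the device directory that contains msi_irqs
report_irqs() {
	online=$(cat /sys/devices/system/cpu/online)
	for irq in $(ls "$1/msi_irqs"); do
		cpus=$(cat "/proc/irq/$irq/smp_affinity_list")
		if has_online_cpu "$cpus" "$online"; then
			state="ok"
		else
			state="NO ONLINE CPU"
		fi
		printf '\tirq %s, cpu list %s: %s\n' "$irq" "$cpus" "$state"
	done
}
```

With the functions loaded, report_irqs "$PCIP" would be called with the same sysfs path that the script in the mail computes. An irq reported as "NO ONLINE CPU" would be a candidate for the inactive reply queue case described in the patch, though this only inspects the configured affinity list, not what the driver actually selects.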