On 11/7/19 9:55 AM, Himanshu Madhani wrote: > > >> On Nov 7, 2019, at 10:54 AM, Bart Van Assche <bvanassche@xxxxxxx >> <mailto:bvanassche@xxxxxxx>> wrote: >> >> On 11/5/19 7:06 AM, Himanshu Madhani wrote: >>> From: Quinn Tran <qutran@xxxxxxxxxxx <mailto:qutran@xxxxxxxxxxx>> >>> This patch fixes driver unload hang by removing msleep() >>> Fixes: d74595278f4ab ("scsi: qla2xxx: Add multiple queue pair >>> functionality.") >>> Cc: stable@xxxxxxxxxxxxxxx <mailto:stable@xxxxxxxxxxxxxxx> >>> Signed-off-by: Quinn Tran <qutran@xxxxxxxxxxx >>> <mailto:qutran@xxxxxxxxxxx>> >>> Signed-off-by: Himanshu Madhani <hmadhani@xxxxxxxxxxx >>> <mailto:hmadhani@xxxxxxxxxxx>> >>> --- >>> drivers/scsi/qla2xxx/qla_init.c | 2 -- >>> 1 file changed, 2 deletions(-) >>> diff --git a/drivers/scsi/qla2xxx/qla_init.c >>> b/drivers/scsi/qla2xxx/qla_init.c >>> index bddb26baedd2..ff4528702b4e 100644 >>> --- a/drivers/scsi/qla2xxx/qla_init.c >>> +++ b/drivers/scsi/qla2xxx/qla_init.c >>> @@ -9009,8 +9009,6 @@ int qla2xxx_delete_qpair(struct scsi_qla_host >>> *vha, struct qla_qpair *qpair) >>> struct qla_hw_data *ha = qpair->hw; >>> qpair->delete_in_progress = 1; >>> -while (atomic_read(&qpair->ref_count)) >>> -msleep(500); >>> ret = qla25xx_delete_req_que(vha, qpair->req); >>> if (ret != QLA_SUCCESS) >> >> I think that an explanation is needed why that loop had been >> introduced and also why it is safe not to wait until qpair->ref_count >> drops to zero in qla2xxx_delete_qpair(). >> > > commit d74595278f4ab had drawback in design for MQ implementation in > qla2xxx. Now that we have been making this more stable with MQ being > default on for 5x kernel. What we discovered that after heavy IO > workload in a cluster environment, driver unload encountered hang and > shows following stack trace > > # ps -fax | grep rmmod > 6029 pts/0 D+ 0:00 | \_ rmmod qla2xxx > > [<0>] msleep+0x29/0x30 [<0>] qla2xxx_delete_qpair+0x2c/0x160 [qla2xxx] > [<0>] qla25xx_delete_queues+0x14b/0x1d0 [qla2xxx] [<0>] > qla2x00_free_device+0x31/0xe0 [qla2xxx] [<0>] > qla2x00_remove_one+0x239/0x370 [qla2xxx] [<0>] > pci_device_remove+0x3b/0xc0 [<0>] > device_release_driver_internal+0x18c/0x250 [<0>] driver_detach+0x39/0x6d > [<0>] bus_remove_driver+0x77/0xc9 [<0>] pci_unregister_driver+0x2d/0xb0 > [<0>] qla2x00_module_exit+0x2d/0x90 [qla2xxx] [<0>] > __x64_sys_delete_module+0x139/0x270 [<0>] do_syscall_64+0x5b/0x1b0 [<0>] > entry_SYSCALL_64_after_hwframe+0x65/0xca [<0>] 0xffffffffffffffff > > Removing this msleep() help resolve this stack trace. Hi Himanshu, Does your answer mean that this hang has not yet been root-caused fully and hence that it is possible this patch is only a workaround but not a fix of the root cause? Thanks, Bart.