Re: [PATCH 3/3] qla2xxx: Fix NVME cmd and LS cmd timeout race condition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/14/19 3:10 PM, Himanshu Madhani wrote:
From: Quinn Tran <qutran@xxxxxxxxxxx>

This patch uses kref to protect access between fcp_abort
path and nvme command and LS command completion path.
Stack trace below shows the abort path is accessing stale
memory (nvme_private->sp).

When command kref reaches 0, nvme_private & srb resource will
be disconnected from each other.  Any subsequence nvme abort
request will not be able to reference the original srb.

[ 5631.003998] BUG: unable to handle kernel paging request at 00000010000005d8
[ 5631.004016] IP: [<ffffffffc087df92>] qla_nvme_abort_work+0x22/0x100 [qla2xxx]
[ 5631.004086] Workqueue: events qla_nvme_abort_work [qla2xxx]
[ 5631.004097] RIP: 0010:[<ffffffffc087df92>]  [<ffffffffc087df92>] qla_nvme_abort_work+0x22/0x100 [qla2xxx]
[ 5631.004109] Call Trace:
[ 5631.004115]  [<ffffffffaa4b8174>] ? pwq_dec_nr_in_flight+0x64/0xb0
[ 5631.004117]  [<ffffffffaa4b9d4f>] process_one_work+0x17f/0x440
[ 5631.004120]  [<ffffffffaa4bade6>] worker_thread+0x126/0x3c0

Signed-off-by: Quinn Tran <qutran@xxxxxxxxxxx>
Signed-off-by: Himanshu Madhani <hmadhani@xxxxxxxxxxx>
---
  drivers/scsi/qla2xxx/qla_def.h  |   2 +
  drivers/scsi/qla2xxx/qla_nvme.c | 164 ++++++++++++++++++++++++++++------------
  drivers/scsi/qla2xxx/qla_nvme.h |   1 +
  3 files changed, 117 insertions(+), 50 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 602ed24bb806..85a27ee5d647 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -532,6 +532,8 @@ typedef struct srb {
  	uint8_t cmd_type;
  	uint8_t pad[3];
  	atomic_t ref_count;
+	struct kref cmd_kref;	/* need to migrate ref_count over to this */
+	void *priv;
  	wait_queue_head_t nvme_ls_waitq;
  	struct fc_port *fcport;
  	struct scsi_qla_host *vha;

My feedback about this patch is as follows:
- I think that having two atomic counters in struct srb with a very
  similar purpose is overkill and also that a single atomic counter
  should be sufficient in this structure.
- The problem fixed by this patch not only can happen with NVMe-FC but
  also with FC. I would prefer to see all code paths fixed for which a
  race between completion and abort can occur.

Thanks,

Bart.



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux