On 7/2/2015 11:11 PM, Mike Christie wrote:
On 6/30/15, 9:55 AM, Sagi Grimberg wrote:
From: Ariel Nahum <arieln@xxxxxxxxxxxx>
Connection last_ping is not being updated when iscsi_send_nopout fails.
Not updating the last_ping will cause firing a timer to a past time
(last_ping + ping_tmo < current_time) which triggers an infinite loop of
iscsi_check_transport_timeouts() and hogs the cpu.
Fix this issue by checking the return value of iscsi_send_nopout.
If it fails set the next_timeout to one second later.
Signed-off-by: Ariel Nahum <arieln@xxxxxxxxxxxx>
Signed-off-by: Sagi Grimberg <sagig@xxxxxxxxxxxx>
---
drivers/scsi/libiscsi.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 8053f24..1ea4213 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -979,13 +979,13 @@ static void iscsi_tmf_rsp(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
wake_up(&conn->ehwait);
}
-static void iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr)
+static int iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr)
{
struct iscsi_nopout hdr;
struct iscsi_task *task;
if (!rhdr && conn->ping_task)
- return;
+ return -EINVAL;
memset(&hdr, 0, sizeof(struct iscsi_nopout));
hdr.opcode = ISCSI_OP_NOOP_OUT | ISCSI_OP_IMMEDIATE;
@@ -999,13 +999,16 @@ static void iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr)
hdr.ttt = RESERVED_ITT;
task = __iscsi_conn_send_pdu(conn, (struct iscsi_hdr *)&hdr,
NULL, 0);
- if (!task)
+ if (!task) {
Are you hitting the failure case in the first chunk of the patch or the
failure right above? If the latter, why is it failing?
__iscsi_conn_send_pdu() can fail in various places, but in our case it is
session->tt->xmit_task() that fails, in a couple of scenarios:
1. Hot device removal: the iser transport receives an async device
removal event and alerts the iscsi layer of a connection failure, but
this can still race with the asynchronous, timer-driven nopout
sends.
2. Device catastrophic error recovery: the RDMA device enters a
catastrophic error condition and initiates a recovery sequence.
During this period, the HW interfaces (packet send) are blocked
and fail immediately. This condition propagates to the iscsi layer, which
should take into account that xmit_task() might fail.
iscsi_conn_printk(KERN_ERR, conn, "Could not send nopout\n");
+ return -EIO;
+ }
else if (!rhdr) {
I think the coding style is wrong. It should be:
if () {
} else if {
}
Yep, we'll fix that in v1.
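
To make the timer problem from the commit message concrete, here is a minimal userspace sketch of the re-arm logic. It is not the libiscsi code: struct sketch_conn, sketch_send_nopout(), sketch_next_timeout() and SKETCH_HZ are hypothetical stand-ins, the xmit_ok flag simulates whether the transport accepted the PDU, and jiffies wraparound (which the kernel handles with its time comparison helpers) is ignored.

```c
#include <stdbool.h>

#define SKETCH_HZ 1000UL /* assumed ticks per second, stand-in for HZ */

/* Hypothetical, simplified connection state (not struct iscsi_conn). */
struct sketch_conn {
    unsigned long last_recv;    /* tick of the last received PDU */
    unsigned long last_ping;    /* tick of the last nopout that was sent */
    unsigned long recv_timeout; /* idle ticks before a ping is sent */
    unsigned long ping_timeout; /* ticks to wait for the ping reply */
};

/* Simulated send: on failure last_ping is left stale, as in the report. */
static int sketch_send_nopout(struct sketch_conn *c, unsigned long now,
                              bool xmit_ok)
{
    if (!xmit_ok)
        return -1;
    c->last_ping = now;
    return 0;
}

/* Returns the tick at which the transport timer should fire next. */
static unsigned long sketch_next_timeout(struct sketch_conn *c,
                                         unsigned long now, bool xmit_ok)
{
    /* Connection is not idle yet: wake up when recv_timeout expires. */
    if (c->last_recv + c->recv_timeout > now)
        return c->last_recv + c->recv_timeout;

    /*
     * Idle: try to ping. If the send fails, last_ping + ping_timeout may
     * already be in the past, so retry one second from now instead.
     */
    if (sketch_send_nopout(c, now, xmit_ok))
        return now + SKETCH_HZ;

    return c->last_ping + c->ping_timeout;
}
```

Without the return value check, a failed send would re-arm the timer at last_ping + ping_timeout, which can lie before the current time, so the handler would fire again immediately.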