On 3/5/2014 2:06 AM, Nicholas A. Bellinger wrote:
On Tue, 2014-03-04 at 17:17 +0200, Sagi Grimberg wrote:
On 3/4/2014 2:00 AM, Nicholas A. Bellinger wrote:
From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
Hi Or & Sagi,
This series addresses a number of active I/O shutdown related issues
in iser-target code that have come up recently during stress testing.
Note there is still a separate iser-target network portal shutdown
bug being tracked down, but this series addresses all existing issues
related to active I/O session shutdown.
The patch breakdown looks like:
Patch #1 fixes a long-standing bug where TPGs in shutdown could
incorrectly be referenced by new login attempts.
Patch #2 converts list_del -> list_del_init for iscsi_cmd->i_conn_node
so that list_empty works correctly.
Patch #3 addresses isert_conn->state related bugs resulting in hung
shutdown, and splits isert_free_conn() into separate code that is
called earlier during shutdown to ensure that all outstanding I/O
has completed.
Patch #4 fixes incorrect accounting of ->post_send_buf_count during
active I/O shutdown with outstanding RDMA WRITE + RDMA READ work
requests.
Patch #5 addresses a bug related to active I/O shutdown with
outstanding FRMR work requests. Note this patch is specific to
v3.12+ code.
Patch #6 addresses bugs related to active I/O shutdown with
outstanding completion interrupt coalescing batches. Note this patch
is specific to v3.13+ code.
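
The list_del -> list_del_init point in Patch #2 can be illustrated with a small userspace sketch. It is not the kernel's <linux/list.h>: the helpers below are simplified stand-ins, NULL is used where the kernel writes its LIST_POISON values, and node_reads_empty_after_removal() is a hypothetical helper added only for the demonstration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* True when the node points back at itself. */
static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink only. The kernel poisons the pointers; NULL is an assumption here. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = NULL;
    entry->prev = NULL;
}

/* Unlink and re-initialise, so the entry points back at itself. */
static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

/*
 * Hypothetical helper: add a node to a list, remove it with the chosen
 * primitive, and report whether list_empty() on the node itself is true.
 */
static bool node_reads_empty_after_removal(bool use_del_init)
{
    struct list_head head, node;

    INIT_LIST_HEAD(&head);
    list_add_tail(&node, &head);

    if (use_del_init)
        list_del_init(&node);
    else
        list_del(&node);

    assert(list_empty(&head)); /* the list itself is empty either way */
    return list_empty(&node);
}
```

With list_del_init the removed node reads as empty, so a later list_empty() check on a node such as i_conn_node can tell whether it is still queued. With plain list_del the same check reads as "not empty" even though the node has already been unlinked.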
Please review.
Hey Nic,
So besides a minor comment, you have my Ack on this set.
Thanks!
More on cleanup flow. isert_cma_handler does not handle
RDMA_CM_EVENT_TIMEWAIT_EXIT.
To be more specific, according to the IB spec, when initiating a disconnect
(rdma_disconnect/ib_send_cm_dreq),
one should not destroy an in-use QP until getting the TIMEWAIT_EXIT CM event.
We are working on this in iSER initiator.
Destroying the QP earlier might lead to "stale connection" CM rejects on
future connections (SRP also does not wait for this event).
<nod>, I noticed that as well during recent debugging.
However, AFAICT the RDMA_CM_EVENT_TIMEWAIT_EXIT doesn't (always) occur
on the target side after a RDMA_CM_EVENT_DISCONNECTED, and thus far I've
not been able to ascertain what's different about the shutdown sequence
that would make this happen, or not happen..
Any ideas..?
That's probably because the cm_id is destroyed before you get the event.
There is a specific timeout computation that determines when this event is
delivered (see the IB spec). If you attempt to disconnect while the link is
down (the initiator won't receive the request or send a disconnect back), you
should be able to see this event. As I understand it, in order to comply with
the spec, the QP (and the cm_id afterwards) should be destroyed only when
this event arrives, and not before.
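
The ordering described above can be sketched as a small state machine. This is a userspace illustration only, not isert code and not the rdma_cm API: every sketch_* name is hypothetical, and the two booleans merely stand in for the QP and the cm_id.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event subset, named after the RDMA CM events discussed. */
enum sketch_cm_event {
    SKETCH_EVENT_DISCONNECTED,
    SKETCH_EVENT_TIMEWAIT_EXIT,
};

enum sketch_conn_state {
    SKETCH_CONN_UP,
    SKETCH_CONN_DISCONNECTED, /* disconnect seen, resources still held */
    SKETCH_CONN_DOWN,         /* QP and cm_id released */
};

struct sketch_conn {
    enum sketch_conn_state state;
    bool qp_alive;    /* stands in for the queue pair */
    bool cm_id_alive; /* stands in for the cm_id */
};

static void sketch_conn_init(struct sketch_conn *c)
{
    assert(c);
    c->state = SKETCH_CONN_UP;
    c->qp_alive = true;
    c->cm_id_alive = true;
}

static void sketch_handle_event(struct sketch_conn *c, enum sketch_cm_event ev)
{
    assert(c);

    switch (ev) {
    case SKETCH_EVENT_DISCONNECTED:
        /* Only record the disconnect; keep the QP until timewait ends. */
        if (c->state == SKETCH_CONN_UP)
            c->state = SKETCH_CONN_DISCONNECTED;
        break;
    case SKETCH_EVENT_TIMEWAIT_EXIT:
        /* Timewait is over: release the QP first, then the cm_id. */
        c->qp_alive = false;
        c->cm_id_alive = false;
        c->state = SKETCH_CONN_DOWN;
        break;
    }
}
```

In this sketch a DISCONNECTED event alone never releases the QP. The open question in the thread, namely why the target does not always see the TIMEWAIT_EXIT event, is not modelled here.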
Sagi.
--nab