Currently, when a node closes a connection, it sends a user-initiated ABORT instead of shutting down gracefully (ece35848c184). Unfortunately, this can also close the listening connection, so the node then fails to rejoin the cluster.

I set up a two-node cluster to test this. While the cluster is working normally, the connections look like this:

clt-n2-sles12b7-2:~ # netstat -apn|grep sctp
sctp                147.2.208.197:21064                          LISTEN      -
sctp       0      4 0.0.82.72:62887       147.2.208.197:21064    ESTABLISHED -

If I then reboot the other node or stop dlm, all the connections are lost:

clt-n2-sles12b7-2:~ # netstat -apn | grep sctp
clt-n2-sles12b7-2:~ #

So when the other node tries to rejoin the cluster, the following messages flood the log, because there is no longer a listening port:

dlm: Trying to connect to 192.168.3.4
dlm: Can't start SCTP association - retrying
dlm: Retry sending 64 bytes to node id 318951621
dlm: Retrying SCTP association init for node 318951621

With this patch, retry_failed_sctp_send() returns early, without resending, when the failed message carries node id 0, which the patch treats as the listening connection.

Signed-off-by: Lidong Zhong <lzhong@xxxxxxxx>
---
 fs/dlm/lowcomms.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 1e5b453..d08e079 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -617,6 +617,11 @@ static void retry_failed_sctp_send(struct connection *recv_con,
 	int nodeid = sn_send_failed->ssf_info.sinfo_ppid;
 
 	log_print("Retry sending %d bytes to node id %d", len, nodeid);
+
+	if (!nodeid) {
+		log_print("Shouldn't resend data via listening connection.");
+		return;
+	}
 
 	con = nodeid2con(nodeid, 0);
 	if (!con) {
-- 
1.8.1.4

-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
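For readers outside the kernel tree, the decision this patch adds to retry_failed_sctp_send() can be modelled in user space roughly as follows. This is a hypothetical sketch, not dlm code: struct fake_send_failed and should_retry_send() are illustrative names invented here, the struct only mimics the two values the kernel function reads, and printf stands in for log_print.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the parts of struct sctp_send_failed
 * that retry_failed_sctp_send() reads. This is NOT the kernel definition.
 */
struct fake_send_failed {
	unsigned int ppid; /* plays the role of ssf_info.sinfo_ppid (node id) */
	int len;           /* number of payload bytes that failed to send */
};

/*
 * Returns true if the failed message should be resent, false otherwise.
 * Mirrors the patch: node id 0 is treated as the listening connection,
 * so nothing is resent through it.
 */
static bool should_retry_send(const struct fake_send_failed *sf)
{
	int nodeid;

	assert(sf != NULL);
	nodeid = (int)sf->ppid;
	printf("Retry sending %d bytes to node id %d\n", sf->len, nodeid);

	if (!nodeid) {
		printf("Shouldn't resend data via listening connection.\n");
		return false;
	}
	return true;
}
```

In the patch itself the early return sits before the nodeid2con() call, so no connection lookup is attempted for node id 0.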