Hi Maged, thank you for your reply.

To rule out the osd_heartbeat_interval and osd_heartbeat_grace factors, I cleared the current LIO configuration, redeployed two CentOS 7 servers (not in any Ceph role), and deployed rbd-target-api, rbd-target-gw and tcmu-runner on them. Then I ran the following test:

1. The CentOS 7 client mounts the iSCSI LUN.
2. Write data to the iSCSI LUN with dd.
3. Power off the active target node (forced power off).

[18:33:48] active target node powered off
[18:33:57] CentOS 7 client detected that the iSCSI target was interrupted
[18:34:23] CentOS 7 client switched over to the other target node

The whole process took 35 seconds, and Ceph stayed healthy throughout the test. This failover time is too long for production use. Is there anything else I can optimize?
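In case it helps to pin down which timer is responsible: if I am reading the log right, about 9 seconds went to detecting the dead connection (power-off at 18:33:48, conn error at 18:33:57), and the remaining 25 seconds is the iSCSI session recovery window before multipath failed the path. Below is a sketch of the initiator-side settings I believe control these windows; the values are only my assumptions taken from the upstream Ceph iSCSI initiator guide, not what is verified on my client, so please correct me if any of them are wrong for an LIO/TCMU setup:

    # /etc/iscsi/iscsid.conf (assumed values, per the Ceph iSCSI initiator guide)
    # NOP-out ping interval/timeout; source of the "ping timeout of 5 secs expired" message
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 5
    # Session recovery window; source of the "session recovery timed out after 25 secs"
    # message. I/O is only handed back to multipath for failover once this expires.
    node.session.timeo.replacement_timeout = 25

    # /etc/multipath.conf (device section suggested by the same guide)
    devices {
            device {
                    vendor                 "LIO-ORG"
                    hardware_handler       "1 alua"
                    path_grouping_policy   "failover"
                    path_selector          "queue-length 0"
                    failback               60
                    path_checker           tur
                    prio                   alua
                    prio_args              exclusive_pref_bit
                    fast_io_fail_tmo       25
                    no_path_retry          queue
            }
    }

As far as I know, existing node records keep their old timeouts, so after editing iscsid.conf something like "iscsiadm -m node -T <target_iqn> -o update -n node.session.timeo.replacement_timeout -v 25" (where <target_iqn> is a placeholder for the actual target) or a fresh logout/login is needed before a changed value takes effect.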
Below is the CentOS 7 client log (/var/log/messages):
============================================================
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 4409496160
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error (1022)
Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 4:0 error (1022 - Invalid or unknown error code) state (3)
Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out after 25 secs
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358528
[many repeated "sd 5:0:0:0: [sda] killing request" / "sd 5:0:0:0: rejecting I/O to offline device" messages trimmed]
Mar 21 18:34:22 CEPH-client01test kernel: device-mapper: multipath: Failing path 8:0.
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358784
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358912
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f3 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355968
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2357120
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f2 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355840
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358656
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f5 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356480
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356992
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 24 03 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test multipathd: sda: mark as failed
Mar 21 18:34:22 CEPH-client01test multipathd: mpathb: remaining active paths: 1
Mar 21 18:34:22 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state S non-preferred supports ToluSNA
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: Asymmetric access state changed
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
Mar 21 18:34:27 CEPH-client01test multipathd: mpathb: sdb - tur checker reports path is up
Mar 21 18:34:27 CEPH-client01test multipathd: 8:16: reinstated
Mar 21 18:34:33 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
Mar 21 18:34:41 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)

-----Original Message-----
From: Maged Mokhtar <mmokhtar@xxxxxxxxxxx>
Sent: 20 March 2019 15:36
To: li jerry <div8cn@xxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: CEPH ISCSI LIO multipath change delay

On 20/03/2019 07:43, li jerry wrote:
> Hi all,
>
> I've deployed a mimic (13.2.5) cluster on 3 CentOS 7.6 servers, then
> configured an iscsi-target and created a LUN, referring to
> http://docs.ceph.com/docs/mimic/rbd/iscsi-target-cli/.
>
> I have another server running CentOS 7.4, on which I configured and
> mounted the LUN I'd just created, referring to
> http://docs.ceph.com/docs/mimic/rbd/iscsi-initiator-linux/.
>
> I'm trying to do an HA test:
>
> 1. Perform a WRITE test with the DD command.
>
> 2. Stop the 'Active' iscsi-target node (ini 0); DD IO hangs for over 25
> seconds until the iscsi-target switches to another node.
>
> 3. DD IO goes back to normal.
>
> My question is, why does it take so long for the iscsi-target to switch?
> Is there any setting I've misconfigured?
>
> Usually it only takes a few seconds to switch on enterprise storage
> products.
>

If you mean you shut down the entire host: from your description it is also running osds, so you also took out some osds serving io. If a primary osd is not responding, client io (in this case your iscsi target) will block until ceph marks the osd down and issues a new epoch map, mapping the pg to another osd. This process is controlled by osd_heartbeat_interval (5) and osd_heartbeat_grace (20), a total of 25 sec, which is what you observe. I do not recommend you lower them, else your cluster will be over-sensitive and osds could flap under load.

Maged

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com