On 03/21/2019 11:27 AM, Maged Mokhtar wrote:
>
> Though I do not recommend changing it, if there is a need to lower
> fast_io_fail_tmo, then the sum of osd_heartbeat_interval and
> osd_heartbeat_grace needs to be lowered as well. Their default sum is
> 25 sec, which I would assume is why fast_io_fail_tmo is set to this.
> You would want to have your higher layer timeouts equal to or larger
> than those of the layers below.

Yeah, fast_io_fail_tmo is set to 25 to make sure the target has detected
that the initiator has marked the path as down and has done its cleanup.
If you set that multipath timer lower, you have to set the target side
nops lower.

When I can get those userspace flush patches merged upstream, then we do
not have to rely on the kernel based nops to flush things and we can set
fast_io_fail as low as we want (ignoring the ceph side timeouts I mean).

>
> /Maged
>
>
> On 21/03/2019 17:07, Jason Dillaman wrote:
>> It's just the design of the iSCSI protocol. Sure, you can lower the
>> timeouts (see "fast_io_fail_tmo" [1]) but you will just end up w/ more
>> false-positive failovers.
>>
>> [1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-linux/
>>
>> On Thu, Mar 21, 2019 at 10:46 AM li jerry <div8cn@xxxxxxxxxxx> wrote:
>>> Hi Maged
>>>
>>> Thank you for your reply.
>>>
>>> To exclude the osd_heartbeat_interval and osd_heartbeat_grace
>>> factors, I cleared the current LIO configuration, redeployed two
>>> CentOS 7 nodes (not in any ceph role), and deployed rbd-target-api,
>>> rbd-target-gw, and tcmu-runner on them.
>>>
>>> Then I ran the following test:
>>> 1. The centos7 client mounts the iscsi lun
>>> 2. Write data to the iscsi lun with dd
>>> 3. Shut down the active target node (forced power off)
>>>
>>> [18:33:48] active target node powered off
>>> [18:33:57] centos7 client found iscsi target interrupted
>>> [18:34:23] centos7 client switches to another target node
>>>
>>>
>>> The whole process lasted 35 seconds, and ceph was healthy throughout
>>> the test.
>>>
>>> This failover time is too long for production use. Is there
>>> anything else I can optimize?
>>>
>>>
>>> Below is the centos7 client log [messages]
>>> ============================================================
>>>
>>> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 4409496160
>>> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error (1022)
>>> Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 4:0 error (1022 - Invalid or unknown error code) state (3)
>>> Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out after 25 secs
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 00 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358528
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
>>> Mar 21
18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 
5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline 
device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing >>> request >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to 
offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 
18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O >>> to offline device >>> Mar 21 18:34:22 
CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: device-mapper: multipath: Failing path 8:0.
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 00 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358784
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 80 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358912
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f3 00 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355968
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 80 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2357120
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f2 80 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355840
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 80 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358656
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f5 00 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356480
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 00 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356992
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 24 03 00 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test multipathd: sda: mark as failed
>>> Mar 21 18:34:22 CEPH-client01test multipathd: mpathb: remaining active paths: 1
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state S non-preferred supports ToluSNA
>>> Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: Asymmetric access state changed
>>> Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
>>> Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
>>> Mar 21 18:34:27 CEPH-client01test multipathd: mpathb: sdb - tur checker reports path is up
>>> Mar 21 18:34:27 CEPH-client01test multipathd: 8:16: reinstated
>>> Mar 21 18:34:33 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
>>> Mar 21 18:34:41 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
>>>
>>> -----Original Message-----
>>> From: Maged Mokhtar <mmokhtar@xxxxxxxxxxx>
>>> Sent: March 20, 2019 15:36
>>> To: li jerry <div8cn@xxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
>>> Subject: Re: CEPH ISCSI LIO multipath change delay
>>>
>>>
>>>
>>> On 20/03/2019 07:43, li jerry wrote:
>>>> Hi, all
>>>>
>>>> I’ve deployed a mimic (13.2.5) cluster on 3 CentOS 7.6 servers, then
>>>> configured iscsi-target and created a LUN, referring to
>>>> http://docs.ceph.com/docs/mimic/rbd/iscsi-target-cli/.
>>>>
>>>> I have another server running CentOS 7.4, on which I configured and
>>>> mounted the LUN I’ve just created, referring to
>>>> http://docs.ceph.com/docs/mimic/rbd/iscsi-initiator-linux/.
>>>>
>>>> I’m trying to do an HA test:
>>>>
>>>> 1. Perform a WRITE test with the dd command
>>>>
>>>> 2. Stop one ‘Active’ iscsi-target node (ini 0); dd IO hangs for over
>>>> 25 seconds until iscsi-target switches to another node
>>>>
>>>> 3. dd IO goes back to normal
>>>>
>>>> My question is, why does it take so long for the iscsi-target to
>>>> switch? Are there any settings I’ve misconfigured?
>>>>
>>>> Usually it only takes a few seconds to switch on enterprise storage
>>>> products.
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>> If you mean you shut down the entire host, then from your description
>>> it is also running osds, so you also took out some osds serving io.
>>>
>>> If a primary osd is not responding, client io (in this case from your
>>> iscsi target) will block until ceph marks the osd down and issues a
>>> new epoch map mapping the pg to another osd. This process is
>>> controlled by osd_heartbeat_interval (5) and osd_heartbeat_grace (20),
>>> a total of 25 sec, which is what you observe. I do not recommend you
>>> lower them, else your cluster will be oversensitive and osds could
>>> flap under load.
>>>
>>> Maged
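
For reference, the 35 seconds reported in the second test can be broken down by phase, and the timeout ordering described in the replies can be checked with simple arithmetic. This is only a rough sketch in Python: the timestamps and timeout values are copied from the messages and client log quoted in this thread, while the event labels, the `phase_durations` helper, and the constant names are made up for illustration and do not come from any ceph or iSCSI tool.

```python
from datetime import datetime

# Timestamps copied from the quoted test notes and client log (all Mar 21).
# The labels are illustrative descriptions, not official terminology.
EVENTS = [
    ("active target node powered off", "18:33:48"),
    ("initiator ping timeout, conn error 1022", "18:33:57"),
    ("session recovery timed out after 25 secs", "18:34:22"),
    ("ALUA port group 02 switched to state A", "18:34:23"),
]

# Timeout values quoted in the thread, in seconds.
OSD_HEARTBEAT_INTERVAL = 5
OSD_HEARTBEAT_GRACE = 20
FAST_IO_FAIL_TMO = 25


def phase_durations(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    parsed = [(label, datetime.strptime(ts, "%H:%M:%S")) for label, ts in events]
    return [
        (a_label, b_label, int((b_time - a_time).total_seconds()))
        for (a_label, a_time), (b_label, b_time) in zip(parsed, parsed[1:])
    ]


phases = phase_durations(EVENTS)
total = sum(seconds for _, _, seconds in phases)

for start, end, seconds in phases:
    print(f"{seconds:>3}s  {start} -> {end}")
print(f"{total:>3}s  total")

# Rule of thumb from the thread: the higher layer timeout should be equal
# to or larger than the timeout of the layer below it.
ceph_detection = OSD_HEARTBEAT_INTERVAL + OSD_HEARTBEAT_GRACE
layering_ok = FAST_IO_FAIL_TMO >= ceph_detection
print(f"ceph detection {ceph_detection}s, fast_io_fail_tmo {FAST_IO_FAIL_TMO}s, ok={layering_ok}")
```

On these numbers, 25 of the 35 seconds are spent waiting for the session recovery timeout, which matches the fast_io_fail_tmo value of 25 discussed above. The remaining time is the initial ping-timeout detection and the ALUA path switch.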