Re: Reply: CEPH ISCSI LIO multipath change delay

Though I do not recommend changing it, if you need to lower fast_io_fail_tmo, then the sum of osd_heartbeat_interval and osd_heartbeat_grace needs to be lowered as well. Their default sum is 25 seconds, which I assume is why fast_io_fail_tmo is set to that value. You want the timeouts of the higher layers to be equal to or larger than those of the layers below.
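The layering rule can be written down as a quick sanity check. This is only a sketch: the numbers are the defaults quoted in this thread (5 and 20 for the OSD heartbeat settings, 25 for fast_io_fail_tmo), and layering_ok is a hypothetical helper name, not part of Ceph or multipath-tools.

```python
# Values as quoted in this thread, all in seconds.
OSD_HEARTBEAT_INTERVAL = 5
OSD_HEARTBEAT_GRACE = 20
FAST_IO_FAIL_TMO = 25


def layering_ok(fast_io_fail_tmo: int, hb_interval: int, hb_grace: int) -> bool:
    """Return True if the initiator-side timeout is at least as long as
    the OSD failure-detection window (interval + grace) below it."""
    return fast_io_fail_tmo >= hb_interval + hb_grace


# Defaults: 25 >= 5 + 20, so the rule holds.
print(layering_ok(FAST_IO_FAIL_TMO, OSD_HEARTBEAT_INTERVAL, OSD_HEARTBEAT_GRACE))

# Lowering only fast_io_fail_tmo to 10 breaks the rule.
print(layering_ok(10, OSD_HEARTBEAT_INTERVAL, OSD_HEARTBEAT_GRACE))
```

If the second case is what you want, the two OSD heartbeat values would have to come down as well, with the flapping risk that implies.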

/Maged


On 21/03/2019 17:07, Jason Dillaman wrote:
It's just the design of the iSCSI protocol. Sure, you can lower the
timeouts (see "fast_io_fail_tmo" [1]) but you will just end up w/ more
false-positive failovers.

[1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-linux/

On Thu, Mar 21, 2019 at 10:46 AM li jerry <div8cn@xxxxxxxxxxx> wrote:
Hi Maged

Thank you for your reply.

To exclude the osd_heartbeat_interval and osd_heartbeat_grace factors, I cleared the current LIO configuration, deployed two fresh CentOS 7 hosts (not in any Ceph role), and installed rbd-target-api, rbd-target-gw, and tcmu-runner on them.

Then I ran the following test:
1. The CentOS 7 client mounts the iSCSI LUN.
2. Write data to the iSCSI LUN with dd.
3. Shut down the active target node (forced power-off).

[18:33:48] active target node powered off
[18:33:57] CentOS 7 client detects that the iSCSI target connection is interrupted
[18:34:23] CentOS 7 client switches to the other target node


The whole process took 35 seconds, and Ceph stayed healthy throughout the test.
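For reference, the 35 seconds can be split into a detection phase and a recovery phase using the three timestamps above. A small sketch (the variable names are only illustrative):

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps taken from the test timeline above.
power_off = datetime.strptime("18:33:48", FMT)
conn_error = datetime.strptime("18:33:57", FMT)
failover = datetime.strptime("18:34:23", FMT)

# Time until the client notices the dead connection.
detect_s = int((conn_error - power_off).total_seconds())
# Time from the connection error until I/O moves to the other node.
recover_s = int((failover - conn_error).total_seconds())
# End-to-end interruption.
total_s = int((failover - power_off).total_seconds())

print(detect_s, recover_s, total_s)  # 9 26 35
```

Most of the interruption is spent after the connection error has already been detected, not in detecting it.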

This failover time is too long for production use. Is there anything else I can tune?


Below is the CentOS 7 client log (messages):
============================================================

Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 4409496160
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error (1022)
Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 4:0 error (1022 - Invalid or unknown error code) state (3)
Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out after 25 secs
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358528
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: device-mapper: multipath: Failing path 8:0.
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358784
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358912
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f3 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355968
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2357120
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f2 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355840
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358656
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f5 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356480
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356992
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 24 03 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test multipathd: sda: mark as failed
Mar 21 18:34:22 CEPH-client01test multipathd: mpathb: remaining active paths: 1
Mar 21 18:34:22 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state S non-preferred supports ToluSNA
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: Asymmetric access state changed
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
Mar 21 18:34:27 CEPH-client01test multipathd: mpathb: sdb - tur checker reports path is up
Mar 21 18:34:27 CEPH-client01test multipathd: 8:16: reinstated
Mar 21 18:34:33 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
Mar 21 18:34:41 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
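The timeout values that drive this sequence can be pulled straight out of the log. The following is a rough sketch: LOG holds four lines copied from the excerpt above, and extract_timeouts and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Four lines copied verbatim from the client log above.
LOG = """\
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 4409496160
Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out after 25 secs
Mar 21 18:34:22 CEPH-client01test multipathd: sda: mark as failed
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: Asymmetric access state changed
"""

# One pattern per timeout value reported by the kernel messages.
PATTERNS = {
    "ping_timeout_s": re.compile(r"ping timeout of (\d+) secs expired"),
    "recv_timeout_s": re.compile(r"recv timeout (\d+)"),
    "recovery_timeout_s": re.compile(r"session recovery timed out after (\d+) secs"),
}


def extract_timeouts(log_text: str) -> dict:
    """Return the first value found for each known timeout message."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            found[name] = int(match.group(1))
    return found


timeouts = extract_timeouts(LOG)
print(timeouts)
print(sum(timeouts.values()))
```

For this log the three values add up to 35 seconds, which is in line with the failover time measured in the test. Treating the sum as the worst-case interruption is an assumption about how the initiator timers chain together, not something the log itself states.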

-----Original Message-----
From: Maged Mokhtar <mmokhtar@xxxxxxxxxxx>
Sent: March 20, 2019 15:36
To: li jerry <div8cn@xxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: CEPH ISCSI LIO multipath change delay



On 20/03/2019 07:43, li jerry wrote:
Hi all,

I’ve deployed a Mimic (13.2.5) cluster on 3 CentOS 7.6 servers, then
configured an iSCSI target and created a LUN, following
http://docs.ceph.com/docs/mimic/rbd/iscsi-target-cli/.

On another server running CentOS 7.4, I configured the initiator and
mounted the LUN I had just created, following
http://docs.ceph.com/docs/mimic/rbd/iscsi-initiator-linux/.

I’m running an HA test:

1. Perform a write test with the dd command.

2. Stop the active iSCSI target node (init 0). The dd I/O hangs for over
25 seconds until the iSCSI target switches to another node.

3. The dd I/O returns to normal.

My question is: why does the iSCSI target switch take so long? Have I
misconfigured any settings?

Enterprise storage products usually take only a few seconds to switch.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

If you mean you shut down the entire host, then from your description
it is also running OSDs, so you also took out some OSDs serving I/O.

If a primary OSD is not responding, client I/O (in this case from your
iSCSI target) will block until Ceph marks the OSD down and issues a new
epoch map that maps the PG to another OSD. This process is controlled by
osd_heartbeat_interval (5) and osd_heartbeat_grace (20), a total of 25
seconds, which is what you observe. I do not recommend lowering them;
otherwise your cluster will be oversensitive and OSDs could flap under load.

Maged


--
Maged Mokhtar
CEO PetaSAN
4 Emad El Deen Kamel
Cairo 11371, Egypt
www.petasan.org
+201006979931
skype: maged.mokhtar
