Re: CEPH ISCSI LIO multipath change delay

Hi Maged,

Thank you for your reply.

To rule out the osd_heartbeat_interval and osd_heartbeat_grace factors, I cleared the current LIO configuration, deployed two new CentOS 7 servers (not in any Ceph role), and installed rbd-target-api, rbd-target-gw, and tcmu-runner on them.

Then I ran the following test:
1. The CentOS 7 client mounts the iSCSI LUN.
2. Write data to the iSCSI LUN with dd.
3. Shut down the active target node (forced power off).

[18:33:48] active target node powered off
[18:33:57] CentOS 7 client detects that the iSCSI target connection is interrupted
[18:34:23] CentOS 7 client fails over to the other target node


The whole process took 35 seconds, and Ceph stayed healthy throughout the test.
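
As a quick cross-check of the 35-second figure, the timeline above can be split into its two phases. This is only a sketch: the timestamps are copied from the timeline, while the event labels and the helper name elapsed_seconds are illustrative.

```python
from datetime import datetime

# Timestamps copied from the timeline above (same day, 24-hour clock).
# The labels are paraphrased for readability.
events = [
    ("target node powered off", "18:33:48"),
    ("client detects the interruption", "18:33:57"),
    ("client uses the other target node", "18:34:23"),
]


def elapsed_seconds(start: str, end: str) -> int:
    """Return the whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Duration of each phase between consecutive events.
phases = [
    (f"{a[0]} -> {b[0]}", elapsed_seconds(a[1], b[1]))
    for a, b in zip(events, events[1:])
]
total = elapsed_seconds(events[0][1], events[-1][1])

for label, secs in phases:
    print(f"{secs:3d} s  {label}")
print(f"{total:3d} s  total")
```

Detection takes 9 seconds and the switch to the other node a further 26 seconds, which adds up to the 35 seconds reported.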

This failover time is too long for production use. Is there anything else I can tune?


Below is the CentOS 7 client log (messages):
============================================================

Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 4409496160
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error (1022)
Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 4:0 error (1022 - Invalid or unknown error code) state (3)
Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out after 25 secs
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358528
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
Mar 21 18:34:22 CEPH-client01test kernel: device-mapper: multipath: Failing path 8:0.
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358784
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358912
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f3 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355968
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2357120
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f2 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355840
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 80 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358656
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f5 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356480
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356992
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 24 03 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test multipathd: sda: mark as failed
Mar 21 18:34:22 CEPH-client01test multipathd: mpathb: remaining active paths: 1
Mar 21 18:34:22 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state S non-preferred supports ToluSNA
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: Asymmetric access state changed
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
Mar 21 18:34:27 CEPH-client01test multipathd: mpathb: sdb - tur checker reports path is up
Mar 21 18:34:27 CEPH-client01test multipathd: 8:16: reinstated
Mar 21 18:34:33 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
Mar 21 18:34:41 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
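
The log above reports two initiator-side timers: a 5-second ping timeout at 18:33:57 and a 25-second session recovery timeout at 18:34:22, after which multipathd marks sda as failed. Below is a minimal sketch for pulling those values out of a messages file, assuming the kernel message wording shown above. The pattern names and the function extract_timeouts are illustrative. As far as I know these timers are normally set on the client through the open-iscsi settings node.conn[0].timeo.noop_out_interval, node.conn[0].timeo.noop_out_timeout and node.session.timeo.replacement_timeout, but that mapping is an assumption and is not confirmed by this log.

```python
import re

# Three lines copied verbatim from the client log above.
LOG = """\
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 4409496160
Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out after 25 secs
Mar 21 18:34:22 CEPH-client01test multipathd: sda: mark as failed
"""

# Illustrative names for the values reported by the kernel messages.
PATTERNS = {
    "ping_timeout_s": re.compile(r"ping timeout of (\d+) secs expired"),
    "recv_timeout_s": re.compile(r"recv timeout (\d+)"),
    "session_recovery_s": re.compile(r"session recovery timed out after (\d+) secs"),
}


def extract_timeouts(log_text: str) -> dict:
    """Return the first value found in the log for each timeout pattern."""
    found = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match and name not in found:
                found[name] = int(match.group(1))
    return found


timeouts = extract_timeouts(LOG)
for name, value in timeouts.items():
    print(f"{name} = {value}")
```

Read this way, about 25 of the 35 seconds pass between the connection error at 18:33:57 and the session recovery timeout at 18:34:22. Only after that timeout is the path failed and I/O moved to sdb.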

-----Original Message-----
From: Maged Mokhtar <mmokhtar@xxxxxxxxxxx>
Sent: 20 March 2019 15:36
To: li jerry <div8cn@xxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: CEPH ISCSI LIO multipath change delay



On 20/03/2019 07:43, li jerry wrote:
> Hi all,
> 
> I’ve deployed a Mimic (13.2.5) cluster on 3 CentOS 7.6 servers, then 
> configured iscsi-target and created a LUN, following 
> http://docs.ceph.com/docs/mimic/rbd/iscsi-target-cli/.
> 
> I have another server running CentOS 7.4, on which I configured and 
> mounted the LUN I’ve just created, following 
> http://docs.ceph.com/docs/mimic/rbd/iscsi-initiator-linux/.
> 
> I’m trying to do an HA test:
> 
> 1. Perform a WRITE test with the dd command
> 
> 2. Stop the ‘Active’ iscsi-target node (ini 0); dd I/O hangs for over 25 
> seconds until the iSCSI target switches to another node
> 
> 3. dd I/O returns to normal
> 
> My question is: why does the iSCSI target switch take so long? Is 
> there any setting I’ve misconfigured?
> 
> Usually it only takes a few seconds to switch on enterprise storage 
> products.
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

If you mean you shut down the entire host, then from your description 
that host is also running OSDs, so you also took out some OSDs serving I/O.

If a primary OSD is not responding, client I/O (in this case from your iSCSI 
target) will block until Ceph marks the OSD down and issues a new map 
epoch that maps the PG to another OSD. This process is controlled by 
osd_heartbeat_interval (5) and osd_heartbeat_grace (20), 25 seconds in 
total, which is what you observe. I do not recommend lowering them, or 
your cluster will be oversensitive and OSDs could flap under load.

Maged




