Re: 答复: CEPH ISCSI LIO multipath change delay

It's just the design of the iSCSI protocol. You can certainly lower the
timeouts (see "fast_io_fail_tmo" [1]), but you will just end up with more
false-positive failovers.

[1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-linux/
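For reference, the multipath.conf device section from the Ceph iSCSI initiator docs linked above looks roughly like the sketch below; fast_io_fail_tmo bounds how long I/O waits on a failed path before multipath fails it over. Treat the values as illustrative defaults from the docs, not a tuned recommendation:

```
devices {
        device {
                vendor                 "LIO-ORG"
                hardware_handler       "1 alua"
                path_grouping_policy   "failover"
                path_selector          "queue-length 0"
                failback               60
                path_checker           tur
                prio                   alua
                prio_args              exclusive_pref_bit
                fast_io_fail_tmo       25
                no_path_retry          queue
        }
}
```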

On Thu, Mar 21, 2019 at 10:46 AM li jerry <div8cn@xxxxxxxxxxx> wrote:
>
> Hi Maged
>
> thank you for your reply.
>
> To exclude the osd_heartbeat_interval and osd_heartbeat_grace factors, I cleared the current LIO configuration, deployed two fresh CentOS 7 nodes (not serving any other Ceph role), and installed rbd-target-api, rbd-target-gw, and tcmu-runner on them.
>
> Then I ran the following test:
> 1. Mount the iSCSI LUN on the CentOS 7 client
> 2. Write data to the iSCSI LUN with dd
> 3. Forcibly power off the active target node
>
> [18:33:48] active target node powered off
> [18:33:57] CentOS 7 client detected the iSCSI target interruption
> [18:34:23] CentOS 7 client switched to the other target node
>
>
> The whole process took 35 seconds, and Ceph stayed healthy throughout the test.
>
> This failover time is too long for production use. Is there anything else I can optimize?
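The 25-second "session recovery timed out" in the log below lines up with the open-iscsi session replacement timeout, set in /etc/iscsi/iscsid.conf. A sketch of the relevant knob (25 matches what this log shows; the stock open-iscsi default is higher, and lowering it trades faster failover for more false positives, as noted earlier in the thread):

```
# /etc/iscsi/iscsid.conf
# Seconds to wait for a broken session to recover before failing
# outstanding I/O back up to the multipath layer.
node.session.timeo.replacement_timeout = 25
```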
>
>
> Below is the centos7 client log [messages]
> ============================================================
>
> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 4409496160
> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error (1022)
> Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 4:0 error (1022 - Invalid or unknown error code) state (3)
> Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out after 25 secs
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358528
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline device
> [~115 further repeated "killing request" / "rejecting I/O to offline device" lines trimmed]
> Mar 21 18:34:22 CEPH-client01test kernel: device-mapper: multipath: Failing path 8:0.
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358784
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fe 80 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358912
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f3 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355968
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 80 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2357120
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f2 80 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2355840
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 fd 80 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2358656
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f5 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356480
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 23 f7 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev sda, sector 2356992
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 00 00 24 03 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test multipathd: sda: mark as failed
> Mar 21 18:34:22 CEPH-client01test multipathd: mpathb: remaining active paths: 1
> Mar 21 18:34:22 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state S non-preferred supports ToluSNA
> Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: Asymmetric access state changed
> Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
> Mar 21 18:34:23 CEPH-client01test kernel: sd 6:0:0:0: alua: port group 02 state A non-preferred supports ToluSNA
> Mar 21 18:34:27 CEPH-client01test multipathd: mpathb: sdb - tur checker reports path is up
> Mar 21 18:34:27 CEPH-client01test multipathd: 8:16: reinstated
> Mar 21 18:34:33 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
> Mar 21 18:34:41 CEPH-client01test iscsid: connect to 172.17.1.23:3260 failed (No route to host)
>
> -----Original Message-----
> From: Maged Mokhtar <mmokhtar@xxxxxxxxxxx>
> Sent: March 20, 2019 15:36
> To: li jerry <div8cn@xxxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: CEPH ISCSI LIO multipath change delay
>
>
>
> On 20/03/2019 07:43, li jerry wrote:
> > Hi,ALL
> >
> > I've deployed a Mimic (13.2.5) cluster on 3 CentOS 7.6 servers, then
> > configured an iscsi-target and created a LUN, following
> > http://docs.ceph.com/docs/mimic/rbd/iscsi-target-cli/.
> >
> > I have another server running CentOS 7.4, on which I configured and
> > mounted the LUN I'd just created, following
> > http://docs.ceph.com/docs/mimic/rbd/iscsi-initiator-linux/.
> >
> > I'm trying to do an HA test:
> >
> > 1. Perform a WRITE test with DD command
> >
> > 2. Stop the 'Active' iscsi-target node (ini 0); DD I/O hangs for over 25
> > seconds until the iscsi-target switches to another node
> >
> > 3. DD I/O goes back to normal
> >
> > My question is: why does it take so long for the iscsi-target to switch?
> > Are there any settings I've misconfigured?
> >
> > It usually takes only a few seconds to switch over on enterprise storage
> > products.
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> If you mean you shut down the entire host: from your description that host
> was also running OSDs, so you also took out some OSDs serving I/O.
>
> If a primary OSD is not responding, client I/O (in this case your iSCSI
> target) will block until Ceph marks the OSD down and issues a new epoch of
> the map, remapping the PG to another OSD. This process is controlled by
> osd_heartbeat_interval (5) and osd_heartbeat_grace (20), 25 seconds in
> total, which is what you observe. I do not recommend lowering them, or
> your cluster will be over-sensitive and OSDs could flap under load.
>
> Maged
>



-- 
Jason



