Hi! I have a Ceph cluster with 3 nodes, each running mon/mgr/mds daemons. I rebooted one node and see this in the client log:

Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 socket closed (con state OPEN)
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 session lost, hunting for new mon
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon0 10.5.105.34:6789 session established
Feb 09 20:29:22 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state OPEN)
Feb 09 20:29:23 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect start
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect success
Feb 09 20:30:05 ceph-nfs1 kernel: ceph: mds0 recovery completed

As I understand it, the following happened:

1. The client detects that the link to the mon is broken and quickly switches to another mon (in less than 1 second).
2. The client detects that the link to the mds is broken, tries to reconnect 3 times (unsuccessfully), then waits and reconnects to the same mds after about 30 seconds of downtime.

I have 2 questions:

1. Why?
2. How can I reduce the switch-over time to another mds?
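In case it matters, I have not tuned any of the MDS failover settings; everything is at defaults. My guess is that the delay is related to the MDS beacon settings, something like the ceph.conf excerpt below (just my assumption of which knobs are involved, with what I believe are the default values -- please correct me if these are not the right ones):

    [global]
        # how often each MDS sends a beacon to the monitors (seconds)
        mds beacon interval = 4
        # how long the monitors wait without a beacon before marking an
        # MDS as laggy/failed so a standby can take over (seconds)
        mds beacon grace = 15

Is lowering the beacon grace the intended way to shorten that ~30 second gap, or is the delay coming from somewhere else (e.g. the kernel client)?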