Hi! I have a Ceph cluster with 3 nodes, each running mon/mgr/mds daemons. I rebooted one node and see this in the client log:

Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 socket closed (con state OPEN)
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 session lost, hunting for new mon
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon0 10.5.105.34:6789 session established
Feb 09 20:29:22 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state OPEN)
Feb 09 20:29:23 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect start
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect success
Feb 09 20:30:05 ceph-nfs1 kernel: ceph: mds0 recovery completed

As I understand it, the following happened:

1. The client detects that the link to the mon is broken and quickly switches to another mon (in less than 1 second).
2. The client detects that the link to the mds is broken, tries to reconnect 3 times (unsuccessfully), then waits and reconnects to the same mds after about 30 seconds of downtime.

I have 2 questions:

1. Why?
2. How can I reduce the switch-over time to another mds?
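In case it matters, I have not tuned any of the MDS failover settings; everything is at defaults. My guess is that the delay is related to the MDS beacon settings, something like the ceph.conf excerpt below (just my assumption of which knobs are involved, with what I believe are the default values -- please correct me if these are not the right ones):

    [global]
        # how often each MDS sends a beacon to the monitors (seconds)
        mds beacon interval = 4
        # how long the monitors wait without a beacon before marking an
        # MDS as laggy/failed so a standby can take over (seconds)
        mds beacon grace = 15

Is lowering the beacon grace the intended way to shorten that ~30 second gap, or is the delay coming from somewhere else (e.g. the kernel client)?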