Hi,

I'm trying to make our system a bit more fault tolerant, and I'm struggling with letting clients reconnect after they have lost contact for a while. When there is a temporary network problem, I would like clients to block I/O, wait for the connection to come back, and then resume.

Do I have any options other than just increasing mds_session_autoclose? Is there a downside to using a very large value here (say, a full day)? I expect all clients to be connected at all times anyway when things are running normally.

What I see right now, if the disconnect is sufficiently long, is that the ceph client releases the I/O block and every I/O operation on the existing mount point fails with permission denied. Re-mounting works, but it also requires killing all active sessions that block the unmount. Overall, this is bad when it happens, and I would prefer almost any other option.

I can see that the client tries to reconnect when this happens:

Nov 12 11:53:24 hebbe01-3 kernel: libceph: mds0 10.43.20.3:6800 connection reset
Nov 12 11:53:24 hebbe01-3 kernel: libceph: reset on mds0
Nov 12 11:53:24 hebbe01-3 kernel: ceph: mds0 closed our session
Nov 12 11:53:24 hebbe01-3 kernel: ceph: mds0 reconnect start
Nov 12 11:53:24 hebbe01-3 kernel: ceph: mds0 reconnect denied
Nov 12 11:56:55 hebbe01-3 kernel: libceph: mds0 10.43.20.3:6800 socket closed (con state NEGOTIATING)
Nov 12 11:56:55 hebbe01-3 kernel: ceph: mds0 rejected session

but according to the logs on the MDS server, the MDS disallows it because it is not in a "reconnect state".

So, if I understand this correctly, is reconnecting only available when the MDS server has been rebooted?

Best regards,
Mikael
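
For reference, here is a minimal sketch of how the rejected-session state could be spotted from the kernel messages quoted above, for example to raise an alert before users start seeing permission denied. It is written in Python and is not part of any Ceph tooling. The function name rejected_mds and the regular expression are assumptions of mine; the two matched strings ("reconnect denied" and "rejected session") are taken verbatim from the log excerpt, and other kernel versions may word them differently.

```python
import re

# Matches kernel lines of the form "... kernel: ceph: mds0 <message>"
# (or "libceph:"), as seen in the log excerpt above.
MDS_EVENT = re.compile(r"kernel: (?:libceph|ceph): (mds\d+) (.*)$")

# Messages that indicate the MDS refused to take the client back.
REJECTION_MESSAGES = ("reconnect denied", "rejected session")


def rejected_mds(lines):
    """Return the set of MDS names whose session was denied or rejected."""
    rejected = set()
    for line in lines:
        match = MDS_EVENT.search(line.rstrip())
        if match is None:
            continue
        mds, message = match.groups()
        if message in REJECTION_MESSAGES:
            rejected.add(mds)
    return rejected


sample = [
    "Nov 12 11:53:24 hebbe01-3 kernel: libceph: mds0 10.43.20.3:6800 connection reset",
    "Nov 12 11:53:24 hebbe01-3 kernel: libceph: reset on mds0",
    "Nov 12 11:53:24 hebbe01-3 kernel: ceph: mds0 closed our session",
    "Nov 12 11:53:24 hebbe01-3 kernel: ceph: mds0 reconnect start",
    "Nov 12 11:53:24 hebbe01-3 kernel: ceph: mds0 reconnect denied",
    "Nov 12 11:56:55 hebbe01-3 kernel: ceph: mds0 rejected session",
]

if __name__ == "__main__":
    print(sorted(rejected_mds(sample)))
```

Feeding it the output of the kernel log (however that is collected on the node) would only tell you that the mount is already in the bad state; it does not prevent the session from being closed.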