I have a hanging cephfs mount on a client again (Luminous cluster, CentOS 7, only 32 OSDs!) and wanted to share the 'fix'. dmesg on the client showed it looping over the monitors:

[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon

1) I blocked the client's access to the monitors with:

iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT

resulting in:

[Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
[Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)

2) I applied the suggested change to the osd map message max, mentioned in earlier threads [0]:

ceph tell osd.* injectargs '--osd_map_message_max=10'
ceph tell mon.* injectargs '--osd_map_message_max=10'

[@c01 ~]# ceph daemon osd.0 config show | grep message_max
    "osd_map_message_max": "10",
[@c01 ~]# ceph daemon mon.a config show | grep message_max
    "osd_map_message_max": "10",

[0] https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg54419.html
    http://tracker.ceph.com/issues/38040

3) I allowed access to a monitor again with:

iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT

getting:

[Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 12:39:26 2019] libceph: osd0 down
[Thu Jul 11 12:39:26 2019] libceph: osd0 up

Problem solved: the umount that was hung in D state was released. I am not sure whether the fix was the prolonged disconnection from the monitors, the osd_map_message_max=10, or both.
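
One extra note in case someone copies this: injectargs only changes the running daemons, so if osd_map_message_max=10 turns out to be the real fix you probably also want it in ceph.conf so it survives restarts. A minimal sketch, assuming the default /etc/ceph/ceph.conf and that [global] is the right section for your setup:

# /etc/ceph/ceph.conf (assumed location), on the mon/osd hosts
[global]
osd_map_message_max = 10

After a daemon restart (or the injectargs above), 'ceph daemon osd.0 config show | grep message_max' should again show "10".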
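
For anyone debugging something similar, it can also help to look at what the kernel client itself thinks it is doing while the mount is hung. Assuming debugfs is mounted, the kernel client exposes its state under /sys/kernel/debug/ceph/<fsid>.client<id>/, e.g.:

cat /sys/kernel/debug/ceph/*/monc   # current monitor session / what the client is asking the mons for
cat /sys/kernel/debug/ceph/*/osdc   # requests still in flight to the OSDs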